import React from "react";
import { SimpleContentPage } from "../../components";
import { Tabs, Tab } from "carbon-components-react";
import { Link } from "react-router-dom";
import Term from "../Term/Term";

export const Guidance = () => (
  <SimpleContentPage
    key="Guidance"
    title="Guidance on Choosing UQ Algorithms and Metrics"
  >
    <Tabs type="container" light>
      <Tab label="Algorithms">
        <p>
          Uncertainty quantification (UQ) algorithms can be broadly classified
          as intrinsic or extrinsic depending on how the uncertainties are
          obtained from the AI models. The following diagram shows a taxonomy
          of UQ algorithms, most of which are included in UQ360. The text below
          provides further explanation.
        </p>
        <img
          src="/imgs/taxonomy.png"
          width="750px"
          alt="Taxonomy of uncertainty estimation algorithms"
        />
        <p>
          You can choose the right UQ algorithm based on your answers to the
          following questions:
        </p>
        <h2>
          Intrinsic (training a new model) or extrinsic (working with a trained
          model) UQ methods?
        </h2>
        <p>
          First and foremost, the choice of UQ algorithm depends on whether you
          intend to train a new model or already have a trained model. In the
          former case, intrinsic methods can be used to train a model that
          provides uncertainty estimates. If you have already trained a model,
          extrinsic methods can be used either to improve the quality of your
          model's existing uncertainty estimates or to generate post-hoc
          uncertainty estimates if your model does not provide them.
        </p>
        <h3>If intrinsic, do you care about model uncertainty?</h3>
        <p>
          As discussed in the <Link to="/overview">Overview</Link>, the two main
          sources of <Term>predictive uncertainty</Term> are{" "}
          <Term>data uncertainty</Term> and <Term>model uncertainty</Term>. Some
          intrinsic UQ algorithms can be used to train ML models that capture
          both alongside their predictions. However, they can be computationally
          expensive and sometimes fail to generate high-quality uncertainty
          estimates (see the{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/metrics.html"
            target="_blank"
            rel="noopener noreferrer"
          >
            evaluation metrics
          </a>{" "}
          and the Metrics tab above). In these cases, you may opt for an
          intrinsic UQ algorithm that captures only data uncertainty. The output
          of any intrinsic UQ algorithm can be further improved, if necessary,
          using re-calibration methods (see the extrinsic methods below).
        </p>
        <h4>Intrinsic methods that capture both data and model uncertainty</h4>
        <p>
          The toolkit includes two classes of algorithms to train a model that
          captures both data and model uncertainty: Bayesian approaches and
          ensemble/resampling approaches. In general, Bayesian approaches are
          computationally more expensive to train, but they are more strongly
          grounded in theory.
        </p>
        <p>
          Bayesian approaches included in UQ360 are{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/intrinsic.html#bayesian-neural-network-regression"
            target="_blank"
            rel="noopener noreferrer"
          >
            Bayesian neural networks (BNNs)
          </a>{" "}
          and{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/intrinsic.html#homoscedastic-gaussian-process-regression"
            target="_blank"
            rel="noopener noreferrer"
          >
            Gaussian processes
          </a>
          . BNNs, instead of finding a single set of optimal model parameters,
          generate a posterior distribution over model parameters given a prior
          distribution and the training data. However, under model
          mis-specification, BNNs can result in low-quality uncertainty
          estimates. UQ360 also includes{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/intrinsic.html#bayesian-neural-network-regression"
            target="_blank"
            rel="noopener noreferrer"
          >
            HS-BNNs
          </a>{" "}
          with sparsity-promoting Horseshoe priors that can lead to
          higher-quality uncertainty estimates, especially when working with
          relatively small datasets.
        </p>
        <p>
          Ensembles of models offer an attractive way to estimate model
          uncertainty, even if individual models in the ensemble do not capture
          uncertainties. The variability in predictions among the ensemble
          members can be viewed as the predictive uncertainty. While promising,
          standard methods for creating ensembles require repeated model fits
          (one for each member), which is prohibitively expensive for large
          datasets. UQ360 includes{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/extrinsic.html#infinitesimal-jackknife"
            target="_blank"
            rel="noopener noreferrer"
          >
            infinitesimal jackknife (IJ)
          </a>
          , which allows the construction of ensembles from a single model fit.
        </p>
        <h4>Intrinsic methods that capture only data uncertainty</h4>
        <p>
          For classification, standard probabilistic classification models such
          as neural networks trained with cross-entropy loss can capture data
          uncertainty. For regression,{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/intrinsic.html#homoscedastic-gaussian-process-regression"
            target="_blank"
            rel="noopener noreferrer"
          >
            homoscedastic regression
          </a>
          ,{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/intrinsic.html#heteroscedastic-regression"
            target="_blank"
            rel="noopener noreferrer"
          >
            heteroscedastic regression
          </a>{" "}
          and{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/intrinsic.html#quantile-regression"
            target="_blank"
            rel="noopener noreferrer"
          >
            quantile regression
          </a>{" "}
          models can capture data uncertainty. However, they differ in their
          underlying assumptions about the noise model for the predictive
          distribution. Homoscedastic regression assumes the noise level is
          constant across inputs, whereas heteroscedastic regression allows the
          noise level to vary with the input features.{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/intrinsic.html#quantile-regression"
            target="_blank"
            rel="noopener noreferrer"
          >
            Quantile regression
          </a>{" "}
          is a non-parametric method that directly predicts the uncertainty
          quantiles of the predicted outcomes without assuming a noise
          distribution.
        </p>
        <h3>If extrinsic, does the trained model provide UQ or not?</h3>
        <p>
          If you are working with a trained model, first you should determine
          whether the model can provide uncertainty estimates directly. For
          example, commonly used probabilistic classification models such as
          neural networks can capture data uncertainty, while SVMs, decision
          trees, and k-nearest neighbors usually cannot directly capture
          uncertainties in their predictions. If your model provides uncertainty
          estimates, you can evaluate their quality using the{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/metrics.html"
            target="_blank"
            rel="noopener noreferrer"
          >
            UQ metrics
          </a>
          . If you are not satisfied with the quality, UQ360 provides a class of
          extrinsic algorithms to improve these estimates. If your model does
          not provide uncertainty estimates directly, UQ360 includes a class of
          post-hoc approaches to generate them.
        </p>
        <h4>Extrinsic methods to improve UQ quality</h4>
        <p>
          UQ360 provides a set of algorithms to improve the quality,
          specifically the calibration, of existing uncertainty estimates. In
          short, uncertainty estimates are mis-calibrated when the outcome
          distribution observed on a held-out validation dataset does not match
          the given estimates, and re-calibration corrects the estimates to
          match the observed distribution. For classification tasks, you may
          use{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/extrinsic.html#classification-calibration"
            target="_blank"
            rel="noopener noreferrer"
          >
            isotonic regression and Platt scaling
          </a>
          . For regression tasks,{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/extrinsic.html#auxiliary-interval-predictor"
            target="_blank"
            rel="noopener noreferrer"
          >
            Auxiliary Interval Predictors
          </a>{" "}
          and{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/extrinsic.html#ucc-recalibration"
            target="_blank"
            rel="noopener noreferrer"
          >
            UCC Recalibration
          </a>{" "}
          can be employed. Re-calibration methods can also be used to improve
          the uncertainty estimates of the intrinsic UQ algorithms.
        </p>
        <h4>Extrinsic methods to generate post-hoc UQ</h4>
        <p>
          For existing models that cannot generate uncertainty estimates
          directly, UQ360 provides{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/extrinsic.html#blackbox-metamodel-classification"
            target="_blank"
            rel="noopener noreferrer"
          >
            Meta-Models (MMs)
          </a>{" "}
          that can obtain these estimates in a post-hoc manner. In the case of
          regression, an MM augments the base model to obtain a prediction
          interval, whereas in the case of classification, an MM returns a
          scalar value that indicates the confidence in the prediction of the
          base model. The{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/extrinsic.html#infinitesimal-jackknife"
            target="_blank"
            rel="noopener noreferrer"
          >
            IJ algorithm
          </a>{" "}
          discussed above can also be used to generate post-hoc UQ by
          approximating the effect of training data perturbations on the model’s
          predictions.
        </p>
      </Tab>
      <Tab label="Metrics">
        <p>
          A good UQ method should generate uncertainty estimates that work
          reliably. For regression, this means that the actual outcome should
          fall within the prediction interval, and for classification, that the
          predicted class should be correct, in each case with a probability
          close to the stated confidence. If so, we say the uncertainty
          estimates are well-calibrated. Since the ground-truth uncertainties
          are unknown in practice, <strong>calibration metrics</strong> that
          assess this estimation quality are typically evaluated on a set of
          held-out test data.
        </p>
        <p>
          Users also want the model to be as confident as possible. A
          well-calibrated but very wide prediction interval (for regression) or
          low-confidence prediction (for classification) may be useless in
          practice. When assessing the quality of uncertainty estimates, it is
          often necessary to consider both calibration and level of confidence,
          and sometimes there is a trade-off between the two.
        </p>
        <p>
          For classification, UQ360 provides functions to calculate calibration
          metrics including <Term>Expected Calibration Error (ECE)</Term> and{" "}
          Brier Score. For a well-calibrated classification model, we expect
          that for cases in which the model predicts with a given confidence
          score <i>p</i> (between 0 and 1), the model should be correct with a
          probability close to <i>p</i>. The ECE metric captures the
          discrepancy between the model's confidence scores and its accuracy
          across confidence bins. The Brier score, defined as the mean squared
          difference between the predicted probabilities and actual outcomes,
          captures the trade-off between calibration errors and level of
          uncertainty in predictions. The{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/classification_metrics.html#uq360.metrics.classification_metrics.plot_risk_vs_rejection_rate"
            target="_blank"
            rel="noopener noreferrer"
          >
            risk vs rejection rate curve
          </a>
          , provided in UQ360, can be used to analyze the trade-off between
          rejection rate (i.e., the fraction of predictions rejected because
          their uncertainty exceeds a given threshold) and improvement in
          various risk metrics. Some examples of
          risk metrics include fairness metrics such as statistical parity
          difference and equalized odds error (you can learn more in{" "}
          <a
            href="https://aif360.res.ibm.com/resources#guidance"
            target="_blank"
            rel="noopener noreferrer"
          >
            AIF360
          </a>
          ).
        </p>
        <p>
          For regression, UQ360 provides functions to estimate a calibration
          metric called{" "}
          <Term>Prediction Interval Coverage Probability (PICP)</Term>, defined
          as the fraction of the test data whose outcomes are covered by the
          prediction intervals. UQ360 also provides functions to calculate a
          metric to measure the overall uncertainty of a regression model called{" "}
          <Term>Mean Prediction Interval Width (MPIW)</Term>, defined as the
          average width of the prediction intervals across the examples. Note
          that PICP can be improved with re-calibration methods, which may
          increase the MPIW (see{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/extrinsic.html"
            target="_blank"
            rel="noopener noreferrer"
          >
            Extrinsic methods to improve UQ quality
          </a>
          ). The combination of PICP and MPIW defines an operating point for any
          algorithm. To compare the trade-off between PICP and MPIW across
          different sets of uncertainty estimates, the toolkit provides a novel
          operating-point-agnostic tool, the{" "}
          <a
            href="https://uq360.readthedocs.io/en/latest/regression_metrics.html#uncertainty-characteristics-curve"
            target="_blank"
            rel="noopener noreferrer"
          >
            Uncertainty Characteristic Curve (UCC)
          </a>
          , which can be used to visualize and assess this trade-off with the
          Area under the Uncertainty Characteristic Curve (AUUCC) and AUUCC-gain
          metrics.
        </p>
      </Tab>
    </Tabs>
  </SimpleContentPage>
);
