Reviewing perturbation explainability
Perturbation-based methods investigate a neural network's behavior by perturbing its input, for example by partially occluding pixels in an image or substituting words in textual data, and observing how those changes influence the model's prediction. Domain experts and end users can evaluate the quality of the resulting explanations by inspecting saliency representations and judging, against their own intuition, whether the highlighted features plausibly relate to the prediction.
Measuring how much the output changes when a feature is present versus absent indicates that feature's importance to the model's prediction. This section explores a perturbation-based explainability example using local interpretable model-agnostic explanations (LIME).
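To make the idea concrete, here is a minimal sketch of feature-level perturbation importance for a tabular classifier. The model object, its scikit-learn-style predict_proba method, and the zero baseline used to "occlude" a feature are illustrative assumptions, not part of any specific framework discussed here.

```python
import numpy as np

def perturbation_importance(model, x, baseline=0.0):
    """Score each feature of a single input by how much replacing it
    with a baseline value shifts the model's predicted probabilities.

    Assumes `model` exposes a scikit-learn-style predict_proba and
    `x` is a 1-D NumPy array holding one instance (hypothetical setup).
    """
    base_pred = model.predict_proba(x.reshape(1, -1))[0]
    scores = np.zeros(len(x))
    for i in range(len(x)):
        perturbed = x.copy()
        perturbed[i] = baseline  # occlude one feature, keep the rest fixed
        new_pred = model.predict_proba(perturbed.reshape(1, -1))[0]
        # A larger shift in the output distribution marks a more important feature
        scores[i] = np.abs(base_pred - new_pred).sum()
    return scores
```

Sorting the returned scores in descending order yields a ranking analogous to a saliency map: the features whose occlusion moves the prediction the most are deemed the most important.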
LIME
We discussed LIME as a local approximation, post hoc explainability framework in Chapter 6. In this section, let's explore how LIME provides local explainability by perturbing interpretable representations. LIME defines interpretable...