Fooled by Randomness

28 Sep

Permutation-based methods for calculating variable importance and for model interpretation are increasingly common. Here are a few places where they are used:

Feature Importance (FI)

The algorithm for calculating permutation-based FI is as follows:

  1. Estimate a model
  2. Permute a feature
  3. Predict again using the permuted data
  4. Estimate the decline in predictive accuracy and call that decline the FI (see the sketch after this list)
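
Here's a minimal sketch of those four steps in Python. The scikit-learn regression model and R² as the accuracy measure are my choices for illustration, not part of the recipe:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Toy data: y depends on both features.
X = rng.normal(size=(500, 2))
y = 3 * X[:, 0] + 1 * X[:, 1] + rng.normal(size=500)

# 1. Estimate a model.
model = LinearRegression().fit(X, y)
baseline = r2_score(y, model.predict(X))

def permutation_fi(j):
    """FI of feature j = decline in accuracy after permuting column j."""
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])   # 2. permute a feature
    permuted = r2_score(y, model.predict(X_perm))  # 3. predict again
    return baseline - permuted                     # 4. decline in accuracy = FI

for j in range(X.shape[1]):
    print(f"FI of X_{j + 1}: {permutation_fi(j):.3f}")
```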

Permutation-based FI bakes in a particular notion of importance. It is best explained with an example. Say you are calculating FI for features X_1 through X_k in a regression model, and you want the FI of X_k. Say X_k has a large beta. Permutation-based FI will take that large beta into account: permuting X_k while holding the fitted model fixed produces a large decline in accuracy. So the notion of importance is one that is conditional on the model.

Often we want to get at a different counterfactual: what happens if we drop X_k? You can get at that by dropping the variable and re-estimating the model, letting other, correlated variables pick up large betas. One use case I can see is checking whether we can knock out, say, an ‘expensive’ variable. There may be other uses.
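
A sketch of that drop-and-re-estimate counterfactual, on toy data with two highly correlated features (the setup and variable names are mine, purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Two highly correlated features; only their shared signal drives y.
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
X = np.column_stack([x1, x2])
y = x1 + rng.normal(size=500)

def drop_column_importance(j):
    """Decline in R^2 after dropping column j and re-estimating the model."""
    full = r2_score(y, LinearRegression().fit(X, y).predict(X))
    X_drop = np.delete(X, j, axis=1)
    reduced = r2_score(y, LinearRegression().fit(X_drop, y).predict(X_drop))
    return full - reduced

# Dropping either feature barely hurts: the remaining, correlated one picks up the beta.
for j in range(X.shape[1]):
    print(f"Drop-column importance of X_{j + 1}: {drop_column_importance(j):.3f}")
```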

Aside: To my dismay, I kludged the two together here. In my defense, I thought it was a private email. But still, I was wrong.

Permutation-based methods are used elsewhere. For instance:

Creating Knockoffs

“We construct our knockoff matrix X̃ by randomly swapping the n rows of the design matrix X. This way, the correlations between the knockoffs remain the same as the original variables but the knockoffs are not linked to the response Y. Note that this construction of the knockoffs matrix also makes the procedure random.”

From https://arxiv.org/pdf/1907.03153.pdf#page=4
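
A minimal sketch of that construction: permute the rows of X so the column-to-column correlations survive but any link to Y is broken. The dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 200, 5
X = rng.normal(size=(n, p))

# Knockoff matrix: the same n rows of X in a random order.
# Correlations between columns are preserved; any link to Y is broken.
X_knockoff = X[rng.permutation(n), :]

# Same column-to-column correlation matrix as the original X.
print(np.allclose(np.corrcoef(X, rowvar=False),
                  np.corrcoef(X_knockoff, rowvar=False)))  # True
```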

Local Interpretable Model-Agnostic Explanations (LIME)

The recipe for training local surrogate models:

  1. Select your instance of interest for which you want to have an explanation of its black box prediction.
  2. Perturb your dataset and get the black box predictions for these new points.
  3. Weight the new samples according to their proximity to the instance of interest.
  4. Train a weighted, interpretable model on the dataset with the variations.
  5. Explain the prediction by interpreting the local model.

From https://christophm.github.io/interpretable-ml-book/lime.html
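
A compact sketch of that recipe. The black box, the Gaussian proximity kernel, and the perturbation scale are my choices; the recipe itself doesn't prescribe them:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# A black box fit on toy data.
X = rng.normal(size=(1000, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=1000)
black_box = GradientBoostingRegressor().fit(X, y)

# 1. Instance of interest.
x_star = X[0]

# 2. Perturb the instance and get black box predictions for the new points.
Z = x_star + rng.normal(scale=0.5, size=(500, 3))
preds = black_box.predict(Z)

# 3. Weight the new samples by proximity to the instance (Gaussian kernel).
dists = np.linalg.norm(Z - x_star, axis=1)
weights = np.exp(-(dists ** 2) / 0.5)

# 4. Train a weighted, interpretable model on the perturbed data.
surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)

# 5. Explain the prediction by interpreting the local model's coefficients.
print(dict(zip(["x1", "x2", "x3"], surrogate.coef_.round(3))))
```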

Common Issue With Permutation-Based Methods

“Another really big problem is the instability of the explanations. In an article the authors showed that the explanations of two very close points varied greatly in a simulated setting. Also, in my experience, if you repeat the sampling process, then the explanations that come out can be different. Instability means that it is difficult to trust the explanations, and you should be very critical.”

From https://christophm.github.io/interpretable-ml-book/lime.html

Solution

One way to address instability is to average over multiple rounds of permutations. It is computationally expensive, but the payoff is stability.
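
A sketch of that averaging, reusing the permutation-FI idea from the first block. The number of rounds is arbitrary:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, 1.0, 0.0]) + rng.normal(size=500)
model = LinearRegression().fit(X, y)
baseline = r2_score(y, model.predict(X))

def one_round(j):
    """FI of feature j from a single random permutation."""
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    return baseline - r2_score(y, model.predict(X_perm))

# Average each feature's FI over many permutation rounds to stabilize it.
n_rounds = 100
fi = np.array([[one_round(j) for j in range(X.shape[1])] for _ in range(n_rounds)])
print("mean FI:", fi.mean(axis=0).round(3))
print("std across rounds:", fi.std(axis=0).round(3))
```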