ML: Interpretability Methods in Machine Learning: A Brief Survey (Translation and Commentary)
| | |
|---|---|
| Author | Xiang Zhou, AI engineer, Two Sigma |
| Original article | Interpretability Methods in Machine Learning: A Brief Survey - Two Sigma |
| Date | |
Two Sigma AI engineer Xiang Zhou outlines several approaches for understanding how machine learning models arrive at the answers they do.

Machine learning (ML) models can be astonishingly good at making predictions, but they often can't yield explanations for their forecasts in terms that humans can easily understand. The features from which they draw conclusions can be so numerous, and their calculations so complex, that researchers can find it impossible to establish exactly why an algorithm produces the answers it does. It is possible, however, to determine how a machine learning algorithm arrived at its conclusions. This ability, otherwise known as "interpretability," is a very active area of investigation among AI researchers in both academia and industry. It differs slightly from "explainability" (answering why) in that it can reveal causes and effects of changes within a model, even if the model's internal workings remain opaque.

Interpretability is crucial for several reasons. If researchers don't understand how a model works, they can have difficulty transferring learnings into a broader knowledge base, for example. Similarly, interpretability is essential for guarding against embedded bias or for debugging an algorithm. It also helps researchers measure the effects of trade-offs in a model. More broadly, as algorithms play an increasingly important role in society, understanding precisely how they come up with their answers will only become more critical. Researchers currently must compensate for incomplete interpretability with judgement, experience, observation, monitoring, and diligent risk management, including a thorough understanding of the datasets they use. However, several techniques exist for enhancing the degree of interpretability in machine learning models, regardless of their type. This article summarizes several of the most common of these, including their relative advantages and disadvantages.
Interpretable ML models and “Black Boxes”
Some machine learning models are interpretable by themselves. For example, for a linear model, the predicted outcome Y is a weighted sum of its features X. You can visualize Y = aX + b in a plot as a straight line: a, the feature weight, is the slope of the line, and b is the intercept on the y-axis.

Linear models are user-friendly because they are simple and easy to understand. However, achieving the highest accuracy for large modern datasets often requires more complex and expressive models, like neural networks. The following image shows a small, fully connected neural network, one of the simplest neural architectures. But even for this simplest architecture, there is no way for anyone to understand which neuron is playing what role, and which input feature actually contributes to the model output. For this reason, such models are sometimes called "black boxes." Now imagine a model with millions of neurons and all sorts of connections. Without robust interpretability techniques, it would be difficult for a researcher to understand it at all.
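To make the contrast concrete, here is a minimal sketch of reading the slope and intercept straight off a fitted linear model, which is exactly what makes it interpretable by itself. It assumes scikit-learn, and the toy data is made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: one feature with a roughly linear relationship to the target
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * x[:, 0] + 1.0 + rng.normal(scale=0.2, size=200)

lin = LinearRegression().fit(x, y)
print("slope a (feature weight):", lin.coef_[0])
print("intercept b:", lin.intercept_)
```

The fitted coefficients are the entire explanation of the model's behavior; no extra interpretability machinery is needed.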
Model-agnostic interpretability methods
Several important model-agnostic interpretability methods exist, and while none of them are perfect, they can help researchers interpret the results of even very complex ML models. For demonstration purposes, let's consider a small time-series dataset. A time series is simply a series of data points indexed in time order. It is the most common type of data in the financial industry. A frequent goal of quantitative research is to identify trends, seasonal variations, and correlation in financial time series data using statistical and machine learning methods.
Problem
The data is time-series data (X, y), the model used in this example is a RandomForestRegressor from sklearn, and the prediction is ŷ = model.predict(X). A minimal, runnable version of this setup is shown below; the original article does not include the dataset, so the placeholder data here is synthetic and stands in for the real time series:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Data: placeholder time-series data with four features f0..f3; substitute your own (X, y)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 2] - X[:, 0] + rng.normal(scale=0.1, size=500)

# Model
model = RandomForestRegressor(n_estimators=10, max_depth=3)
model.fit(X, y)

# Prediction
y_hat = model.predict(X)
```
Method 1: Partial Dependence Plot (PDP)
The first method we'll examine is the Partial Dependence Plot, or PDP, which was invented decades ago and shows the marginal effect that one or two features have on the predicted outcome of a machine learning model. It helps researchers determine what happens to model predictions as various features are adjusted.

In this plot, the x-axis represents the value of feature f0, and the y-axis represents the predicted value. The solid line in the shaded area shows how the average prediction varies as the value of f0 changes. PDP is very intuitive and easy to implement, but because it only shows the average marginal effects, heterogeneous effects might be hidden. For example, one feature might show a positive relationship with the prediction for half of the data, but a negative relationship for the other half; the PDP will then simply be a horizontal line. To solve this problem, a new method was developed.
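As a sketch of how such a plot can be produced, scikit-learn ships a partial-dependence utility. The snippet below assumes scikit-learn >= 1.0, matplotlib, and the `model` and `X` from the problem setup above:

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# Average marginal effect of feature f0 on the model's prediction
PartialDependenceDisplay.from_estimator(model, X, features=[0], kind="average")
plt.show()
```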
Method 2: Individual Conditional Expectation (ICE)
Individual Conditional Expectation, or ICE, is very similar to PDP, but instead of plotting an average, ICE displays one line per instance. This method is more intuitive than PDP because each line represents the predictions for one instance as the feature of interest is varied. Like partial dependence, ICE helps explain what happens to the predictions of the model as a particular feature varies.

Unlike PDP, ICE curves can uncover heterogeneous relationships. However, this benefit comes with a cost: it might not be as easy to see the average effect as it is with PDP.
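The same scikit-learn utility can draw ICE curves; a sketch under the same assumptions as the PDP snippet (scikit-learn >= 1.0 and the `model` and `X` from the problem setup):

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# One curve per instance; use kind="both" to overlay the PDP average on the ICE lines
PartialDependenceDisplay.from_estimator(model, X, features=[0], kind="individual")
plt.show()
```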
Method 3: Permuted Feature Importance
Permuted Feature Importance is another traditional interpretability method. The importance of a feature is the increase in the model's prediction error after the feature's values are shuffled. In other words, it helps define how much each feature in a model contributes to the predictions it makes. In the plot below, the x-axis represents the score reduction, or model error, and the y-axis represents each feature f0, f1, f2, f3.

As the plot shows, feature f2, the feature on top, has the largest impact on the model error, while f1, the second feature from the top, has no impact on the error after the shuffling. The remaining two features have negative contributions to the model.
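A minimal sketch of computing these importances with scikit-learn's built-in helper, assuming the `model`, `X`, and `y` from the problem setup above:

```python
from sklearn.inspection import permutation_importance

# Shuffle each feature several times and measure how much the score drops
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Print features from most to least important
for i in result.importances_mean.argsort()[::-1]:
    print(f"f{i}: {result.importances_mean[i]:.4f} +/- {result.importances_std[i]:.4f}")
```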
PDP vs. ICE vs. Feature Importance
All three of the methods above are intuitive and easy to implement. PDP shows global effects while hiding heterogeneous effects. ICE can uncover heterogeneous effects, but makes it hard to see the average. Feature importance provides a concise way to understand the model's behavior. The use of the error ratio (instead of the absolute error) makes the measurements comparable across different problems, and it automatically takes into account all interactions with other features.

However, the interactions are not additive: adding up the individual feature importances does not equal the total drop in performance. Shuffling the features adds randomness, so the results may differ from run to run. The shuffling also requires access to the true outcomes, which is impossible in many scenarios. Moreover, all three methods assume independence of the features, so if features are correlated, unlikely data points will be created, and the interpretation can be biased by these unrealistic data points.
Method 4: Global Surrogate
The global surrogate method takes a different approach. In this case, an interpretable model is trained to approximate the predictions of a black box model. The process is simple: first you get predictions on a dataset with the trained black box model, and then train an interpretable model on this dataset and those predictions. The trained interpretable model now becomes a surrogate of the original model, and all we need to do is interpret the surrogate model. Note that the surrogate model could be any interpretable model: a linear model, a decision tree, human-defined rules, etc.

Using an interpretable model to approximate the black box model introduces additional error, but the additional error can easily be measured by R-squared. However, since the surrogate models are trained only on the predictions of the black box model instead of the real outcome, global surrogate models can only interpret the black box model, not the data.
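The process can be sketched in a few lines. The snippet below assumes the `model` and `X` from the problem setup and uses a shallow decision tree as the surrogate, though any interpretable model would do:

```python
from sklearn.tree import DecisionTreeRegressor, export_text
from sklearn.metrics import r2_score

# 1. Predictions of the black-box model on the dataset
black_box_pred = model.predict(X)

# 2. Fit an interpretable model on the same inputs and those predictions
surrogate = DecisionTreeRegressor(max_depth=3).fit(X, black_box_pred)

# 3. Measure how faithfully the surrogate mimics the black box (R-squared)
print("R^2 vs. black box:", r2_score(black_box_pred, surrogate.predict(X)))

# 4. Interpret the surrogate instead of the black box
print(export_text(surrogate, feature_names=["f0", "f1", "f2", "f3"]))
```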
Method 5: Local Surrogate (LIME)
Local Surrogate, or LIME (Local Interpretable Model-agnostic Explanations), differs from the global surrogate in that it does not try to explain the whole model. Instead, it trains interpretable models to approximate individual predictions. LIME tries to understand how the predictions change when we perturb the data samples. Here is an example of LIME explaining why a picture is classified as a tree frog by the model.

First the image on the left is divided into interpretable components. LIME then generates a dataset of perturbed instances by turning some of the interpretable components "off" (in this case, making them gray). For each perturbed instance, one can use the trained model to get the probability that a tree frog is in the image, and then learn a locally weighted linear model on this dataset. In the end, the components with the highest positive weights are presented as an explanation.
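The figure above uses the image variant of LIME. For the tabular running example, a rough sketch using the third-party `lime` package (an assumption: it is not part of scikit-learn and must be installed separately, e.g. `pip install lime`) might look like this, reusing the `model` and `X` from the problem setup:

```python
from lime.lime_tabular import LimeTabularExplainer

# Explain one prediction of the black-box regressor
explainer = LimeTabularExplainer(
    X, feature_names=["f0", "f1", "f2", "f3"], mode="regression"
)
exp = explainer.explain_instance(X[0], model.predict, num_features=4)

# Local feature contributions for this single instance
print(exp.as_list())
```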
Global vs. Local Surrogate Methods
Both the global and local surrogate methods have advantages and disadvantages. The global surrogate cares about explaining the whole logic of the model, while the local surrogate is only interested in understanding specific predictions. With the global surrogate method, any interpretable model can be used as the surrogate, and the closeness of the surrogate model to the black box model can easily be measured. However, since the surrogate models are trained only on the predictions of the black box model instead of the real outcome, they can only interpret the model, not the data. Besides, the surrogate models, which are in many cases simpler than the black box model, may only be able to give good explanations for part of the data, instead of the entire dataset.

The local surrogate method, on the other hand, does not share these shortcomings. In addition, the local surrogate method is model-agnostic: if you need to try a different black box model for your problem, you can still use the same surrogate models for interpretation. And compared with interpretations given by global surrogate methods, the interpretations from local surrogate methods are often short, contrastive, and human-friendly. However, local surrogates have their own issues. First, LIME uses a kernel to define the area within which data points are considered for local explanations, but it is difficult to find the proper kernel setting for a task. The way sampling is done in LIME can lead to unrealistic data points, and the local interpretation can be biased towards those data points. Another concern is the instability of the explanations: two very close points can lead to two very different explanations.