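How to Use SHAP to Interpret the Base Learners, the Meta-Learner, and the Overall Model Contributions in a Stacking Ensemble

Background

Stacking (stacked generalization) is an ensemble learning method that combines the predictions of several base models (level-1 learners) and then trains a meta-model (level-2 learner) on those predictions to obtain a stronger final model. Several different base models (such as random forest, XGBoost, and LightGBM) first make predictions on the training data. Their predictions are then used as new features for the meta-model, which learns from these outputs, combines the strengths of the individual models, and produces the final prediction.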
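The two-layer procedure described above can be sketched by hand. This is a minimal illustration under stated assumptions, not the article's implementation: synthetic data from make_regression stands in for the real dataset, only two scikit-learn base learners are used, and the names base_models, meta_train, and meta_test are made up for this example. Out-of-fold predictions from cross_val_predict serve as meta-features, so the meta-model is not trained on predictions that the base models made for samples they had already seen.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Assumption: synthetic data stands in for the article's dataset.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Level 1: two illustrative base learners (the article uses six).
base_models = {
    "RF": RandomForestRegressor(n_estimators=50, random_state=42),
    "GBM": GradientBoostingRegressor(n_estimators=50, random_state=42),
}

# Out-of-fold predictions on the training set become the meta-features.
meta_train = np.column_stack(
    [cross_val_predict(m, X_train, y_train, cv=5) for m in base_models.values()]
)

# Level 2: the meta-model learns how to combine the base predictions.
meta_model = LinearRegression().fit(meta_train, y_train)

# At prediction time, base models refit on the full training set feed the meta-model.
meta_test = np.column_stack(
    [m.fit(X_train, y_train).predict(X_test) for m in base_models.values()]
)
final_pred = meta_model.predict(meta_test)
print(dict(zip(base_models, meta_model.coef_)))
```

scikit-learn's StackingRegressor automates essentially this pattern; its cv parameter controls how the out-of-fold predictions are produced.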

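In the implementation used in this article, the first layer consists of several base learners: random forest, XGBoost, LightGBM, gradient boosting, AdaBoost, and CatBoost. Each is trained independently and produces its own predictions. The second-layer meta-learner is a linear regression that learns from the base learners' predictions and combines them into the final prediction.

How does SHAP explain a Stacking model?

Note that SHAP explains one model at a time: it attributes each prediction to the model's input features, and these attributions measure feature importance. A Stacking model contains several models, so there are two ways to apply SHAP to it:

Decompose the Stacking structure layer by layer and explain the base learners and the meta-learner separately.
Treat the Stacking model as a single "black box" and explain only the relationship between the input features and the final prediction.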
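Both approaches rely on pieces that a fitted scikit-learn StackingRegressor exposes. The sketch below shows where they live, assuming synthetic data and two stand-in base learners; the variable names stack, base_models, meta_features, and black_box are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression

# Assumption: synthetic data and a reduced set of base learners.
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
stack = StackingRegressor(
    estimators=[
        ("RF", RandomForestRegressor(n_estimators=30, random_state=0)),
        ("GBM", GradientBoostingRegressor(n_estimators=30, random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
).fit(X, y)

# Approach 1: explain each layer separately.
base_models = stack.named_estimators_  # name -> fitted base learner
meta_model = stack.final_estimator_    # fitted meta-learner
meta_features = stack.transform(X)     # base learner predictions, one column each

# Approach 2: explain only the mapping from input features to final output.
black_box = stack.predict

# The two layers compose into the full model's prediction.
print(np.allclose(meta_model.predict(meta_features), black_box(X)))
```

In the first approach, each base learner is explained on the original features and the meta-learner is explained on meta_features. In the second, only black_box is handed to a model-agnostic explainer.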

Code implementation

Model construction

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import shap  # needed by the SHAP plots later in the article
import warnings
warnings.filterwarnings("ignore")

plt.rcParams['font.family'] = 'Times New Roman'
plt.rcParams['axes.unicode_minus'] = False
df = pd.read_excel('2024-12-7公众号Python机器学习AI.xlsx')

from sklearn.model_selection import train_test_split, KFold

# Features are every column except the target 'Y'
X = df.drop(['Y'], axis=1)
y = df['Y']

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, AdaBoostRegressor, StackingRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
from sklearn.linear_model import LinearRegression

# Define the level-1 (base) learners
base_learners = [
    ("RF", RandomForestRegressor(n_estimators=100, random_state=42)),
    ("XGB", XGBRegressor(n_estimators=100, random_state=42, verbosity=0)),
    ("LGBM", LGBMRegressor(n_estimators=100, random_state=42, verbose=-1)),
    ("GBM", GradientBoostingRegressor(n_estimators=100, random_state=42)),
    ("AdaBoost", AdaBoostRegressor(n_estimators=100, random_state=42)),
    ("CatBoost", CatBoostRegressor(n_estimators=100, random_state=42, verbose=0))
]

# Define the level-2 (meta) learner
meta_model = LinearRegression()

# Create the Stacking regressor
stacking_regressor = StackingRegressor(estimators=base_learners, final_estimator=meta_model, cv=5)

# Train the model
stacking_regressor.fit(X_train, y_train)

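This code builds and trains a Stacking ensemble for a regression task, with several base learners (random forest, XGBoost, and others) and a linear regression as the meta-learner.

Computing SHAP values for the base learners

Explaining a single model: RF

shap_dfs['RF']

SHAP values are computed for each base learner in the Stacking model and saved as one DataFrame per learner for later analysis. Here shap_dfs is indexed by base learner name, and only the random forest's SHAP values are displayed.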

RF model beeswarm plot

plt.figure()
shap.summary_plot(np.array(shap_dfs['RF']), X_test, feature_names=X_test.columns, plot_type="dot", show=False)
plt.savefig("RF summary_plot.pdf", format='pdf', bbox_inches='tight')

RF model SHAP feature contribution plot

plt.figure(figsize=(10, 5), dpi=1200)
shap.summary_plot(np.array(shap_dfs['RF']), X_test, plot_type="bar", show=False)
plt.title('SHAP_numpy Sorted Feature Importance')
plt.tight_layout()
plt.savefig("RF Sorted Feature Importance.pdf", format='pdf', bbox_inches='tight')
plt.show()

This draws the SHAP beeswarm plot and the sorted feature contribution plot for the random forest base learner. The other base learners can be analyzed in the same way.

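Beeswarm plots for all base learners

Drawing a SHAP beeswarm plot for every base learner in the Stacking model shows that the SHAP explanations differ from one base learner to another, because each base learner is trained independently and weights the features differently.

SHAP feature contribution plots for all base learners

Drawing a sorted SHAP feature contribution plot (bar chart) for every base learner shows the mean absolute SHAP value of each feature within each learner. Other SHAP visualizations can be drawn in the same way.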

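Computing SHAP values for the meta-learner

shap_df

The SHAP values of the meta-learner are computed with the base learners' predictions as its input features. The explanation therefore covers only these features and shows how much each base learner contributes to the final prediction.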
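Because the meta-learner here is a plain linear regression, its SHAP values have a closed form when the inputs are treated as independent: the contribution of base learner j to one prediction is its coefficient times the deviation of its prediction from the background mean. The sketch below computes shap_df this way with NumPy only. It is an illustration under assumptions (synthetic data, two stand-in base learners, the training-set meta-features as background) and not necessarily how the author computed shap_df. Base learner predictions are usually strongly correlated, so the independence assumption is a simplification.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumption: synthetic data and a reduced set of base learners.
X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
stack = StackingRegressor(
    estimators=[
        ("RF", RandomForestRegressor(n_estimators=50, random_state=42)),
        ("GBM", GradientBoostingRegressor(n_estimators=50, random_state=42)),
    ],
    final_estimator=LinearRegression(),
    cv=5,
).fit(X_train, y_train)

meta_model = stack.final_estimator_
names = list(stack.named_estimators_.keys())

# Meta-features: each column is one base learner's prediction.
Z_train = stack.transform(X_train)  # background data (assumption)
Z_test = stack.transform(X_test)    # samples to explain

# Linear SHAP with independent features: phi_j = w_j * (z_j - E[z_j]).
baseline = Z_train.mean(axis=0)
shap_df = pd.DataFrame((Z_test - baseline) * meta_model.coef_, columns=names)
expected_value = meta_model.predict(baseline.reshape(1, -1))[0]

# Additivity: base value plus contributions reproduces the stacked prediction.
reconstructed = expected_value + shap_df.sum(axis=1).to_numpy()
print(np.allclose(reconstructed, stack.predict(X_test)))
```

When plotting, the matching feature matrix for these SHAP values is Z_test, the base learners' predictions, not the original input features.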

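Meta-learner beeswarm plot

# The feature values of the meta-learner are the base learners' predictions.
# Assumption: shap_df was computed on X_test, with columns in base learner order.
meta_features = pd.DataFrame(stacking_regressor.transform(X_test), columns=shap_df.columns)
plt.figure()
shap.summary_plot(np.array(shap_df), meta_features, feature_names=shap_df.columns, plot_type="dot", show=False)
plt.title("SHAP Contribution Analysis for the Meta-Learner in the Second Layer of Stacking Regressor", fontsize=16, y=1.02)
plt.savefig("SHAP Contribution Analysis for the Meta-Learner in the Second Layer of Stacking Regressor.pdf", format='pdf', bbox_inches='tight')
plt.show()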

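Meta-learner SHAP feature contribution plot

plt.figure(figsize=(10, 5), dpi=1200)
# For the bar plot the second argument only supplies the feature names
shap.summary_plot(np.array(shap_df), shap_df, plot_type="bar", show=False)
plt.tight_layout()
plt.title("Bar Plot of SHAP Feature Contributions for the Meta-Learner in Stacking Regressor", fontsize=16, y=1.02)
plt.savefig("Bar Plot of SHAP Feature Contributions for the Meta-Learner in Stacking Regressor.pdf", format='pdf', bbox_inches='tight')
plt.show()

These calls draw the SHAP beeswarm plot and the sorted feature contribution plot for the meta-learner (the second-layer LinearRegression). They show, respectively, the distribution of each base learner's influence on the meta-learner's output and its mean importance. Together they show how much each base learner contributes in the second layer of the Stacking model, which helps in judging the relative importance of each base learner in the overall model.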

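Combining the meta-learner beeswarm plot with the feature contribution plot

The SHAP beeswarm plot and the feature contribution plot can be combined into a single figure, which makes a complex machine learning model more transparent and easier to explain.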

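Explaining the Stacking model as a whole "black box"

Computing SHAP values for the whole Stacking model

stacking_shap_df

Here the Stacking model is explained as a whole: SHAP values measure the contribution of each input feature to the model's output, considering only the relationship between the input features and the final prediction. Only the first 100 samples of the test set are used, because the model is complex and computing SHAP values for every sample takes considerable time.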

Beeswarm plot for the whole Stacking model

plt.figure()
# Feature values must be the inputs that were explained: the first 100 test samples
shap.summary_plot(np.array(stacking_shap_df), X_test.iloc[:100], feature_names=stacking_shap_df.columns, plot_type="dot", show=False)
plt.title("Based on the overall feature contribution analysis of SHAP to the stacking model", fontsize=16, y=1.02)
plt.savefig("Based on the overall feature contribution analysis of SHAP to the stacking model.pdf", format='pdf', bbox_inches='tight')
plt.show()

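Feature contribution plot for the whole Stacking model

plt.figure(figsize=(10, 5), dpi=1200)
# Use the explained input samples here, not shap_df from the meta-learner section
shap.summary_plot(np.array(stacking_shap_df), X_test.iloc[:100], plot_type="bar", show=False)
plt.tight_layout()
plt.title("SHAP-based Stacking Model Feature Contribution Histogram Analysis", fontsize=16, y=1.02)
plt.savefig("SHAP-based Stacking Model Feature Contribution Histogram Analysis.pdf", format='pdf', bbox_inches='tight')
plt.show()

Treating the Stacking model as a whole "black box", SHAP shows how the input features contribute to the final prediction, here as a beeswarm plot and a feature contribution plot. As with the meta-learner, the two plots can be combined into a single figure, which makes a complex machine learning model more transparent and easier to explain.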

This article is reposted from the WeChat official account @Python机器学习AI.