import dalex as dx
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.svm import SVR
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')
dx.__version__
Transform the skewed target variable (y) for better model fit.
data = dx.datasets.load_fifa()
X = data.drop(["nationality", "overall", "potential", "value_eur", "wage_eur"], axis = 1)
y = data['value_eur']
ylog = np.log(y)
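To see why the log transform helps, here is a minimal sketch on synthetic data. The lognormal sample below is only an assumed stand-in for a right-skewed target such as value_eur, and the names y_demo, ylog_demo and skewness are hypothetical helpers, not part of dalex.

```python
import numpy as np

# Synthetic, right-skewed stand-in for the target (an assumption for illustration).
rng = np.random.default_rng(0)
y_demo = rng.lognormal(mean=13.0, sigma=1.5, size=10_000)

def skewness(a):
    """Sample skewness: the third standardized moment."""
    a = np.asarray(a, dtype=float)
    centered = a - a.mean()
    return (centered ** 3).mean() / centered.std() ** 3

ylog_demo = np.log(y_demo)

skew_raw = skewness(y_demo)      # large and positive for a long right tail
skew_log = skewness(ylog_demo)   # close to zero after the transform

# np.exp inverts np.log, so predictions can be mapped back to the original scale.
y_back = np.exp(ylog_demo)
```

The transform only works for strictly positive targets, since the logarithm is undefined at zero and below.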
Use Pipeline to scale the data.
model_svm = Pipeline(steps=[('scale', StandardScaler()),
('model', SVR(C=10, epsilon=0.2, tol=1e-4))])
model_svm.fit(X, ylog)
model_gbm = LGBMRegressor(n_estimators=200, max_depth=10, learning_rate=0.15, random_state=0)
model_gbm.fit(X, ylog)
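The scaling step in the SVR pipeline can be sketched by hand. This is an illustration of standardization with NumPy, not the scikit-learn implementation itself; X_demo and its values are made up.

```python
import numpy as np

# Hypothetical feature matrix: two columns on different scales.
X_demo = np.array([[18.0, 60.0],
                   [25.0, 75.0],
                   [32.0, 90.0]])

col_mean = X_demo.mean(axis=0)
col_std = X_demo.std(axis=0)   # population standard deviation (ddof=0)

# Standardize each column to zero mean and unit variance.
X_scaled = (X_demo - col_mean) / col_std
```

Putting the scaler inside the Pipeline means the same mean and standard deviation learned in fit are reused whenever the model predicts, including when the Explainer calls it on new or perturbed data.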
Because we transformed the target, we want to change the default predict_function to return y on its original scale rather than on the log scale.
def predict_function(model, data):
    return np.exp(model.predict(data))
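A quick way to check such a function is to run it against a stub model. The LogScaleModel class below is hypothetical and exists only for this sketch; it mimics a regressor that was fitted on log(y).

```python
import numpy as np

class LogScaleModel:
    """Hypothetical stand-in for a regressor fitted on log(y)."""

    def predict(self, data):
        # Pretend the model has learned log(y) = 10 + 0.5 * x.
        return 10.0 + 0.5 * np.asarray(data, dtype=float).ravel()

def predict_function(model, data):
    # Undo the log transform so predictions are on the original scale.
    return np.exp(model.predict(data))

demo_model = LogScaleModel()
demo_X = np.array([[0.0], [2.0], [4.0]])

log_pred = demo_model.predict(demo_X)             # log-scale predictions
raw_pred = predict_function(demo_model, demo_X)   # original-scale predictions
```

Because the Explainer receives the untransformed y, its predictions and targets need to be on the same scale for residuals and performance measures to be meaningful.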
Creating an Explainer prints useful information, which is especially helpful for resolving potential errors.
exp_svm = dx.Explainer(model_svm, data=X, y=y, predict_function=predict_function, label='svm')
exp_gbm = dx.Explainer(model_gbm, data=X, y=y, predict_function=predict_function, label='gbm')
model_performance allows for easy model comparison.
pd.concat((exp_svm.model_performance().result, exp_gbm.model_performance().result))
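For orientation, common regression measures can be computed by hand as in the sketch below. The exact set of columns dalex reports is not shown here, so the measure names and the regression_measures helper are assumptions for illustration, not the library's API.

```python
import numpy as np

def regression_measures(y_true, y_pred):
    """Hypothetical helper computing common regression measures."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_total = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "mse": mse,
        "rmse": np.sqrt(mse),
        "r2": 1.0 - np.sum(residuals ** 2) / ss_total,
        "mae": np.mean(np.abs(residuals)),
        "mad": np.median(np.abs(residuals)),  # median absolute residual
    }

# Tiny made-up example: three exact predictions and one that is off by 2.
measures = regression_measures([1, 2, 3, 4], [1, 2, 3, 6])
```

Comparing several models on the same measures is only fair when every model is evaluated on the same data and the same target scale, as is the case for the two explainers above.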
The functionalities above are accessible from the Explainer object through its methods.
Model-level and predict-level methods return a new unique object that contains the result attribute (pandas.DataFrame) and the plot method.
predict_parts and model_parts have a new type='shap_wrapper' option, which uses the shap package to produce SHAP value explanations.
pp = exp_gbm.predict_parts(X.iloc[[1]], type='shap_wrapper', shap_explainer_type="TreeExplainer")
type(pp)
pp.plot()
pp.result # shap_values
mp = exp_gbm.model_parts(type='shap_wrapper', shap_explainer_type="TreeExplainer")
type(mp)
mp.plot()
mp.plot(plot_type='bar')
mp.result # shap_values
The new model_diagnostics method allows for residual diagnostics.
md_svm = exp_svm.model_diagnostics()
md_gbm = exp_gbm.model_diagnostics()
md_svm.plot(md_gbm, variable='age', yvariable='residuals', marker_size=5)
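The idea behind plotting residuals against a variable can be sketched with plain NumPy. The data below is synthetic, and the convention residual = observed minus predicted is an assumption of this sketch rather than something confirmed by the notebook.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a feature, a noisy target, and unbiased predictions.
age = rng.integers(17, 40, size=500)
y_pred = 1000.0 * age
y_true = y_pred + rng.normal(0.0, 500.0, size=500)

# Assumed convention: observed minus predicted.
residuals = y_true - y_pred

# Mean residual per age; values far from zero would flag ages
# where the model is systematically too high or too low.
ages = np.unique(age)
mean_residual_by_age = np.array([residuals[age == a].mean() for a in ages])
```

A well-behaved model shows residuals scattered around zero across the whole range of the variable; a trend or a funnel shape points to bias or non-constant variance.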