import dalex as dx
import pandas as pd
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
import warnings
warnings.filterwarnings('ignore')
dx.__version__
First, divide the data into variables `X` and a target variable `y`.
data = dx.datasets.load_titanic()
X = data.drop(columns='survived')
y = data.survived
data.head(10)
Create a model pipeline:

- `numerical_transformer` pipeline: `numerical_features` chooses the numerical features to transform
- `categorical_transformer` pipeline: `categorical_features` chooses the categorical features to transform; missing values are imputed with the `'missing'` string
- aggregate those two pipelines into a `preprocessor` using `ColumnTransformer`
- `classifier` model using `MLPClassifier` - it has 3 hidden layers with sizes 150, 100, 50 respectively
- `clf` pipeline model, which combines the `preprocessor` with the basic `classifier` model

numerical_features = ['age', 'fare', 'sibsp', 'parch']
numerical_transformer = Pipeline(
steps=[
('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())
]
)
categorical_features = ['gender', 'class', 'embarked']
categorical_transformer = Pipeline(
steps=[
('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
('onehot', OneHotEncoder(handle_unknown='ignore'))
]
)
preprocessor = ColumnTransformer(
transformers=[
('num', numerical_transformer, numerical_features),
('cat', categorical_transformer, categorical_features)
]
)
classifier = MLPClassifier(hidden_layer_sizes=(150,100,50), max_iter=500, random_state=0)
clf = Pipeline(steps=[('preprocessor', preprocessor),
('classifier', classifier)])
clf.fit(X, y)
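To see concretely what the `preprocessor` does before the classifier ever runs, here is a minimal sketch on toy data (the tiny DataFrame below is illustrative, not the Titanic set): the numerical column is median-imputed and scaled, and the categorical column is imputed with `'missing'` and one-hot encoded.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with one missing value in each column.
toy = pd.DataFrame({
    'age': [20.0, np.nan, 40.0],
    'gender': ['male', np.nan, 'female'],
})

toy_preprocessor = ColumnTransformer(transformers=[
    ('num', Pipeline([('imputer', SimpleImputer(strategy='median')),
                      ('scaler', StandardScaler())]), ['age']),
    ('cat', Pipeline([('imputer', SimpleImputer(strategy='constant',
                                                fill_value='missing')),
                      ('onehot', OneHotEncoder(handle_unknown='ignore'))]),
     ['gender']),
])

out = toy_preprocessor.fit_transform(toy)
# 1 scaled numeric column + 3 one-hot columns ('female', 'male', 'missing')
print(out.shape)  # (3, 4)
```

The missing age becomes the median (30) before scaling, and the missing gender becomes its own `'missing'` one-hot category rather than being dropped.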
exp = dx.Explainer(clf, X, y)
The above functionalities are accessible from the `Explainer` object through its methods. Model-level and predict-level methods return a new unique object that contains the `result` attribute (a `pandas.DataFrame`) and the `plot` method.
This function is nothing more than a normal model prediction; however, it uses the `Explainer` interface.
Let's create two example persons for this tutorial.
john = pd.DataFrame({'gender': ['male'],
'age': [25],
'class': ['1st'],
'embarked': ['Southampton'],
'fare': [72],
'sibsp': [0],
'parch': [0]},
index = ['John'])
mary = pd.DataFrame({'gender': ['female'],
'age': [35],
'class': ['3rd'],
'embarked': ['Cherbourg'],
'fare': [25],
'sibsp': [0],
'parch': [0]},
index = ['Mary'])
You can make a prediction on many samples at the same time
exp.predict(X)[0:10]
It also works on a single instance; however, the only accepted format is `pandas.DataFrame`.
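If an observation starts out as a plain dict rather than a DataFrame literal like the ones above, it needs wrapping into a one-row `pandas.DataFrame` first; a minimal sketch (the dict values mirror John from this tutorial):

```python
import pandas as pd

observation = {'gender': 'male', 'age': 25, 'class': '1st',
               'embarked': 'Southampton', 'fare': 72,
               'sibsp': 0, 'parch': 0}

# Wrap the dict in a list so pandas builds a single row,
# and set the index to a readable label for plots.
row = pd.DataFrame([observation], index=['John'])
print(row.shape)  # (1, 7)
```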
Prediction of survival for John.
exp.predict(john)
Prediction of survival for Mary.
exp.predict(mary)
This function calculates Variable Attributions as Break Down, iBreakDown or Shapley Values explanations. The available types are:

- `'break_down'`
- `'break_down_interactions'`
- `'shap'`

Model prediction is decomposed into parts that are attributed to particular variables.
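To build intuition for what such a decomposition computes, here is a hand-rolled Shapley sketch for a toy additive model (the function `f`, the background data, and the instance below are illustrative assumptions, not dalex code): each feature's attribution is its marginal contribution to the expected prediction, averaged over feature orderings.

```python
from itertools import permutations
import numpy as np

# Toy additive model and a small "background" dataset.
def f(x):
    return 2 * x[0] + 3 * x[1]

background = np.array([[0.0, 0.0], [2.0, 2.0]])  # feature means: [1, 1]
x = np.array([3.0, 1.0])                          # instance to explain

def expected_value(fixed):
    """E[f] with features in `fixed` set to x's values, rest from background."""
    z = background.copy()
    for j in fixed:
        z[:, j] = x[j]
    return np.mean([f(row) for row in z])

# Average marginal contributions over all feature orderings (Shapley).
phi = np.zeros(2)
orders = list(permutations(range(2)))
for order in orders:
    fixed = []
    prev = expected_value(fixed)
    for j in order:
        fixed.append(j)
        cur = expected_value(fixed)
        phi[j] += (cur - prev) / len(orders)
        prev = cur

print(phi)  # for an additive model: [2*(3-1), 3*(1-1)] = [4., 0.]
```

The attributions sum to the difference between the instance's prediction and the mean background prediction, which is exactly the invariant the Break Down and SHAP plots visualize.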
bd_john = exp.predict_parts(john, type='break_down', label=john.index[0])
bd_interactions_john = exp.predict_parts(john, type='break_down_interactions', label="John+")
sh_mary = exp.predict_parts(mary, type='shap', B = 10, label=mary.index[0])
bd_john.result
bd_john.plot(bd_interactions_john)