Explain multioutput predictive models with dalex

This notebook provides examples of working with multiclass classification and other multioutput algorithms, e.g. multioutput regression.

A natural example of such an algorithm is a multilayer perceptron neural network.

For a broad overview of the topic, see the scikit-learn documentation, section 1.12. Multiclass and multioutput algorithms.

Documentation for the dalex Python package: https://dalex.drwhy.ai/python

Imports

In [1]:
import dalex as dx

import numpy as np
import pandas as pd

from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from lightgbm import LGBMClassifier, LGBMRegressor

import warnings
warnings.filterwarnings('ignore')
In [2]:
dx.__version__
Out[2]:
'1.4.1.9000'

Part 1: treating a multioutput model as multiple single-output models

One approach is to explain each of the model's outputs separately, e.g. the predicted probability for a given class in a multiclass classification problem.

Part 1A: Multiclass classification

We will use the iris dataset and the LGBMClassifier model for this example.

In [3]:
# data
X, y = datasets.load_iris(return_X_y=True, as_frame=True)

# model 
model = LGBMClassifier(n_estimators=25)
model.fit(X, y)

# model has 3 outputs
model.predict_proba(X).shape
Out[3]:
(150, 3)

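Let's explain the classification for the first class (class 0). For that, we need a custom predict_function that returns only the predicted probability of class 0, along with a binary target that marks which observations belong to that class.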

In [4]:
# custom (binary) predict function
pf_0 = lambda m, d: m.predict_proba(d)[:, 0]

# custom (binary) target values
y_0 = y == 0

# explainer
exp_0 = dx.Explainer(model, X, y_0, predict_function=pf_0, label="LGBMClassifier: class 0")
Preparation of a new explainer is initiated

  -> data              : 150 rows 4 cols
  -> target variable   : Parameter 'y' was a pandas.Series. Converted to a numpy.ndarray.
  -> target variable   : 150 values
  -> model_class       : lightgbm.sklearn.LGBMClassifier (default)
  -> label             : LGBMClassifier: class 0
  -> predict function  : <function <lambda> at 0x000002496BA75430> will be used
  -> predict function  : Accepts pandas.DataFrame and numpy.ndarray.
  -> predicted values  : min = 0.0309, mean = 0.338, max = 0.939
  -> model type        : classification will be used (default)
  -> residual function : difference between y and yhat (default)
  -> residuals         : min = -0.369, mean = -0.00495, max = 0.199
  -> model_info        : package lightgbm

A new explainer has been created!
In [5]:
exp_0.model_performance()
Out[5]:
                         recall  precision   f1  accuracy  auc
LGBMClassifier: class 0     1.0        1.0  1.0       1.0  1.0
In [6]:
exp_0.model_parts().plot()
exp_0.model_profile().plot()
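
The same pattern extends to the remaining classes. Below is a minimal sketch of how the per-class predict functions and binary targets could be generated rather than written by hand. The helper names `make_class_predict_function` and `make_binary_target`, as well as `ToyModel` and the toy arrays, are hypothetical and exist only to keep the example self-contained; they are not part of dalex. A factory function is used instead of a lambda defined inside a loop because a lambda would capture the loop variable by reference, so every predict function would end up pointing at the last class.

```python
import numpy as np


def make_class_predict_function(class_index):
    """Return a predict function that yields the probability of one class."""
    def predict_function(model, data):
        # predict_proba returns an (n_samples, n_classes) array;
        # keep only the column of the requested class
        return np.asarray(model.predict_proba(data))[:, class_index]
    return predict_function


def make_binary_target(y, class_label):
    """Return a 0/1 target that marks observations of the given class."""
    return (np.asarray(y) == class_label).astype(int)


class ToyModel:
    """Hypothetical stand-in for a fitted three-class classifier."""
    def predict_proba(self, data):
        # every row gets the same made-up class probabilities
        return np.tile(np.array([0.2, 0.5, 0.3]), (len(data), 1))


toy_model = ToyModel()
toy_data = np.zeros((4, 2))
toy_y = np.array([0, 1, 2, 1])

# one predict function and one binary target per class
predict_functions = {k: make_class_predict_function(k) for k in range(3)}
targets = {k: make_binary_target(toy_y, k) for k in range(3)}
```

With the objects from this notebook, each pair could then be passed to `dx.Explainer(model, X, targets[k], predict_function=predict_functions[k], label=f"LGBMClassifier: class {k}")`, mirroring the call used for class 0 above. Note that this sketch assumes the column index of `predict_proba` equals the class label, which holds for iris because the labels are 0, 1, 2; for other label sets, the column order should be checked against `model.classes_`.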