tensorflow + dalex = :)

In [1]:
import warnings
warnings.filterwarnings('ignore')

read data

In [2]:
import pandas as pd
pd.__version__
Out[2]:
'1.2.4'
In [3]:
data = pd.read_csv("https://raw.githubusercontent.com/pbiecek/xai-happiness/main/happiness.csv", index_col=0)
data.head()
Out[3]:
             score  gdp_per_capita  social_support  healthy_life_expectancy  freedom_to_make_life_choices  generosity  perceptions_of_corruption
Afghanistan  3.203           0.350           0.517                    0.361                         0.000       0.158                      0.025
Albania      4.719           0.947           0.848                    0.874                         0.383       0.178                      0.027
Algeria      5.211           1.002           1.160                    0.785                         0.086       0.073                      0.114
Argentina    6.086           1.092           1.432                    0.881                         0.471       0.066                      0.050
Armenia      4.559           0.850           1.055                    0.815                         0.283       0.095                      0.064
In [4]:
X, y = data.drop('score', axis=1), data.score
n, p = X.shape

create a model

In [5]:
import tensorflow as tf
tf.__version__
Out[5]:
'2.5.0'
In [6]:
tf.random.set_seed(11)

normalizer = tf.keras.layers.experimental.preprocessing.Normalization(input_shape=[p,])
normalizer.adapt(X.to_numpy())

# the normalizer already declares input_shape, so no separate tf.keras.Input is needed
# (placing keras.Input inside a Sequential model triggers a TensorFlow warning)
model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(p*2, activation='relu'),
    tf.keras.layers.Dense(p*3, activation='relu'),
    tf.keras.layers.Dense(p*2, activation='relu'),
    tf.keras.layers.Dense(p, activation='relu'),
    tf.keras.layers.Dense(1, activation='linear')  # single linear output for regression
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.mae
)
In [7]:
model.fit(X, y, batch_size=int(n/10), epochs=2000, verbose=False)
Out[7]:
<tensorflow.python.keras.callbacks.History at 0x1992703f460>
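
what does the normalizer do? As far as the layer's documented behaviour goes, adapt() stores a per-feature mean and variance, and the layer then standardizes its inputs with them. A minimal NumPy sketch of that idea, using two columns from the five rows shown in data.head(); the names X_head, feature_mean, feature_var and X_standardized are illustrative only, and the small epsilon guard that Keras applies to the denominator is omitted:

```python
import numpy as np

# gdp_per_capita and social_support for the five countries printed by data.head()
X_head = np.array([
    [0.350, 0.517],
    [0.947, 0.848],
    [1.002, 1.160],
    [1.092, 1.432],
    [0.850, 1.055],
])

# "adapt" step: learn per-feature statistics from the data
feature_mean = X_head.mean(axis=0)
feature_var = X_head.var(axis=0)  # population variance (ddof=0)

# "call" step: standardize each column to zero mean and unit variance
X_standardized = (X_head - feature_mean) / np.sqrt(feature_var)
```

Because the statistics live inside the model, the explainer can later be given the raw, unscaled X.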

explain the model

initializing the Explainer prints a summary of the data, the predict function, and the predicted values and residuals

In [8]:
import dalex as dx
dx.__version__
Out[8]:
'1.2.0'
In [9]:
explainer = dx.Explainer(model, X, y, label='happiness')
Preparation of a new explainer is initiated

  -> data              : 156 rows 6 cols
  -> target variable   : Parameter 'y' was a pandas.Series. Converted to a numpy.ndarray.
  -> target variable   : 156 values
  -> model_class       : tensorflow.python.keras.engine.sequential.Sequential (default)
  -> label             : happiness
  -> predict function  : <function yhat_tf_regression at 0x0000019930069A60> will be used (default)
  -> predict function  : Accepts pandas.DataFrame and numpy.ndarray.
  -> predicted values  : min = 2.86, mean = 5.41, max = 7.96
  -> model type        : regression will be used (default)
  -> residual function : difference between y and yhat (default)
  -> residuals         : min = -0.609, mean = -0.00403, max = 1.09
  -> model_info        : package tensorflow

A new explainer has been created!

model level explanations

first, assess how well the model fits the training data

In [10]:
explainer.model_performance()
Out[10]:
                mse      rmse        r2       mae       mad
happiness  0.025378  0.159304  0.979386  0.077048  0.032524
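
To make the columns above concrete, here is a rough NumPy sketch of how such regression measures are commonly computed from the residuals (y minus yhat, as reported by the explainer). It assumes that mad denotes the median absolute residual and that r2 is one minus the mse divided by the variance of y; regression_metrics and the toy numbers are hypothetical, not dalex code:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Common regression measures computed from residuals (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred          # difference between y and yhat
    mse = np.mean(residuals ** 2)
    return {
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),
        "r2": float(1.0 - mse / np.var(y_true)),
        "mae": float(np.mean(np.abs(residuals))),
        "mad": float(np.median(np.abs(residuals))),  # assumed: median absolute residual
    }

# toy values, not taken from the happiness data
toy_metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
```

Note that these measures are computed on the same data the model was trained on, so they describe fit rather than generalization.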

which features are the most important?

In [11]:
explainer.model_parts().plot()
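
The plot is based on permutation importance: a feature matters if shuffling its column makes the loss worse. The sketch below illustrates that idea in plain NumPy under stated assumptions; it uses rmse as the loss and a made-up linear toy_predict in place of the neural network, and permutation_importance is a hypothetical helper, not the dalex implementation:

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in rmse after shuffling each column (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    baseline = rmse(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        losses = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # break the link between feature j and the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            losses.append(rmse(y, predict(X_perm)))
        importances[j] = np.mean(losses) - baseline
    return baseline, importances

# toy setup: the "model" uses column 0 strongly, column 1 weakly, column 2 not at all
def toy_predict(X):
    return 3.0 * X[:, 0] + 1.0 * X[:, 1]

X_toy = np.random.default_rng(42).normal(size=(200, 3))
y_toy = toy_predict(X_toy)

baseline, importances = permutation_importance(toy_predict, X_toy, y_toy)
```

In this toy setup the unused third column gets an importance of zero, while the column with the larger coefficient gets the largest increase in loss.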