tensorflow + dalex = :)

In [1]:
import warnings
warnings.filterwarnings('ignore')

read data

In [2]:
import pandas as pd
pd.__version__
Out[2]:
'1.2.0'
In [3]:
data = pd.read_csv("https://raw.githubusercontent.com/pbiecek/xai-happiness/main/happiness.csv", index_col=0)
data.head()
Out[3]:
             score  gdp_per_capita  social_support  healthy_life_expectancy  freedom_to_make_life_choices  generosity  perceptions_of_corruption
Afghanistan  3.203           0.350           0.517                     0.361                         0.000       0.158                      0.025
Albania      4.719           0.947           0.848                     0.874                         0.383       0.178                      0.027
Algeria      5.211           1.002           1.160                     0.785                         0.086       0.073                      0.114
Argentina    6.086           1.092           1.432                     0.881                         0.471       0.066                      0.050
Armenia      4.559           0.850           1.055                     0.815                         0.283       0.095                      0.064
In [4]:
X, y = data.drop('score', axis=1), data.score
n, p = X.shape

create a model

In [5]:
import tensorflow as tf
tf.__version__
Out[5]:
'2.3.0'
In [6]:
tf.random.set_seed(11)

# per-feature standardization fitted to the training data;
# input_shape is declared here, so no separate Input layer is needed
normalizer = tf.keras.layers.experimental.preprocessing.Normalization(input_shape=[p,])
normalizer.adapt(X.to_numpy())

# small fully-connected regression network on top of the normalizer
model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(p*2, activation='relu'),
    tf.keras.layers.Dense(p*3, activation='relu'),
    tf.keras.layers.Dense(p*2, activation='relu'),
    tf.keras.layers.Dense(p, activation='relu'),
    tf.keras.layers.Dense(1, activation='linear')
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.mae
)
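Because the Normalization layer declares the input shape, the model is built as soon as it is constructed, so the layer stack can be inspected before training; an optional check:

# optional: inspect the architecture and parameter counts
model.summary()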
In [7]:
model.fit(X, y, batch_size=int(n/10), epochs=2000, verbose=False)
Out[7]:
<tensorflow.python.keras.callbacks.History at 0x234feb059a0>
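As a quick sanity check before explaining the model, the training loss (MAE) can be recomputed on the full dataset with the standard Keras API; a minimal sketch:

# optional: recompute the mean absolute error on the training data
train_mae = model.evaluate(X, y, verbose=0)
print(f"training MAE: {train_mae:.3f}")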

explain the model

Explainer initialization communicates useful information

In [8]:
import dalex as dx
dx.__version__
Out[8]:
'1.0.0'
In [9]:
explainer = dx.Explainer(model, X, y, label='happiness')
Preparation of a new explainer is initiated

  -> data              : 156 rows 6 cols
  -> target variable   : Parameter 'y' was a pandas.Series. Converted to a numpy.ndarray.
  -> target variable   : 156 values
  -> model_class       : tensorflow.python.keras.engine.sequential.Sequential (default)
  -> label             : happiness
  -> predict function  : <function yhat_tf_regression at 0x000002348618FF70> will be used (default)
  -> predict function  : Accepts pandas.DataFrame and numpy.ndarray.
  -> predicted values  : min = 2.89, mean = 5.38, max = 7.79
  -> model type        : regression will be used (default)
  -> residual function : difference between y and yhat (default)
  -> residuals         : min = -0.654, mean = 0.024, max = 1.08
  -> model_info        : package tensorflow

A new explainer has been created!
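The explainer wraps the detected predict function, so predictions can be obtained through a single interface regardless of the underlying framework; a minimal sketch using the predict method:

# unified predictions, independent of the tensorflow API
explainer.predict(X.head())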

model-level explanations

first, assess the model's fit to the training data

In [10]:
explainer.model_performance()
Out[10]:
                mse      rmse       r2       mae       mad
happiness  0.025915  0.160981  0.97895  0.086351  0.043158
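Like other dalex explanations, the object returned by model_performance() keeps its metrics in a pandas DataFrame and can also be plotted; a short sketch, assuming the standard result attribute:

# keep the explanation object to reuse its underlying data
mp = explainer.model_performance()
mp.result   # metrics as a pandas.DataFrame
mp.plot()   # residual-based diagnostic plot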

which features are the most important?

In [11]:
explainer.model_parts().plot()
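model_parts() computes permutation-based variable importance. Beyond the interactive plot, the raw drop-in-loss values are available as a data frame, and the number of permutation rounds can be fixed for reproducibility; a sketch, assuming the result attribute and the B / random_state parameters:

# permutation importance with the raw numbers kept for inspection
vi = explainer.model_parts(B=10, random_state=11)
vi.result   # mean drop-in-loss per variable as a pandas.DataFrame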