tensorflow + dalex = :)

In [1]:
import warnings
warnings.filterwarnings('ignore')

read data

In [2]:
import pandas as pd
pd.__version__
Out[2]:
'1.1.2'
In [3]:
data = pd.read_csv("https://raw.githubusercontent.com/pbiecek/xai-happiness/main/happiness.csv", index_col=0)
data.head()
Out[3]:
             score  gdp_per_capita  social_support  healthy_life_expectancy  freedom_to_make_life_choices  generosity  perceptions_of_corruption
Afghanistan  3.203           0.350           0.517                     0.361                         0.000       0.158                      0.025
Albania      4.719           0.947           0.848                     0.874                         0.383       0.178                      0.027
Algeria      5.211           1.002           1.160                     0.785                         0.086       0.073                      0.114
Argentina    6.086           1.092           1.432                     0.881                         0.471       0.066                      0.050
Armenia      4.559           0.850           1.055                     0.815                         0.283       0.095                      0.064
In [4]:
X, y = data.drop('score', axis=1), data.score
n, p = X.shape
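
A quick check of the resulting feature matrix and target (a minimal sketch using standard pandas attributes; the shapes agree with the explainer log printed further below):

print(X.shape, y.shape)      # (156, 6) features, 156 target values
print(X.columns.tolist())    # the six happiness-report factors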

create a model

In [5]:
import tensorflow as tf
tf.__version__
Out[5]:
'2.3.0'
In [6]:
tf.random.set_seed(11)

# standardize the inputs; the Normalization layer also declares the input shape
normalizer = tf.keras.layers.experimental.preprocessing.Normalization(input_shape=[p,])
normalizer.adapt(X.to_numpy())

# small fully-connected regression network on top of the normalized features
model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(p*2, activation='relu'),
    tf.keras.layers.Dense(p*3, activation='relu'),
    tf.keras.layers.Dense(p*2, activation='relu'),
    tf.keras.layers.Dense(p, activation='relu'),
    tf.keras.layers.Dense(1, activation='relu')
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.mae
)
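
Before training, the architecture can be inspected with the standard Keras summary (not part of the original cell):

model.summary()   # layer list with output shapes and parameter counts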
In [7]:
model.fit(X, y, batch_size=int(n/10), epochs=2000, verbose=False)
Out[7]:
<tensorflow.python.keras.callbacks.History at 0x228d12d9610>
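
Training runs silently (verbose=False), but model.fit returns the History object shown above; its history attribute records the per-epoch loss, so convergence can still be checked after the fact. A minimal sketch using standard Keras History fields and IPython's _ reference to the last output:

history = _                      # Out[7] is the History returned by model.fit
loss = history.history['loss']   # per-epoch training MAE
print(len(loss), min(loss), loss[-1])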

explain the model

Explainer initialization prints useful diagnostic information about the data, the model, and the chosen predict function

In [8]:
import dalex as dx
dx.__version__
Out[8]:
'0.2.2.9000'
In [9]:
explainer = dx.Explainer(model, X, y, label='happiness')
Preparation of a new explainer is initiated

  -> data              : 156 rows 6 cols
  -> target variable   : Argument 'y' was a pandas.Series. Converted to a numpy.ndarray.
  -> target variable   : 156 values
  -> model_class       : tensorflow.python.keras.engine.sequential.Sequential (default)
  -> label             : happiness
  -> predict function  : <function yhat_tf_regression at 0x00000228D28E1430> will be used (default)
  -> predict function  : accepts pandas.DataFrame and numpy.ndarray
  -> predicted values  : min = 2.86, mean = 5.42, max = 7.73
  -> model type        : regression will be used (default)
  -> residual function : difference between y and yhat (default)
  -> residuals         : min = -0.616, mean = -0.0103, max = 0.555
  -> model_info        : package tensorflow

A new explainer has been created!
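
dalex picked the default yhat_tf_regression predict function based on the detected model_class. An equivalent function can also be supplied explicitly via the predict_function argument, which takes a callable accepting the model and a data frame and returning a 1-D numpy array. A minimal sketch, not the library's internal implementation; predict_fn and explainer_custom are illustrative names:

import numpy as np

def predict_fn(model, data):
    # Keras returns an (n, 1) array; dalex expects a flat 1-D vector
    return model.predict(np.asarray(data)).reshape(-1, )

explainer_custom = dx.Explainer(model, X, y,
                                predict_function=predict_fn, label='happiness')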

model-level explanations

first, assess the model's performance

In [10]:
explainer.model_performance()
Out[10]:
                mse      rmse        r2       mae      mad
happiness  0.017569  0.132549  0.985729  0.072329  0.03636
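
Like other dalex explanation objects, the object returned by model_performance also exposes the underlying numbers as a pandas DataFrame through its result attribute, which is convenient for comparing several models programmatically. A short sketch:

mp = explainer.model_performance()
print(mp.result)   # the same mse / rmse / r2 / mae / mad values as a DataFrame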

which features are the most important?

In [11]:
explainer.model_parts().plot()
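
model_parts computes permutation-based variable importance: each feature is permuted and the resulting increase in the loss is recorded, so features whose permutation hurts the model most rank highest on the plot. The numbers behind the plot are again available through result. A sketch; the B argument (number of permutation rounds) and the dropout_loss column name follow the dalex documentation and should be treated as assumptions here:

vi = explainer.model_parts(B=25)   # B = number of permutation rounds (assumed parameter name)
print(vi.result)                   # per-variable dropout_loss used by the plot above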