Tutorial: fairness in regression¶
In this short tutorial, we show how to check whether a regression model discriminates against a particular subgroup using the dalex package.
This approach is experimental and we are grateful for all feedback. It was implemented following Steinberg, D., et al. (2020).
This notebook aims to show how to detect bias in regression models. It won't cover fairness concepts or the interpretation of the plots in detail. Before starting, it is best to get familiar with our fairness in classification materials (see the Introduction to the Fairness module in dalex in the Resources section below).
import pandas as pd
import numpy as np
import plotly
plotly.offline.init_notebook_mode()
Data¶
We use the Communities and Crime data from the paper and aim to predict the ViolentCrimesPerPop variable (total number of violent crimes per 100K population).
The protected attribute is the racepctblack value (the share of the population identifying as black), which is the same one picked by the paper's authors.
from urllib.request import urlopen

# load the data (missing values are encoded as "?")
data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.data", header=None, na_values=["?"])

# parse the column names from the accompanying .names file
names = urlopen("http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.names")
columns = [line.split(b' ')[1].decode("utf-8") for line in names if line.startswith(b'@attribute')]
data.columns = columns
data = data.dropna(axis=1)   # drop columns that contain missing values
data = data.iloc[:, 3:]      # drop the remaining identifier columns (state, communityname, fold)
data.head()
X = data.drop('ViolentCrimesPerPop', axis=1)
y = data.ViolentCrimesPerPop
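Optionally, before modeling, it can help to glance at the distributions of the target and of the protected attribute used later in this tutorial:
# quick, optional sanity check of the target and the protected attribute
data[["ViolentCrimesPerPop", "racepctblack"]].describe()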
Models¶
We train two regression models: a simple and interpretable Decision Tree, and a more complex, typically more accurate Gradient Boosting model.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = GradientBoostingRegressor()
model.fit(X_train, y_train)
model2 = DecisionTreeRegressor()
model2.fit(X_train, y_train)
Explainers¶
In the next step, we create the Explainer objects using the dalex package.
import dalex as dx
print(dx.__version__)
exp = dx.Explainer(model, X_test, y_test, verbose=False)
exp2 = dx.Explainer(model2, X_test, y_test, verbose=False)
pd.concat([exp2.model_performance().result, exp.model_performance().result])  # Decision Tree (top) vs Gradient Boosting (bottom)
Fairness¶
Having the Explainers, we are able to assess the models' fairness. To check whether the models are fair, we will test three independence criteria:
- independence: R⊥A
- separation: R⊥A ∣ Y
- sufficiency: Y⊥A ∣ R
Where:
- A - protected group
- Y - target
- R - model's prediction
In the approach described in Steinberg, D., et al. (2020), the authors propose a way of checking these independence criteria.
The method implemented in the dalex package is called Direct Density Ratio Estimation.
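For intuition: independence holds exactly when the density ratio p(R | A = protected) / p(R | A = privileged) equals 1 everywhere. Below is a minimal, self-contained sketch on synthetic data of the classic trick of approximating such a ratio with a probabilistic classifier. This is only an illustration of the idea, not the Direct Density Ratio Estimation procedure that dalex uses.
from sklearn.linear_model import LogisticRegression

# synthetic predictions R for two equally sized groups; the mean shift violates independence
rng = np.random.default_rng(0)
r = np.concatenate([rng.normal(0.3, 0.1, 1000),   # privileged group
                    rng.normal(0.5, 0.1, 1000)])  # protected group
a = np.repeat([0, 1], 1000)                       # group membership A

# with equal group priors, the classifier's odds P(A=1 | r) / P(A=0 | r)
# estimate the density ratio p(r | A=1) / p(r | A=0)
clf = LogisticRegression().fit(r.reshape(-1, 1), a)
p = clf.predict_proba(r.reshape(-1, 1))[:, 1]
ratio = p / (1 - p)
print(ratio.min(), ratio.max())  # values far from 1 signal a violation of independence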
# subgroup indicator: communities where at least half of the population identifies as black
protected = np.where(X_test.racepctblack >= 0.5, 'majority_black', 'else')
privileged = 'else'
fobject = exp.model_fairness(protected, privileged)
fobject2 = exp2.model_fairness(protected, privileged)
fobject.fairness_check()
fobject2.fairness_check()
The models are biased!¶
The Decision Tree model violated 3 criteria, while Gradient Boosting violated only 2. We can plot the fairness check in the same way as for classification.
fobject2.plot()
One can easily plot the models together.
fobject2.plot(fobject)
We can also plot the models' output using the density plot type.
fobject.plot(fobject2, type='density')
Moreover, when there is no discrimination, the method will acknowledge it. To show this, let's pick another protected group. This time we choose racePctAsian.
protected = np.where(X_test.racePctAsian >= 0.5, 'majority_asian', "else")
privileged = 'else'
fobject = exp.model_fairness(protected, privileged)
fobject2 = exp2.model_fairness(protected, privileged)
fobject2.plot(fobject)
We can see that there is no discrimination against the Asian community in these models (based on this data).
Summary¶
The new functionality allows the user to check the fairness of a regression model. However, it should be noted that this is an experimental approach and the output of these methods should be treated as a suggestion rather than a definitive result.
Plots¶
This package uses plotly to render the plots:
- Install extensions to use plotly in JupyterLab: Getting Started, Troubleshooting
- Use the show=False parameter in the plot method to return a plotly Figure object (see the example below)
- It is possible to edit the figures and save them
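For instance, assuming the fairness objects created above are still in scope, a figure can be retrieved and saved as a standalone HTML file (write_html is a standard method of a plotly Figure):
fig = fobject.plot(fobject2, show=False)  # returns the plotly Figure instead of displaying it
fig.write_html("fairness_check.html")     # save the interactive figure to disk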
Resources - https://dalex.drwhy.ai/python¶
- Introduction to the dalex package: Titanic: tutorial and examples
- Key features explained: FIFA20: explain default vs tuned model with dalex
- How to use dalex with: xgboost, tensorflow, h2o (feat. autokeras, catboost, lightgbm)
- More explanations: residuals, shap, lime
- Introduction to the Fairness module in dalex
- Introduction to the Aspect module in dalex
- Introduction to Arena: interactive dashboard for model exploration
- Code in the form of jupyter notebook
- Changelog: NEWS
- Theoretical introduction to the plots: Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models