Tutorial: fairness in regression

In this short tutorial, we show how to check whether a regression model discriminates against a particular subgroup using the dalex package.

This approach is experimental and we are grateful for all feedback. It was implemented following Steinberg, D., et al. (2020).

This notebook aims to show how to detect bias in regression models. It won't cover fairness concepts or the interpretation of the plots in detail; for that, it is best to first get familiar with our fairness in classification materials.

In [1]:
import pandas as pd 
import numpy as np
{"pd": pd.__version__, "np": np.__version__}
Out[1]:
{'pd': '1.2.4', 'np': '1.19.5'}

Data

We use the Communities and Crime data from the paper and aim to predict the ViolentCrimesPerPop variable (total number of violent crimes per 100K population).

The protected attribute is racepctblack (the share of the population identifying as black), the same attribute chosen by the paper's authors.

In [2]:
from urllib.request import urlopen

# read the raw data; missing values are encoded as "?"
data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.data", header=None, na_values=["?"])

# parse the column names from the accompanying .names file
names = urlopen("http://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.names")
columns = [line.split(b' ')[1].decode("utf-8") for line in names if line.startswith(b'@attribute')]
data.columns = columns

# drop columns with missing values and the leading non-predictive identifier columns
data = data.dropna(axis=1)
data = data.iloc[:, 3:]
data.head()
Out[2]:
population householdsize racepctblack racePctWhite racePctAsian racePctHisp agePct12t21 agePct12t29 agePct16t24 agePct65up ... PctForeignBorn PctBornSameState PctSameHouse85 PctSameCity85 PctSameState85 LandArea PopDens PctUsePubTrans LemasPctOfficDrugUn ViolentCrimesPerPop
0 0.19 0.33 0.02 0.90 0.12 0.17 0.34 0.47 0.29 0.32 ... 0.12 0.42 0.50 0.51 0.64 0.12 0.26 0.20 0.32 0.20
1 0.00 0.16 0.12 0.74 0.45 0.07 0.26 0.59 0.35 0.27 ... 0.21 0.50 0.34 0.60 0.52 0.02 0.12 0.45 0.00 0.67
2 0.00 0.42 0.49 0.56 0.17 0.04 0.39 0.47 0.28 0.32 ... 0.14 0.49 0.54 0.67 0.56 0.01 0.21 0.02 0.00 0.43
3 0.04 0.77 1.00 0.08 0.12 0.10 0.51 0.50 0.34 0.21 ... 0.19 0.30 0.73 0.64 0.65 0.02 0.39 0.28 0.00 0.12
4 0.01 0.55 0.02 0.95 0.09 0.05 0.38 0.38 0.23 0.36 ... 0.11 0.72 0.64 0.61 0.53 0.04 0.09 0.02 0.00 0.03

5 rows × 100 columns

In [3]:
X = data.drop('ViolentCrimesPerPop', axis=1)
y = data.ViolentCrimesPerPop

Models

We build two regression models: a simple, interpretable Decision Tree and a more complex, more accurate Gradient Boosting.

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
In [6]:
model = GradientBoostingRegressor()
model.fit(X_train, y_train)

model2 = DecisionTreeRegressor()
model2.fit(X_train, y_train)
Out[6]:
DecisionTreeRegressor()

Explainers

In the next step, we create the Explainer objects using dalex.

In [7]:
import dalex as dx
print(dx.__version__)
1.4.0
In [8]:
exp = dx.Explainer(model, X_test, y_test, verbose=False)
exp2 = dx.Explainer(model2, X_test, y_test, verbose=False)
In [9]:
exp.model_performance().result.append(exp2.model_performance().result)
Out[9]:
                                mse      rmse        r2       mae       mad
GradientBoostingRegressor  0.017067  0.130640  0.649006  0.088551  0.057761
DecisionTreeRegressor      0.035339  0.187987  0.273218  0.124950  0.070000
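
Side note: DataFrame.append, used above, was deprecated in pandas 1.4 and removed in pandas 2.0. With a newer pandas, the same comparison table can be built with pd.concat:

pd.concat([exp.model_performance().result, exp2.model_performance().result])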

Fairness

Having the Explainers, we are able to assess the models' fairness. To check whether the models are fair, we verify three (conditional) independence criteria. These are:

  • independence: R⊥A
  • separation: R⊥A ∣ Y
  • sufficiency: Y⊥A ∣ R

Where:

  • A - protected group
  • Y - target
  • R - model's prediction

In the approach described in Steinberg, D., et al. (2020), the authors propose a way of checking these criteria for regression models.

The method implemented in the dalex package is called Direct Density Ratio Estimation.
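
Intuitively, each criterion can be expressed as a density ratio. By Bayes' theorem,

p(R ∣ A=a) / p(R) = P(A=a ∣ R) / P(A=a)

and this ratio equals 1 for every value a exactly when independence (R⊥A) holds; separation and sufficiency lead to analogous conditional ratios. A probabilistic classifier that predicts A can therefore be used to estimate how strongly a criterion is violated (see the small sketch after the next cell).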

In [10]:
protected = np.where(X_test.racepctblack >= 0.5, 'majority_black', "else")
privileged = 'else'
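
As a rough illustration of the density-ratio idea (only a sketch using a LogisticRegression classifier, not the estimator used inside dalex), we can train a probabilistic classifier to predict the protected group from the model's predictions:

from sklearn.linear_model import LogisticRegression

R = model.predict(X_test).reshape(-1, 1)           # model predictions
A = (protected == 'majority_black').astype(int)    # protected-group indicator

clf = LogisticRegression().fit(R, A)
# by Bayes' theorem, P(A=1 | R) / P(A=1) approximates p(R | A=1) / p(R)
ratio = clf.predict_proba(R)[:, 1] / A.mean()
print(ratio[A == 1].mean())                        # about 1 under independence; larger values signal dependence

If this value is well above 1 for the majority_black subgroup, it is consistent with the independence violation that fairness_check reports below.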
In [11]:
fobject = exp.model_fairness(protected, privileged)
fobject2 = exp2.model_fairness(protected, privileged)
In [12]:
fobject.fairness_check()
Bias detected in 2 metrics: independence, separation

Conclusion: your model is not fair because 2 or more criteria exceeded acceptable limits set by epsilon.

Ratios of metrics, based on 'else'. Parameter 'epsilon' was set to 0.8 and therefore metrics should be within (0.8, 1.25)
                independence  separation  sufficiency
subgroup                                             
majority_black     10.559471     3.29976     1.087851
In [13]:
fobject2.fairness_check()
Bias detected in 3 metrics: independence, separation, sufficiency

Conclusion: your model is not fair because 2 or more criteria exceeded acceptable limits set by epsilon.

Ratios of metrics, based on 'else'. Parameter 'epsilon' was set to 0.8 and therefore metrics should be within (0.8, 1.25)
                independence  separation  sufficiency
subgroup                                             
majority_black       2.80224    1.529388     1.723974

The models are biased!

The ratios are measured relative to the privileged 'else' group and should fall within (epsilon, 1/epsilon) = (0.8, 1.25). The Decision Tree model violated all 3 criteria, while Gradient Boosting violated only 2. We can plot the fairness check in the same way as for classification.

In [14]:
fobject2.plot()
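
The two models can also be compared on a single chart. Assuming the regression fairness object accepts the same objects argument as its classification counterpart (worth verifying against the dalex documentation), this could look like:

fobject.plot(objects=[fobject2])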