Fairness Metrics for ML
- Fairness metrics are quantitative tools used to detect and measure bias in machine learning models across different demographic groups.
- No single metric can capture all definitions of fairness; practitioners must choose metrics based on the specific social and legal context of their application.
- Achieving mathematical parity in one metric often necessitates a trade-off, potentially degrading another fairness metric or overall model accuracy.
- Evaluating fairness requires a rigorous pipeline, including data auditing, metric selection, and post-processing interventions to mitigate identified disparities.
Why It Matters
In the financial sector, banks use fairness metrics to audit credit scoring models. If a model consistently denies loans to minority applicants at a higher rate than white applicants with similar credit histories, the bank is at risk of violating fair lending laws. By auditing against Equalized Odds, the bank can check whether error rates differ across groups and adjust its decision thresholds so that the model does not produce disproportionately more false negatives for protected applicants.
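As an illustration, a minimal sketch of such an audit might compare true positive and false positive rates per group; the labels, decisions, and group assignments below are invented for the example, not drawn from any real lending system.

import numpy as np

# Hypothetical audit data: 1 = creditworthy / approved, 0 = not creditworthy / denied.
y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])   # actual repayment outcomes
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 1])   # model's approval decisions
group = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

def rate(mask, true_val):
    # Share of applicants whose true label is true_val that the model approved.
    sel = mask & (y_true == true_val)
    return y_pred[sel].mean()

for g in ["A", "B"]:
    m = group == g
    tpr = rate(m, 1)  # true positive rate: creditworthy applicants who were approved
    fpr = rate(m, 0)  # false positive rate: non-creditworthy applicants who were approved
    print(f"Group {g}: TPR={tpr:.2f}, FPR={fpr:.2f}")
# Equalized Odds asks that both the TPR and the FPR be (approximately) equal across groups.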
In the healthcare domain, diagnostic AI models are audited for fairness to ensure they perform equally well across different ethnicities. For example, a skin cancer detection model might be trained mostly on images of light skin, leading to higher false negative rates for patients with darker skin. Fairness metrics help researchers identify these gaps, prompting them to collect more diverse training data to ensure equitable diagnostic accuracy for all patients.
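As a sketch of such a gap check, one might compare sensitivity (recall) across skin-tone groups; the labels and group column below are fabricated for illustration.

import numpy as np
from sklearn.metrics import recall_score

# Hypothetical audit set: 1 = malignant lesion, 0 = benign.
y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 1, 0, 0, 0, 0])
skin_tone = np.array(["light"] * 5 + ["dark"] * 5)

for tone in ["light", "dark"]:
    m = skin_tone == tone
    sensitivity = recall_score(y_true[m], y_pred[m])  # fraction of true cancers detected
    print(f"{tone}: sensitivity={sensitivity:.2f}, false negative rate={1 - sensitivity:.2f}")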
In the hiring and recruitment industry, companies use automated resume screening tools to filter candidates. If these tools are not audited, they may learn to prioritize keywords associated with male-dominated educational backgrounds, effectively filtering out qualified women. Fairness metrics like Demographic Parity are used to audit these systems, ensuring that the pool of candidates presented to human recruiters represents a diverse cross-section of the applicant population.
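A Demographic Parity audit of such a screen reduces to comparing selection rates; a minimal sketch with made-up screening decisions and a binary gender column:

import numpy as np

# Hypothetical screening outcomes: 1 = passed the automated resume screen.
selected = np.array([1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0])
gender = np.array(["M"] * 6 + ["F"] * 6)

rate_m = selected[gender == "M"].mean()
rate_f = selected[gender == "F"].mean()
print(f"Selection rate M: {rate_m:.2f}, F: {rate_f:.2f}")
print(f"Disparate impact ratio (F/M): {rate_f / rate_m:.2f}")
# A ratio far below 1.0 (e.g., under the 0.8 'four-fifths' rule of thumb used in
# some employment contexts) is a common signal that the screen warrants review.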
How It Works
The Intuition of Fairness
Machine learning models are essentially pattern-matching engines. They learn from historical data, which often contains systemic biases. If a company has historically hired more men than women for technical roles, a model trained on this data will likely learn that "being male" is a predictor of success. Fairness metrics are the diagnostic tools we use to hold these models accountable. They allow us to move beyond "accuracy" and ask: "Is this model performing equally well for everyone, or is it systematically failing a specific group?"
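In practice, asking that question means slicing evaluation metrics by group instead of reporting a single aggregate number. A minimal sketch, assuming pandas is available and using made-up evaluation results:

import pandas as pd

# Hypothetical evaluation results with a group column.
df = pd.DataFrame({
    "group":   ["A", "A", "A", "B", "B", "B"],
    "correct": [1, 1, 1, 1, 0, 0],  # 1 = the model's prediction was right
})
print(df["correct"].mean())                    # overall accuracy: 0.67
print(df.groupby("group")["correct"].mean())   # per-group accuracy: A = 1.00, B = 0.33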
The Conflict of Definitions
The central challenge in AI ethics is that "fairness" is a philosophical concept, not a mathematical one. There are more than 20 formal definitions of fairness, and many of them are mutually incompatible: well-known impossibility results show that, when groups have different base rates, no non-trivial model can satisfy calibration and equal error rates across groups at the same time. For example, if you enforce Demographic Parity (ensuring equal selection rates), you may force the model to ignore valid predictive signals, which can lower its overall accuracy. Conversely, if you optimize purely for predictive accuracy, you may reproduce the historical inequalities embedded in the training data. Practitioners must navigate this "impossibility theorem" by selecting metrics that align with the specific moral and legal requirements of their domain.
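A small worked example of that tension, using invented base rates: suppose 60 out of 100 Group A applicants and 30 out of 100 Group B applicants are actually qualified. A perfectly accurate model then selects the groups at different rates, so it already violates Demographic Parity, and forcing the rates to match must flip some correct decisions.

# Invented qualification (base) rates for two groups of 100 applicants each.
qualified = {"A": 60, "B": 30}  # number of truly qualified applicants per 100

# A perfectly accurate model approves exactly the qualified applicants,
# so its selection rates are 60% vs. 30%: it violates Demographic Parity.
gap = qualified["A"] - qualified["B"]
print(f"Selection-rate gap of a perfect model: {gap} percentage points")

# Equalizing both rates at 45% would mean rejecting 15 qualified Group A applicants
# and/or approving 15 unqualified Group B applicants; every such flip is a new error.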
The Lifecycle of Fairness Auditing
Fairness is not a one-time check; it is a lifecycle process. It begins with data collection, where we must identify if our training sets are representative of the real world. During the training phase, we might use "in-processing" techniques, such as adding a fairness constraint to the loss function, to penalize the model for making biased predictions. Finally, during the evaluation phase, we use fairness metrics to audit the model's performance on slices of data. If we find that the False Negative Rate is significantly higher for a minority group, we might apply "post-processing" techniques, such as adjusting the decision threshold for that specific group to ensure they are not unfairly denied a service.
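A minimal sketch of that post-processing step, using made-up model scores (libraries such as Fairlearn provide tooling to search for such thresholds automatically; here the second threshold is simply chosen by hand):

import numpy as np

# Hypothetical model scores, true outcomes, and group labels.
scores = np.array([0.9, 0.7, 0.4, 0.8, 0.3, 0.6, 0.5, 0.2, 0.7, 0.4])
y_true = np.array([1,   1,   0,   1,   0,   1,   1,   0,   1,   0])
group = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

def tpr(threshold, mask):
    # True positive rate within one group at a given score threshold.
    pred = scores[mask] >= threshold
    pos = y_true[mask] == 1
    return pred[pos].mean()

# A single global threshold of 0.6 treats the groups very differently...
print(tpr(0.6, group == "A"), tpr(0.6, group == "B"))  # roughly 1.00 vs. 0.67
# ...so the post-processing fix lowers the threshold for Group B
# until its TPR matches Group A's.
print(tpr(0.6, group == "A"), tpr(0.5, group == "B"))  # roughly 1.00 vs. 1.00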
Common Pitfalls
- "Fairness means the model is 100% accurate." Accuracy is not fairness; a model can be highly accurate but still be biased against a specific group. Fairness requires looking at the distribution of errors, not just the total count of correct predictions.
- "Removing protected attributes solves bias." Even if you remove race or gender from the dataset, the model can infer these attributes from "proxy variables" like zip codes or purchasing history. Bias is structural and persists even when explicit labels are deleted.
- "There is one 'correct' fairness metric." Fairness is context-dependent, and choosing a metric is a value judgment. You cannot mathematically satisfy all fairness definitions simultaneously, so you must choose the one that best serves the ethical goals of your specific project.
- "Fairness is only a data problem." While data quality is crucial, bias can also be introduced by the model architecture, the objective function, or the way the model is deployed. A holistic approach is required, covering the entire pipeline from data collection to post-deployment monitoring.
Sample Code
import numpy as np
from sklearn.metrics import confusion_matrix
# Simulated lending example: 1 = approved / creditworthy, 0 = denied / not creditworthy
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # actual outcomes
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])  # model decisions
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0: Group A, 1: Group B
def calculate_tpr(y_true, y_pred, group_mask):
    # True positive rate (recall) for the rows selected by group_mask.
    cm = confusion_matrix(y_true[group_mask], y_pred[group_mask], labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    return tp / (tp + fn)
# Calculate TPR for both groups
tpr_a = calculate_tpr(y_true, y_pred, groups == 0)
tpr_b = calculate_tpr(y_true, y_pred, groups == 1)
print(f"TPR Group A: {tpr_a:.2f}, TPR Group B: {tpr_b:.2f}")
# Output: TPR Group A: 0.67, TPR Group B: 0.50
# Interpretation: The model misses a larger share of genuine positives for Group B (lower TPR), the kind of gap a fairness audit should flag.
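To turn this TPR comparison into a fuller Equalized Odds check, the false positive rates should be compared as well; the following lines, appended to the script above, reuse y_true, y_pred, and groups.

def calculate_fpr(y_true, y_pred, group_mask):
    # False positive rate: share of true negatives that were wrongly predicted positive.
    cm = confusion_matrix(y_true[group_mask], y_pred[group_mask], labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    return fp / (fp + tn)

fpr_a = calculate_fpr(y_true, y_pred, groups == 0)
fpr_b = calculate_fpr(y_true, y_pred, groups == 1)
print(f"FPR Group A: {fpr_a:.2f}, FPR Group B: {fpr_b:.2f}")
# Equalized Odds holds (approximately) only if BOTH the TPR gap and the FPR gap are near zero.
print(f"TPR gap: {abs(tpr_a - tpr_b):.2f}, FPR gap: {abs(fpr_a - fpr_b):.2f}")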