Demographic Parity

Also known as: statistical parity, independence criterion, group fairness

Demographic Parity
A fairness metric requiring that a machine learning model’s positive prediction rate is equal across all demographic groups, regardless of whether those groups differ in actual outcomes.

What It Is

When a hiring algorithm approves 60% of applicants from one group but only 30% from another, something is off—or at least worth investigating. Demographic parity gives you a direct way to check: it asks whether your model’s positive prediction rate is the same across all demographic groups. If it isn’t, the model may be discriminating, even if nobody designed it that way. For teams building or evaluating bias and fairness metrics, demographic parity is typically the first test in the toolkit.

Think of it like a coin-flip test applied to model outputs. If your model decides who gets approved for a loan, demographic parity asks: does every group—defined by protected attributes like gender, race, or age—get approved at roughly the same rate? The formal requirement is direct: the probability of a positive prediction should be identical regardless of group membership. According to the Fairlearn documentation, this is expressed as P(Ŷ=1|A=a) = P(Ŷ=1) for all groups a, where Ŷ is the prediction and A is the protected attribute.
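As a minimal sketch of this definition—with toy predictions and two hypothetical groups "a" and "b"—the per-group positive prediction rates can be computed directly from model outputs and group labels, with no ground truth needed:

```python
from collections import defaultdict

def positive_rates(y_pred, groups):
    """Positive prediction rate P(Y_hat = 1 | A = a) for each group a."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for yhat, a in zip(y_pred, groups):
        counts[a][0] += int(yhat == 1)
        counts[a][1] += 1
    return {a: pos / total for a, (pos, total) in counts.items()}

# Toy loan-approval predictions for two hypothetical groups
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(positive_rates(y_pred, groups))  # {'a': 0.75, 'b': 0.25}
```

Demographic parity holds when these rates are (approximately) equal; here the 0.75 vs. 0.25 gap would be a clear violation.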

This simplicity makes demographic parity the most accessible fairness metric available. You don’t need ground-truth labels to compute it—just the model’s predictions and group membership data. That low barrier to entry is why it appears so frequently in fairness audits and regulatory discussions.

But the simplicity comes with a real trade-off. Demographic parity ignores whether groups actually differ in the outcome being predicted. Consider a medical screening tool that correctly flags one population at higher rates because that population genuinely has higher prevalence of a condition. Forcing equal prediction rates would mean either over-screening healthy patients or missing at-risk ones. This tension points to a deeper mathematical constraint: according to the Fairlearn documentation, demographic parity can conflict with other fairness metrics like equalized odds when base rates differ between groups—a result formalized in the impossibility theorem.

In practice, this means demographic parity works best as a diagnostic signal rather than a final goal. When prediction rates diverge sharply between groups, that’s a flag worth investigating. But achieving perfect demographic parity isn’t always the right objective, because forcing equal outputs can reduce the model’s usefulness for everyone.

How It’s Used in Practice

The most common place you’ll encounter demographic parity is during model auditing. Before deploying a credit scoring model, a hiring tool, or a content recommendation system, teams check whether positive prediction rates differ across protected groups. The Fairlearn library provides dedicated functions—demographic_parity_difference() and demographic_parity_ratio()—that calculate the gap between group prediction rates in a single call.
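The two quantities are straightforward to compute by hand as well. The sketch below uses toy data and a standalone helper (not the Fairlearn API, which also takes ground-truth labels and a sensitive_features keyword) to show what the difference and ratio measure:

```python
def demographic_parity_metrics(y_pred, sensitive):
    """Return (difference, ratio) over group positive-prediction rates:
    difference = max rate - min rate; ratio = min rate / max rate.
    These mirror what demographic_parity_difference() and
    demographic_parity_ratio() report, but as a plain-Python sketch."""
    rates = {}
    for a in set(sensitive):
        preds = [p for p, s in zip(y_pred, sensitive) if s == a]
        rates[a] = sum(preds) / len(preds)
    return max(rates.values()) - min(rates.values()), \
           min(rates.values()) / max(rates.values())

# Toy predictions: group "a" approved at 0.5, group "b" at 0.25
y_pred    = [1, 1, 0, 0, 1, 0, 0, 0]
sensitive = ["a", "a", "a", "a", "b", "b", "b", "b"]
diff, ratio = demographic_parity_metrics(y_pred, sensitive)
print(diff, ratio)  # 0.25 0.5
```

A difference of 0 (or a ratio of 1.0) would indicate perfect demographic parity; how far from those ideals is acceptable is a policy choice, not a mathematical one.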

Regulators and compliance teams increasingly treat demographic parity as a first-pass filter. If a model’s approval rate for one demographic group is substantially lower than for another, that triggers deeper investigation—even if the difference turns out to be justified by legitimate factors. The four-fifths rule used in US employment law follows a similar logic: if a selection rate for any group falls below 80% of the highest group’s rate, the employer must justify the disparity.
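The four-fifths logic translates directly into a ratio check. This sketch uses illustrative numbers and a hypothetical helper name—it is not drawn from any statute or library:

```python
def four_fifths_check(selection_rates, threshold=0.8):
    """Flag groups whose selection rate falls below 80% (the four-fifths
    threshold) of the highest group's rate. Returns True if a group passes."""
    top = max(selection_rates.values())
    return {group: rate / top >= threshold
            for group, rate in selection_rates.items()}

# Toy selection rates echoing the 60% vs. 30% hiring example above
rates = {"group_a": 0.60, "group_b": 0.30}
print(four_fifths_check(rates))  # group_b fails: 0.30 / 0.60 = 0.5 < 0.8
```

Note that failing the check does not prove discrimination—it shifts the burden to the employer to justify the disparity, exactly the diagnostic role described above.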

Pro Tip: Use demographic parity as your “smoke detector,” not your final verdict. If the numbers look unequal, dig deeper with metrics like equalized odds or calibration before deciding whether to adjust the model. A failing demographic parity check tells you something is worth examining—it doesn’t tell you what to do about it.

When to Use / When Not

Use:
- Initial fairness screening before model deployment
- Regulatory compliance check for lending or hiring decisions
- Comparing fairness across multiple candidate models

Avoid:
- Medical diagnosis where base rates genuinely differ across groups
- Risk scoring where ground-truth outcome differences are well-documented
- Final fairness criterion when accuracy trade-offs matter

Common Misconception

Myth: If a model satisfies demographic parity, it is fair. Reality: Demographic parity only checks whether prediction rates are equal across groups. It says nothing about whether those predictions are accurate. A model can satisfy demographic parity while making worse predictions for one group—just at the same rate. True fairness usually requires examining multiple metrics together, because no single metric captures every dimension of fairness.
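A deliberately extreme toy example makes the gap between equal rates and equal accuracy concrete (all numbers below are contrived for illustration):

```python
# Two groups, identical 50% positive-prediction rate -> demographic parity holds
y_true_a = [1, 1, 0, 0]; y_pred_a = [1, 1, 0, 0]  # group A: every prediction correct
y_true_b = [0, 0, 1, 1]; y_pred_b = [1, 1, 0, 0]  # group B: every prediction wrong

rate_a = sum(y_pred_a) / len(y_pred_a)  # 0.5
rate_b = sum(y_pred_b) / len(y_pred_b)  # 0.5 -> parity satisfied

acc_a = sum(t == p for t, p in zip(y_true_a, y_pred_a)) / len(y_true_a)  # 1.0
acc_b = sum(t == p for t, p in zip(y_true_b, y_pred_b)) / len(y_true_b)  # 0.0
print(rate_a, rate_b, acc_a, acc_b)
```

Both groups receive positive predictions at exactly the same rate, yet the model is perfect for one group and useless for the other—which is why accuracy-aware metrics like equalized odds are needed alongside demographic parity.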

One Sentence to Remember

Demographic parity checks whether your model treats groups equally in terms of prediction rates, but equal rates don’t guarantee equal accuracy—so use it as your starting point for fairness auditing, not your finish line.

FAQ

Q: How is demographic parity different from equalized odds? A: Demographic parity compares overall positive prediction rates across groups. Equalized odds goes further by requiring equal true positive and false positive rates, accounting for actual outcomes rather than just predictions.

Q: Can a model satisfy demographic parity and still be unfair? A: Yes. A model might predict at equal rates across groups while being less accurate for one group, producing more false positives or false negatives for that specific population.

Q: Is demographic parity required by law? A: Not directly, but related concepts appear in anti-discrimination law. The four-fifths rule in US employment law uses a similar ratio-based approach to flag potentially discriminatory selection processes.

Expert Takes

Demographic parity constrains marginal prediction rates across protected groups—it operates on model outputs alone, without referencing ground-truth labels. The impossibility theorem proves you cannot simultaneously satisfy demographic parity and equalized odds when base rates differ between groups. This mathematical tension is not a flaw in the metric but a fundamental property of fairness formalization. Choosing a metric always means choosing which type of error you tolerate.

When auditing a model for fairness, demographic parity is the first metric most teams compute because it needs only predictions and group labels—no ground-truth outcomes. Run the check early in your pipeline, before deployment gates. If the ratio drops below your acceptable threshold, that is your signal to investigate further, not to retrain blindly. Pair it with equalized odds and calibration checks for a complete picture.

Companies that skip demographic parity checks during model development are placing a bet—that no regulator, journalist, or affected user will ever run the numbers. That bet gets worse every quarter. Fairness auditing is moving from academic exercise to procurement requirement. Teams that build demographic parity into their standard workflow today will have a structural advantage when compliance mandates arrive.

The tension around demographic parity is instructive. The metric asks whether outcomes look equal, but “looking equal” and “being fair” are different questions entirely. A model that satisfies demographic parity might be masking real disparities within subgroups, while one that fails might be reflecting legitimate differences. The choice of which metric to enforce is a values decision—one that engineers rarely acknowledge they are making.