Protected Attribute

Also known as: Sensitive Attribute, Protected Characteristic, Protected Class

A protected attribute is a characteristic — such as race, sex, age, or disability — that laws or fairness policies forbid using as a basis for discriminatory decisions. In machine learning, fairness metrics measure whether model outcomes differ across groups defined by these attributes.

What It Is

Fairness metrics like demographic parity and equalized odds share a prerequisite: you need to define whose outcomes you’re comparing. Protected attributes answer that question. They mark the characteristics — race, sex, age, disability, religion, national origin — that society and law have decided should not determine whether someone gets hired, approved for a loan, or flagged by a risk algorithm.

The concept originates in civil rights law. According to the EEOC, US federal law recognizes race, color, religion, sex, national origin, age (40+), disability, and genetic information as protected classes. The EU recognizes similar categories, and the EU AI Act (2024) introduced additional requirements for handling protected attributes in high-risk AI systems. The specific list varies by jurisdiction and industry.

Think of a protected attribute as the “do not sort by this” label on a data column. In a hiring algorithm that screens resumes, the protected attribute (say, gender) is the variable fairness auditors examine. They ask: does the model approve candidates at different rates depending on which group they belong to? According to Fairlearn Docs, fairness metrics formally compare outcomes conditioned on the protected attribute. Demographic parity checks whether the selection rate is roughly equal across groups. Equalized odds goes further, checking whether error rates — false positives and false negatives — are balanced across groups too.
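The demographic parity check described above can be sketched in a few lines: group predictions by the protected attribute, compute the selection rate per group, and compare. The data and group labels here are synthetic, invented purely for illustration.

```python
# Demographic parity sketch: compare selection rates across groups
# defined by a protected attribute. All data below is synthetic.

def selection_rate(preds):
    """Fraction of positive (e.g. approved) predictions."""
    return sum(preds) / len(preds)

def demographic_parity_difference(preds, groups):
    """Largest gap in selection rate between any two groups."""
    by_group = {}
    for p, g in zip(preds, groups):
        by_group.setdefault(g, []).append(p)
    rates = {g: selection_rate(ps) for g, ps in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates

# 1 = approved, 0 = rejected; "A"/"B" are protected-attribute values
preds = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

diff, rates = demographic_parity_difference(preds, groups)
# Group A is selected at 0.75, group B at 0.25 — a 0.5 gap
```

A gap near zero suggests the selection rates are roughly equal; how large a gap is acceptable is a policy question, not a statistical one.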

The tricky part is that protected attributes don’t need to appear in a dataset to influence predictions. According to Chouldechova, models can learn protected group membership indirectly through correlated features. A zip code can serve as a proxy for race. A first name can correlate with ethnicity. Browsing history can signal age. This is called proxy discrimination, and it means that simply removing a protected attribute column from training data doesn’t guarantee the model ignores it.
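Proxy discrimination is easy to demonstrate with a toy example. The model below never sees the protected attribute, only a zip code, yet its approval rates still differ by group because zip code correlates with group membership. Zip codes and group labels are invented for illustration.

```python
# Proxy sketch: the model is trained/applied without the "group"
# column, but a correlated feature (zip code) reconstructs it.
# All applicants, zips, and groups below are synthetic.

applicants = [
    {"zip": "10001", "group": "A"},
    {"zip": "10001", "group": "A"},
    {"zip": "20002", "group": "A"},
    {"zip": "20002", "group": "B"},
    {"zip": "20002", "group": "B"},
    {"zip": "10001", "group": "B"},
]

def model(applicant):
    # Decision rule uses zip code only — "group" is never consulted
    return 1 if applicant["zip"] == "10001" else 0

# Audit the outcomes by protected group anyway
by_group = {}
for a in applicants:
    by_group.setdefault(a["group"], []).append(model(a))
rates = {g: sum(v) / len(v) for g, v in by_group.items()}
# Group A is approved at 2/3, group B at 1/3 — disparate outcomes
# despite the protected attribute never entering the model
```

This is why fairness audits measure outcomes by group rather than inspecting which columns the model consumes.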

This is why comparing fairness metrics matters. When you evaluate demographic parity versus equalized odds versus calibration, each metric defines a different relationship between the model’s predictions and the protected attribute — equal selection rates, equal error rates, or equal predictive accuracy across groups.
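To see how the metrics can disagree, the sketch below computes per-group error rates, the quantities equalized odds compares. In this hand-built example both groups are selected at the same rate (demographic parity holds) while their error rates diverge (equalized odds is violated). Labels and predictions are synthetic.

```python
# Equalized-odds sketch: compare false positive and false negative
# rates across protected groups. All labels/predictions are synthetic.

def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp / y_true.count(0), fn / y_true.count(1)

# Group A: the model errs in both directions
fpr_a, fnr_a = error_rates([1, 1, 0, 0], [1, 0, 1, 0])
# Group B: the model is perfect
fpr_b, fnr_b = error_rates([1, 1, 0, 0], [1, 1, 0, 0])

# Both groups have a 0.5 selection rate, so demographic parity holds,
# yet group A's FPR and FNR are 0.5 while group B's are 0.0 —
# equalized odds is violated.
```

The same predictions can pass one fairness metric and fail another, which is why audits typically report several metrics side by side.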

How It’s Used in Practice

When teams evaluate an ML model for fairness, the first step is identifying which attributes count as protected in their deployment context. A lending model in the US must account for race, sex, and age under federal equal credit law (the Equal Credit Opportunity Act). A hiring tool deployed in the EU must consider categories defined by member-state legislation. A healthcare algorithm might need to examine disability status and genetic information.

Once identified, protected attributes feed directly into fairness metric calculations. If you’re comparing demographic parity against equalized odds — as the core fairness metrics comparison covers — both metrics split predictions by the same protected attribute groups. The protected attribute is the “group by” variable; the metric tells you what type of equality you’re measuring. Teams typically run multiple metrics across all relevant protected attributes to build a complete picture.
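The "group by" role described above can be sketched as a small audit loop: one metric applied per protected attribute, producing per-group results for each. The record fields (`pred`, `sex`, `age_band`) and their values are illustrative assumptions, not a schema from any real system.

```python
# Audit-loop sketch: the protected attribute is the "group by"
# variable; each metric is computed once per attribute.
# Field names and records are hypothetical.

records = [
    {"pred": 1, "sex": "F", "age_band": "40+"},
    {"pred": 1, "sex": "F", "age_band": "under40"},
    {"pred": 0, "sex": "M", "age_band": "40+"},
    {"pred": 1, "sex": "M", "age_band": "under40"},
]

def selection_rates(records, attr):
    """Selection rate per group of the given protected attribute."""
    by_group = {}
    for r in records:
        by_group.setdefault(r[attr], []).append(r["pred"])
    return {g: sum(v) / len(v) for g, v in by_group.items()}

# Run the same metric across every relevant protected attribute
audit = {attr: selection_rates(records, attr)
         for attr in ("sex", "age_band")}
```

In practice each additional metric (equalized odds, calibration) slots into the same loop, yielding a metrics-by-attributes grid rather than a single number.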

Pro Tip: Don’t assume your protected attribute list is universal. US employment law protects different categories than EU consumer credit regulation, and state-level laws often add categories that federal law doesn’t cover. Before running any fairness audit, confirm the legally relevant attributes for your specific use case, industry, and jurisdiction. Getting this wrong means your fairness metrics answer the wrong question.

When to Use / When Not

Scenario | Use / Avoid
Running fairness audits on a hiring or lending model | Use
Selecting features for a weather prediction model | Avoid
Comparing demographic parity scores across racial groups | Use
Building an internal document search engine | Avoid
Evaluating bias in a criminal risk assessment tool | Use
Training a recommendation model with no human impact | Avoid

Common Misconception

Myth: Removing a protected attribute from the training data eliminates bias. Reality: Models learn proxies. According to Chouldechova, correlated features like zip code, first name, or purchasing patterns often reconstruct protected group membership indirectly. A model trained without a “race” column can still produce racially disparate outcomes if it uses features that correlate with race. Fairness requires measuring outcomes across protected groups, not just hiding the column.

One Sentence to Remember

Protected attributes are the groups you check when asking “is this model fair?” — without defining them first, metrics like demographic parity and equalized odds have nothing to measure.

FAQ

Q: What are the most common protected attributes in the US? A: According to the EEOC, federal law protects race, color, religion, sex, national origin, age (40 and older), disability, and genetic information. State laws may add categories like sexual orientation.

Q: How do protected attributes connect to fairness metrics? A: Fairness metrics split model predictions by protected attribute groups and compare outcomes. Demographic parity checks equal selection rates; equalized odds checks equal error rates across those groups.

Q: Can a model be biased even without protected attributes in the training data? A: Yes. Models learn proxy features — like zip code correlating with race — that reconstruct protected group membership indirectly. Removing the column doesn’t remove the bias pattern.

Expert Takes

Protected attributes are the conditioning variable in every group fairness formula. Demographic parity checks whether the probability of a positive outcome is equal across groups defined by the protected attribute. Without specifying which attribute, the fairness constraint is undefined. The choice of which attributes to protect is a policy decision, not a statistical one — the math only tells you whether the constraint holds once the groups are defined.

Start any fairness audit with a protected attribute inventory. List every attribute your jurisdiction requires, then check your dataset: do you actually have that data? Many teams discover they can’t compute equalized odds because they never collected the demographic fields needed. If you can’t measure it, you can’t fix it. Build the attribute registry before building the model, and document which metrics map to which groups.
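The inventory step above can be sketched as a simple set difference: list the attributes your jurisdiction requires, then diff against what the dataset actually contains. The attribute names and dataset columns below are illustrative, not a legal checklist.

```python
# Attribute-inventory sketch: which required protected attributes
# are missing from the dataset? Names here are hypothetical.

required = {"race", "sex", "age", "disability"}          # per jurisdiction
dataset_columns = {"applicant_id", "income", "zip",      # hypothetical
                   "sex", "age"}                          # dataset schema

missing = required - dataset_columns
# missing == {"race", "disability"}: any group-fairness metric
# conditioned on these attributes cannot be computed yet.
```

Surfacing the gap before training is the point: an attribute you never collected is a fairness metric you can never report.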

Regulations are converging. The EU AI Act mandates bias testing on protected attributes for high-risk systems. US agencies are issuing similar guidance. Companies that skip protected attribute audits aren’t just accepting technical risk — they’re accepting legal exposure. The teams that build protected attribute tracking into their model lifecycle now will spend less time scrambling when enforcement arrives.

Who decides which attributes deserve protection? Law provides a starting list, but communities define what matters. Age discrimination past a certain threshold is legally recognized in the US; ageism against younger workers is not. Disability is protected; socioeconomic status usually isn’t. Every fairness audit carries an invisible boundary: the groups it counts and the groups it doesn’t. The attributes you choose to measure reveal the harms you’ve chosen to see.