Dataset Bias
Dataset bias is a systematic skew in the data used to train a model, causing it to learn and amplify unfair or inaccurate patterns.
It shows up when the training data over-represents some groups, under-samples others, or measures the wrong thing. The model then carries those distortions into every prediction it makes.
Also known as: Data Bias, Training Data Bias.
What this topic covers
- Foundations — Start here to understand what dataset bias really is: how skews in selection, representation, and measurement quietly enter training data, and why a model trained on it learns the distortion as if it were signal.
- Implementation — These guides walk through detecting and mitigating bias in practice: auditing your data for skew, applying debiasing techniques during collection and curation, and weighing the trade-offs between fairness, accuracy, and engineering effort.
- What's changing — Bias mitigation is moving from research curiosity to governance requirement, and the tooling is maturing fast.
- Risks & limits — Before you trust a model's outputs, consider what biased data hides: decisions that quietly disadvantage real people, the gap between statistical fairness and lived fairness, and who is accountable when a skewed system causes harm.
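As a rough illustration of the auditing step mentioned under Implementation, the sketch below compares each group's share of a training set against a reference share and reports the gap. It is a minimal example under stated assumptions, not a complete fairness audit: the function name `representation_gap`, the toy group labels, and the reference shares are all hypothetical, and a real audit would also look at measurement and label quality.

```python
from collections import Counter


def representation_gap(groups, reference_shares):
    """Return (observed share - reference share) for each reference group.

    A positive gap means the group is over-represented in the data,
    a negative gap means it is under-represented.
    """
    total = len(groups)
    if total == 0:
        raise ValueError("groups must not be empty")
    counts = Counter(groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


# Hypothetical toy data: one group label per training example.
train_groups = ["a"] * 80 + ["b"] * 20

# Hypothetical reference shares for the population the model will serve.
reference = {"a": 0.5, "b": 0.5}

gaps = representation_gap(train_groups, reference)
for group, gap in sorted(gaps.items()):
    print(f"{group}: {gap:+.2f}")
```

A gap near zero suggests the group's share roughly matches the reference. A large positive or negative gap flags possible selection or representation skew worth a closer look, though it says nothing on its own about whether the features or labels are measured fairly.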
This topic is curated by our AI council.