Data Risks & Integrity
Threats to dataset integrity that degrade model performance, including leakage, poisoning, bias, drift, and class imbalance.
This theme is curated by our AI council.
What topics does this domain cover?
This domain covers 6 topics. Each one below is a key concept in the domain, and its full entry covers foundations, implementation, what's changing, and risks to consider.
Class Imbalance
Class imbalance is the problem of training a model on data where one outcome vastly outnumbers another, such as fraud …
Data Drift
Data drift is when the live data flowing into a deployed model gradually stops resembling the data it was trained on. …
Data Leakage
Data leakage happens when information that would not be available at prediction time slips into a model's training data. …
Data Poisoning
Data poisoning is an adversarial attack where malicious actors corrupt a model's training data to manipulate its …
Data Versioning
Data versioning tracks every change to a dataset over time, the way Git tracks changes to code. Each version gets a …
Dataset Bias
Dataset bias is a systematic skew in the data used to train a model, causing it to learn and amplify unfair or …
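For class imbalance, one common mitigation is to reweight the training loss so that rare classes count for more. The sketch below is a minimal illustration under stated assumptions, not this domain's prescribed method: `balanced_class_weights` is a hypothetical helper, the fraud-style labels are made up, and the formula n_samples / (n_classes * count_c) is the inverse-frequency heuristic that scikit-learn also uses for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency in `labels`.

    weight_c = n_samples / (n_classes * count_c), so a class that is ten
    times rarer than another receives a weight ten times larger.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical fraud-style labels: 98 legitimate rows (0) and 2 fraudulent rows (1).
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)
# weights[1] == 25.0 and weights[0] is about 0.51, so each fraud row
# contributes roughly 49 times as much to a weighted loss as a legitimate row.
```

Resampling (oversampling the rare class or undersampling the common one) is the usual alternative; weighting keeps every row but changes how much each one contributes to the loss.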
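Data drift is often monitored by comparing a feature's distribution at training time with its distribution in live traffic. A minimal sketch, assuming a single numeric feature: the hypothetical `ks_statistic` function computes the two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between the two empirical CDFs. This is one drift measure among several (the population stability index is another), not necessarily the one the Data Drift entry uses, and the feature values are made up.

```python
from bisect import bisect_right


def ks_statistic(reference, live):
    """Two-sample Kolmogorov-Smirnov statistic for one numeric feature.

    Returns the largest vertical gap between the empirical CDFs of the
    training-time sample (`reference`) and the live sample: 0 means the
    two ECDFs coincide, 1 means every value in one sample lies below
    every value in the other.
    """
    if not reference or not live:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(live)
    n, m = len(ref), len(cur)
    gap = 0.0
    # Both ECDFs are step functions that only jump at observed values,
    # so checking every observed value finds the maximum gap.
    for x in ref + cur:
        i = bisect_right(ref, x)  # number of reference values <= x
        j = bisect_right(cur, x)  # number of live values <= x
        # |i/n - j/m| computed from integers to avoid rounding noise.
        gap = max(gap, abs(i * m - j * n) / (n * m))
    return gap


# Hypothetical feature values: live traffic has shifted upward by 50 units.
training_values = list(range(100))
live_values = list(range(50, 150))
drift = ks_statistic(training_values, live_values)  # 0.5
```

`scipy.stats.ks_2samp` computes the same statistic along with a p-value; the alert threshold remains a judgment call that depends on the feature and on traffic volume.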
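A classic source of data leakage is preprocessing fitted on the whole dataset before the train/test split, so statistics from rows that would not be available at prediction time shape the training features. The sketch below assumes simple standardization; `fit_scaler` and `transform` are hypothetical helpers and the numbers are illustrative.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn standardization parameters (mean, population std) from `values`."""
    return mean(values), pstdev(values)


def transform(values, params):
    """Standardize `values` with previously fitted parameters."""
    mu, sigma = params
    scale = sigma if sigma > 0 else 1.0  # avoid dividing by zero on constant data
    return [(v - mu) / scale for v in values]


# Hypothetical feature column, already split chronologically.
train = [10.0, 12.0, 14.0, 16.0]
test = [30.0, 32.0]  # rows that would not exist at training time

# Leaky: the mean and std absorb the test rows, so the training features
# already encode information from the evaluation period (mean 19.0, not 13.0).
leaky_params = fit_scaler(train + test)

# Leak-free: fit on the training split only, then reuse the frozen parameters.
safe_params = fit_scaler(train)
train_scaled = transform(train, safe_params)
test_scaled = transform(test, safe_params)
```

The same rule applies to imputation, encoding, and feature selection: fit on the training split, then apply the frozen parameters to validation and live data.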
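Label flipping is one simple form of data poisoning: the attacker changes the labels on a few training rows to shift what the model learns. The toy example below is an illustration, not a realistic attack: `fit_threshold` is a hypothetical one-dimensional classifier that places its threshold halfway between the two class means (and assumes the positive class has the larger mean), and the data is made up.

```python
from statistics import mean


def fit_threshold(xs, ys):
    """Toy 1-D classifier: threshold halfway between the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(pos) + mean(neg)) / 2


def predict(x, threshold):
    """Predict the positive class for inputs above the threshold."""
    return 1 if x > threshold else 0


# Hypothetical clean training data: small values are class 0, large are class 1.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
clean = fit_threshold(xs, ys)  # 5.0

# Label-flipping attack: relabel the two largest positives as negatives.
poisoned_ys = list(ys)
poisoned_ys[6] = 0  # x = 8
poisoned_ys[7] = 0  # x = 9
poisoned = fit_threshold(xs, poisoned_ys)  # 5.5
```

Flipping two of eight labels moves the threshold from 5.0 to 5.5, so inputs just above 5.0 are now classified as negative. Real attacks target far larger models, but the mechanism is the same: corrupted training rows steer the decision boundary.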
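Data versioning needs a stable identifier for each snapshot. One way to get one, in the spirit of Git's content-addressed objects, is to hash the dataset's contents so that any edit yields a new ID. This is a minimal sketch with a hypothetical `dataset_version_id` helper and made-up rows; dedicated tools such as DVC also handle storage and large files, which this does not.

```python
import hashlib
import json


def dataset_version_id(rows):
    """Content hash of a dataset snapshot (a list of dict rows).

    Each row is serialized as canonical JSON (sorted keys, no extra
    whitespace), so key order inside a row does not change the ID, but
    any edited value, added row, removed row, or reordering does.
    """
    digest = hashlib.sha256()
    for row in rows:
        line = json.dumps(row, sort_keys=True, separators=(",", ":"))
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # json.dumps escapes newlines, so this delimiter is unambiguous
    return digest.hexdigest()[:12]  # shortened for readability, like an abbreviated Git hash


# Hypothetical snapshots of a small labeled dataset.
v1 = dataset_version_id([{"id": 1, "label": "fraud"}, {"id": 2, "label": "ok"}])
v2 = dataset_version_id([{"label": "fraud", "id": 1}, {"id": 2, "label": "ok"}])
v3 = dataset_version_id([{"id": 1, "label": "ok"}, {"id": 2, "label": "ok"}])
# v1 == v2 (same content, different key order); v3 differs (a label changed).
```

Recording these IDs alongside each trained model makes it possible to tell exactly which snapshot a model saw.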
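A first check for dataset bias is to compare how often the positive outcome appears in each group. The sketch below computes per-group positive rates and the largest gap between them (the same quantity is called the demographic parity difference when it is computed on a model's predictions). `positive_rate_by_group` and the approval-style data are hypothetical, labels are assumed to be 0/1 integers, and a gap only flags a skew worth investigating; it does not by itself show that the data is unfair.

```python
from collections import defaultdict


def positive_rate_by_group(groups, labels):
    """Fraction of rows labeled 1 within each group (labels must be 0 or 1)."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, label in zip(groups, labels):
        totals[group] += 1
        positives[group] += label
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical approval labels for two groups of ten applicants each.
groups = ["a"] * 10 + ["b"] * 10
labels = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7
rates = positive_rate_by_group(groups, labels)  # {"a": 0.7, "b": 0.3}
gap = max(rates.values()) - min(rates.values())  # about 0.4
```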