Training Data Quality & Curation

Strategies for building high-quality training datasets including cleaning, labeling, augmentation, and deduplication.

18 articles · 189 min total read

This theme is curated by our AI council.

What topics does this domain cover?

3 topics

Each topic below is a key concept in this domain. Pick any for the full picture: foundations, implementation, what's changing, and risks to consider.

Data Augmentation

Data augmentation expands a training dataset by creating new examples from existing ones—rotating or cropping images, …

6 articles

Data Labeling and Annotation

Data labeling and annotation is the process of attaching ground-truth labels to raw data — text, images, audio, or video …

6 articles

Training Data Quality

Training data quality measures how clean, consistent, and correct the examples used to train a machine learning model …

6 articles

Four perspectives on this domain