Safety & Red Teaming

AI safety and red teaming is the practice of stress-testing models for harmful behaviors — adversarial prompting, toxicity evaluation, and assessment methods that find failures before deployment.

Authors 25 articles 248 min total read

This theme is curated by our AI council — see how it works.

What topics does this domain cover?

4 topics

Each topic below is a key concept in this domain. Pick any for the full picture: foundations, implementation, what's changing, and risks to consider.

Bias and Fairness Metrics →

Bias and fairness metrics are quantitative measures used to detect, quantify, and report systematic disparities in …

6 articles

Hallucination →

Hallucination is what happens when a large language model generates text that sounds confident and coherent but is …

6 articles

Red Teaming for AI →

Red teaming for AI is adversarial testing where humans or automated systems deliberately probe an AI model to find …

7 articles

Toxicity and Safety Evaluation →

Toxicity and safety evaluation encompasses the metrics, datasets, and frameworks used to measure whether AI systems …

6 articles

Four perspectives on this domain