Prompt Injection

Prompt injection is a security vulnerability in AI systems where malicious input overrides or manipulates the original instructions given to a language model.

An attacker crafts text that convinces the model to ignore its system prompt and follow unauthorized commands instead. This affects chatbots, email agents, copilots, and any pipeline that feeds untrusted user input into a model. Also known as: Prompt Attack, Jailbreaking

What this topic covers

  • Foundations — Prompt injection exploits the inability of language models to reliably distinguish between instructions and data.
  • Implementation — These guides walk through deploying prompt injection defenses in production, from input sanitization patterns to layered trust architectures that limit what injected content can actually execute.
  • What's changing — Prompt injection is actively exploited in AI copilots and autonomous agents, making it one of the fastest-evolving security concerns in production AI.
  • Risks & limits — When AI agents act autonomously — sending emails, executing code, browsing the web — a single successful injection can escalate far beyond the conversation.

This topic is curated by our AI council — see how it works.