AI Constitution
Also known as: model constitution, constitutional principles document, CAI constitution
- AI Constitution
- A written document of natural-language principles used to guide a language model’s self-critique and revision during training. The model evaluates its own outputs against these principles, revising responses to align with them — a method introduced by Anthropic in 2022 as Constitutional AI.
An AI constitution is a document of natural-language principles that a language model uses to critique its own outputs during training, shaping how it responds to instructions and sensitive situations.
What It Is
When Anthropic built Claude, they needed a way to teach the model what “harmless” and “helpful” actually mean — not through labeled examples alone, but through a process where the model could reason about its own outputs. The AI constitution is the document that makes that reasoning possible.
Think of it like a code of conduct that the model applies to itself. During training, the model generates a response, then reads its own output through the lens of the constitution — asking questions like “does this response respect human dignity?” or “is this advice something that could harm a vulnerable person?” If the answer points toward a problem, the model revises. This cycle of self-critique and revision is what Anthropic calls Constitutional AI (CAI).
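The critique-and-revise cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's pipeline: the `Model` type, the `critique_and_revise` function, the two sample principles, and the `toy_model` stand-in are all hypothetical names introduced here, and a real system would call an actual language model where the stub is used.

```python
import random
from typing import Callable, List, Tuple

# A "model" here is any function that maps a prompt string to a completion.
Model = Callable[[str], str]

# Hypothetical principles, written for illustration only.
PRINCIPLES: List[str] = [
    "Does this response respect human dignity?",
    "Could this advice harm a vulnerable person?",
]


def critique_and_revise(model: Model, user_prompt: str,
                        rounds: int = 2, seed: int = 0) -> Tuple[str, str]:
    """Return (user_prompt, final_revision), usable as a fine-tuning pair."""
    rng = random.Random(seed)
    response = model(user_prompt)
    for _ in range(rounds):
        # Assumption: one principle is sampled per round.
        principle = rng.choice(PRINCIPLES)
        critique = model(
            f"Critique the response against this principle: {principle}\n"
            f"Response: {response}"
        )
        response = model(
            "Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return user_prompt, response


def toy_model(prompt: str) -> str:
    """Stand-in for a real language model so the sketch runs offline."""
    if prompt.startswith("Critique"):
        return "The response could be more careful."
    if prompt.startswith("Rewrite"):
        return "Here is a more careful answer."
    return "Here is a first draft."


pair = critique_and_revise(toy_model, "How do I handle a difficult coworker?")
print(pair)
```

The returned pair is the point of the exercise: in a training setting, the revised response rather than the first draft becomes the example the model learns from.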
The methodology was introduced in “Constitutional AI: Harmlessness from AI Feedback” (Bai et al., Anthropic Research, December 2022). The principles drew on documents like the UN Universal Declaration of Human Rights; the goal was to make values explicit and auditable, not buried in training data labels.
According to Anthropic’s constitution page, Claude’s current constitution prioritizes four properties in order: being broadly safe first, broadly ethical second, compliant with Anthropic’s guidelines third, and genuinely helpful fourth. This ordering matters: when a situation creates tension between helpfulness and safety, the model has a ranked framework for resolving it rather than guessing.
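To make the idea of a ranked framework concrete, the sketch below compares two candidate responses by checking the four properties in priority order. It is a toy model of the ordering only: the function names, the boolean pass/fail checks, and the strict lexicographic comparison are simplifying assumptions made here, and a trained model does not run a lookup like this.

```python
from typing import Dict, List, Optional

# Order taken from the text above; earlier entries win a conflict.
PRIORITY: List[str] = ["safe", "ethical", "compliant", "helpful"]


def highest_priority_concern(passes: Dict[str, bool]) -> Optional[str]:
    """Return the highest-ranked property a candidate fails, or None."""
    for prop in PRIORITY:
        if not passes.get(prop, True):
            return prop
    return None


def choose(candidates: Dict[str, Dict[str, bool]]) -> str:
    """Pick the candidate that wins a lexicographic comparison.

    Tuples of booleans compare element by element, so a pass on an
    earlier property outweighs any number of passes on later ones.
    """
    def rank(name: str) -> tuple:
        return tuple(candidates[name].get(p, False) for p in PRIORITY)

    return max(candidates, key=rank)


# Hypothetical candidates for a request where helpfulness and safety pull apart.
candidates = {
    "blunt_answer": {"safe": False, "ethical": True,
                     "compliant": True, "helpful": True},
    "careful_answer": {"safe": True, "ethical": True,
                       "compliant": True, "helpful": False},
}

print(choose(candidates))
print(highest_priority_concern(candidates["blunt_answer"]))
```

Under these assumptions the careful answer wins even though it fails the helpfulness check, because the comparison is settled at the first property where the two candidates differ.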
According to Anthropic Blog, the constitution underwent a significant update in early 2026. The earlier version specified rules — things the model should or should not do. The revised version shifted toward reason-based alignment: instead of prescribing behaviors, it explains the logic behind each principle. This gives the model a better foundation for handling novel situations that no specific rule anticipated.
The full document is publicly available at anthropic.com/constitution and released under a CC0 1.0 license, meaning anyone can read, adapt, or build on it without restriction.
How It’s Used in Practice
The AI constitution is primarily a training artifact, not a runtime tool. Developers and researchers encounter it most often in two contexts.
The first is understanding Claude’s behavior. When Claude declines to help with something or adds caveats to a sensitive response, the reason often traces back to principles in the constitution. Reading it gives you a map of the decision logic — more reliable than guessing or testing edge cases blindly.
The second context is building your own Constitutional AI system. Anthropic’s CAI methodology is documented in the research paper, and because the constitution itself is CC0, teams building specialized models or agents can adapt it as a starting point. According to ACM DL, researchers at the ACM Web Conference 2025 proposed the C3AI framework, a structured approach for selecting and evaluating principles for custom constitutions. Choosing good principles turns out to be harder than it looks in practice.
Pro Tip: If you’re applying constitutional AI prompting at inference time — having the model critique its own output based on a set of rules inside the prompt — you’re working in a fundamentally different mode than the training-time constitution. The term “AI constitution” technically refers to the training document. What you’re doing at runtime is closer to a self-refine or prompt-chaining pattern. The distinction matters when you hit the technical limits of self-critique loops: a runtime prompt cannot rewrite model weights.
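For contrast, the runtime pattern mentioned in the tip can be sketched as a bounded self-critique loop. This is an illustrative sketch, not a library API: `self_refine`, `make_stub`, the PASS convention, and the sample rule are assumptions introduced here, and the stub stands in for a real model call.

```python
from typing import Callable, List

# A "model" is any function that maps a prompt string to a completion.
Model = Callable[[str], str]


def self_refine(model: Model, task: str, rules: List[str],
                max_rounds: int = 3) -> str:
    """Draft, check the draft against the rules, and rewrite until it passes."""
    rule_text = "\n".join(f"- {rule}" for rule in rules)
    draft = model(task)
    for _ in range(max_rounds):
        verdict = model(
            f"Rules:\n{rule_text}\n\nDraft:\n{draft}\n\n"
            "Reply PASS if the draft follows every rule; "
            "otherwise list the problems."
        )
        if verdict.strip().upper().startswith("PASS"):
            break
        draft = model(
            "Rewrite the draft to fix the problems.\n"
            f"Problems:\n{verdict}\n\nDraft:\n{draft}"
        )
    return draft


def make_stub() -> Model:
    """Stand-in model: flags the first draft, then accepts the rewrite."""
    def stub(prompt: str) -> str:
        if "Reply PASS" in prompt:
            return "PASS" if "(revised)" in prompt else "Missing a safety caveat."
        if prompt.startswith("Rewrite"):
            return "Draft with caveat (revised)"
        return "Draft without caveat"
    return stub


result = self_refine(make_stub(), "Explain how to use a ladder.",
                     ["Include a safety caveat."])
print(result)
```

Everything in this loop happens in the text passed between calls. Nothing is written back to the model, which is why the `max_rounds` cap is the only guarantee that it terminates and why its effect ends with the conversation.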
When to Use / When Not
| Scenario | Use | Avoid |
|---|---|---|
| You want to understand why Claude responds a certain way to sensitive requests | ✅ | |
| You’re building a custom CAI training pipeline for a specialized model | ✅ | |
| You want to audit what values a model was trained on | ✅ | |
| You need to align a fine-tuned model with your organization’s policies | ✅ | |
| You need real-time output control in a deployed application | | ❌ |
| You’re trying to override Claude’s trained behavior through prompting | | ❌ |
Common Misconception
Myth: An AI constitution is a set of rules you can pass to a model in a system prompt to control its behavior at runtime.
Reality: The AI constitution is a training document. It shapes model weights during training and is not a text the model consults at inference. Passing a list of principles in your system prompt is a separate technique (prompt-based self-critique, or self-refine). The two approaches can produce similar-looking outputs, but only the training-time constitution permanently influences how the model reasons. A prompt-based version depends on how consistently the model follows instructions within that specific conversation.
One Sentence to Remember
An AI constitution is a training-time document, not a runtime control — read it to understand what a model was taught to value, and adapt it if you’re training a model of your own.
FAQ
Q: Is Anthropic’s AI constitution available to read? A: Yes. According to Anthropic’s constitution page, the full document is at anthropic.com/constitution and released under CC0 1.0, so it is free to read, use, or adapt, with no attribution required.
Q: What is the difference between an AI constitution and a system prompt? A: A constitution shapes model weights during training; a system prompt influences one conversation at inference time. The constitution’s effects carry into every conversation because they are part of the trained model; a system prompt’s effects last only as long as the session.
Q: Can I create a constitution for my own AI model? A: Yes. The CAI methodology is documented in the Anthropic Research paper (Bai et al., 2022) and Anthropic’s CC0 constitution is freely adaptable. The C3AI framework, presented at the ACM Web Conference 2025 and available through ACM DL, provides a structured approach for selecting and evaluating principles.
Sources
- Anthropic Research: Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022) - Original paper introducing the CAI methodology and constitution concept
- Anthropic’s constitution page: Claude’s Constitution — Anthropic - Live public version of Anthropic’s AI constitution (CC0, updated 2026)
Expert Takes
The AI constitution is a formalization of a principle borrowed from science: if you want consistent behavior under novel conditions, codify the reasoning behind your rules, not just the rules themselves. The most recent update shifted from rule prescription to reason explanation — the more defensible design. A model trained on why something is harmful generates better refusals than one trained on a list of forbidden outputs, because the list will always be incomplete.
In a context-driven architecture, the AI constitution is the deepest layer of the spec stack — it lives below the system prompt, below any operator instruction. That ordering has a practical implication: when you write system prompts for Claude, you’re not overriding the constitution, you’re operating within it. Design your instructions to work with the priority ordering (safe → ethical → compliant → helpful), not against it, and you’ll hit far fewer unexpected refusals.
The CC0 license on Anthropic’s constitution is a calculated move. It signals transparency — but it also seeds the methodology across the industry. Every team that adapts Claude’s constitution to train their own model carries Anthropic’s value hierarchy into a new product. That’s not philanthropy. That’s a distribution strategy for a set of principles that Anthropic wrote.
Publishing a constitution under CC0 invites us to ask whose values are encoded in it — and who wasn’t in the room when it was written. The priority order (safety first, helpfulness fourth) reflects specific institutional choices. When that document becomes the template for other organizations’ training pipelines, those choices propagate without the original authorship being visible. The transparency is real. The accountability gap it exposes is equally real.