Instructor
Also known as: instructor library, instructor-python, instructor-ts
- Instructor
- Instructor is a Python and TypeScript library that wraps LLM API calls with Pydantic or Zod schema validation, automatically retrying failed structured output attempts until the model returns a response that matches the declared type.
Instructor is an open-source library that adds schema validation and automatic retry logic to LLM API calls, ensuring responses conform to a declared Pydantic or Zod type before your code receives them.
What It Is
Instructor solves one of the most common friction points when building with language models: getting a response in the exact shape your code expects, reliably and without brittle parsing logic.
When you ask a language model to return JSON, it will often comply — but “often” is not “always.” The model might wrap the response in a code fence, add an explanatory sentence before the opening brace, or return a plausible-looking object with a required field missing. Any of these deviations causes a downstream failure. For one-off scripts, that’s manageable. For a production pipeline running thousands of LLM calls per day, it becomes a reliability problem that compounds with every schema change.
Instructor treats this as a first-class engineering problem. Think of it as a contract enforcer for LLM responses. You define the expected shape of the output using a Pydantic model (Python’s type validation library) or a Zod schema (TypeScript’s equivalent), and Instructor handles the rest: it serializes the schema into the format the API expects, parses the model’s response, validates it against the declared types, and — when validation fails — automatically sends the error message back to the model and asks it to correct the output. The model sees its own mistake and fixes it, typically within one or two additional calls.
Instructor also handles provider differences internally. Different LLMs expose different mechanisms for getting structured output — function calling (where the model invokes a declared function with typed arguments), tool use (a similar mechanism used by Anthropic and others), JSON mode (a response-format flag), or raw text prompting. Instructor calls these “modes” and selects the right one based on which model and API you’re calling. You can switch between providers, or test the same pipeline against multiple models, without changing your validation logic. The Pydantic model or Zod schema you defined stays the same; the adapter layer changes.
This separation matters for structured output pipelines specifically: schema enforcement failures often originate from a mismatch between what the prompt asks for and what the model’s inference path supports. Instructor doesn’t fix that mismatch at the source — it catches the symptom (a malformed response) and retries. For teams working at the API layer without access to model internals, that retry-and-validate pattern is the practical alternative to constrained decoding, which requires inference-time control.
How It’s Used in Practice
The most common scenario is a developer who needs to extract structured data from unstructured text: parsing a support ticket into categories, extracting named entities from a document, or converting a natural-language query into a typed search filter object. Without a library like Instructor, they write custom parsing code, handle edge cases manually, and re-test every time the model updates.
With Instructor, they define a Pydantic class with the expected fields and annotate it with field descriptions. That class becomes the response_model parameter on an API call. Instructor serializes it, calls the model, and returns a validated Python object — no parsing, no string manipulation. If the model returns a malformed response on the first attempt, Instructor sends the validation error back and requests a correction.
Pro Tip: Instructor’s retry behavior is where token costs can surprise you. Each failed validation triggers a full additional LLM call with the error appended to the context. For production use, always set a max_retries limit and monitor retry frequency. A high retry rate is a diagnostic signal — it usually means the schema is too strict, the prompt doesn’t give the model enough context about the expected types, or the model being called doesn’t handle that extraction task reliably.
When to Use / When Not
| Scenario | Use | Avoid |
|---|---|---|
| Extracting typed entities from free-form text | ✅ | |
| Real-time use cases where retry latency is unacceptable | ❌ | |
| Consistent typed output across multiple LLM providers | ✅ | |
| High-volume pipelines where token overhead from retries compounds costs | ❌ | |
| Prototyping structured pipelines before committing to constrained decoding | ✅ | |
| Environments requiring guaranteed first-pass schema conformance | ❌ |
Common Misconception
Myth: Instructor guarantees that the model produces correct output on the first try.
Reality: Instructor does not change what the model generates — it validates what comes back and retries if the response fails schema validation. The model can still return semantically wrong data (a plausible-looking but factually incorrect value) that passes type validation without error. Instructor catches structural failures; domain correctness remains your responsibility to verify.
One Sentence to Remember
Instructor is a retry-and-validate wrapper around LLM API calls: it offloads schema serialization and response validation so your code receives typed data instead of raw strings, but it does not change how the model reasons about what to put inside that structure.
FAQ
Q: Does Instructor work with LLMs other than OpenAI’s models? A: Yes. Instructor supports multiple providers including Anthropic, Google, and others. The library selects the appropriate mode — function calling, tool use, or JSON mode — based on what the target model supports.
Q: How does Instructor differ from constrained decoding libraries like Outlines or xGrammar? A: Constrained decoding modifies the model’s token sampling at inference time, making schema-invalid output structurally impossible to generate. Instructor works at the API layer — it validates the model’s free-form output after generation and retries on failure. Instructor is easier to plug into existing API-based setups; constrained decoding requires access to model internals.
Q: What happens when Instructor exhausts its retry limit? A: It raises a validation error after the final failed attempt. The calling code needs to handle this exception — catch it, log it, and decide whether to surface an error to the user, fall back to a default value, or escalate for review.
Expert Takes
Instructor puts schema enforcement at the wrong layer of the stack. Validation after generation is probabilistic error correction — the model already committed to a token sequence, and retrying hopes the next sequence is better. Constrained decoding enforces the schema during generation, which is structurally sounder. Instructor is the practical choice for API-only environments, but it is a workaround for a gap that better inference infrastructure would close at the source.
In specification-driven workflows, Instructor gives you typed contracts between LLM calls and downstream code without requiring self-hosted infrastructure. Define the expected shape once as a Pydantic model, and every API call in the pipeline uses the same schema — no custom JSON parsing, no fragile regex extraction, and failed validations surface as typed errors rather than silent data corruption. The max_retries parameter is your cost control lever; treat it as a required field, not an optional default.
Instructor became the practical standard for API-layer structured output because it solved the gap between what providers promised and what production pipelines actually needed. That gap is narrowing — native structured output enforcement at the API level is improving across providers. Teams already using Instructor’s validation patterns aren’t losing ground; those patterns map directly onto tighter enforcement mechanisms when the team is ready to move.
Instructor makes it easier to automate decisions that were previously manual — classifying intent, routing actions, extracting entities. The question worth asking is not whether the output passes schema validation, but whether the types you defined capture the right distinctions in the first place. A type-correct response is not a correct decision. Schema enforcement is a floor, not a ceiling.