BAML
Also known as: Boundary AI Markup Language, BoundaryML DSL, BAML DSL
BAML is an open-source domain-specific language for defining typed LLM function calls. Declare a function signature, output schema, and prompt template in a .baml file; BAML compiles these into type-safe client libraries in Python, TypeScript, and other languages that handle parsing, validation, and retries.
BAML is an open-source domain-specific language for defining typed LLM function calls — declare a schema and prompt in one file and get client code that handles parsing and retries automatically.
What It Is
When a production service calls an LLM and expects structured data back (a list of extracted entities, a categorized label, a confidence score), there is no guarantee the response arrives in the expected format. JSON mode helps but doesn’t guarantee schema compliance. Manual parsing wrapped in try/catch covers the common failures but still breaks on edge cases such as missing or renamed fields. BAML (Boundary AI Markup Language) is a domain-specific language (DSL), a small language purpose-built for a specific task, that moves this problem from runtime error handling to compile-time contract definition.
Think of a .baml file the way you think of a TypeScript interface: it specifies what a function returns before you write the implementation. With BAML, the “implementation” is the LLM call, and the interface is the output schema. The BAML compiler reads that .baml file and generates a typed client library in your target language. That generated client sends the prompt, receives the response, tries to extract structured fields even from free-text output, validates the result against the declared schema, and retries when the output doesn’t conform.
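The send, extract, validate, and retry flow described above can be sketched in plain Python. This is not BAML’s generated code or its parser; `Label`, `parse_label`, `call_with_retries`, and the stubbed model are hypothetical names that only illustrate the control flow a typed client might follow.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical output schema: a category label plus a confidence score.
@dataclass
class Label:
    category: str
    confidence: float

ALLOWED = {"billing", "bug", "other"}

def parse_label(raw: str) -> Label:
    """Validate a 'category, confidence' reply against the declared schema."""
    category, _, confidence = raw.strip().partition(",")
    category = category.strip().lower()
    if category not in ALLOWED:
        raise ValueError(f"unknown category: {category!r}")
    score = float(confidence)  # raises ValueError when not numeric
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    return Label(category, score)

def call_with_retries(model: Callable[[str], str], prompt: str,
                      max_attempts: int = 3) -> Label:
    """Call the model, validate the reply, and retry on non-conforming output."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        raw = model(prompt)
        try:
            return parse_label(raw)
        except ValueError as exc:
            last_error = exc  # remember why this attempt failed, then retry
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error

# Stub standing in for an LLM: the first reply is malformed, the second conforms.
replies = iter(["I think it's billing", "billing, 0.92"])

def fake_model(prompt: str) -> str:
    return next(replies)

result = call_with_retries(fake_model, "Classify: 'I was charged twice'")
print(result)
```

The caller only ever sees a `Label` or an exception, which is the property the generated client is meant to provide.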
A single .baml file contains three things: the output schema (typed classes, enums, and nested structures), a prompt template with typed input variables, and a client configuration specifying which LLM provider and model to use. The compiler handles per-language code generation — the same spec emits clients for Python, TypeScript, Ruby, and others from one source file.
Among the structured output challenges (schema enforcement limits, token overhead, and parsing failures), BAML addresses the application layer rather than the inference layer. It does not constrain generation the way xGrammar or Outlines do; those approaches operate at the token generation level, and BAML does not remove their token cost. Instead, it takes a different path: it builds extraction resilience into the generated client, so schema compliance is enforced whether the model produces clean JSON, free text, or a mix.
How It’s Used in Practice
The most direct use case is a production data extraction pipeline. A team needs to extract named entities, dates, and sentiment from customer support tickets at volume. Without BAML, they write a prompt, call the API, and parse the JSON response — then discover that a small percentage of responses come back with unexpected field names or missing fields, and the next several hours go into error handling.
With BAML, the team writes a single .baml file that declares a Ticket class with typed fields — customer name, sentiment enum, entity list — then writes the prompt template. The compiler generates a fully typed TypeScript or Python client function. When the LLM returns something unexpected, BAML retries before surfacing an error. The IDE also shows exactly what fields the function returns, which catches schema mismatches during development rather than in production.
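To make the typed result concrete, here is a rough Python sketch of what a `Ticket` shape with a sentiment enum and an entity list could look like on the consuming side. The field names, the `Sentiment` values, and the `to_ticket` helper are assumptions for illustration; they are not the output of the BAML compiler.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any

# Hypothetical enum mirroring a "sentiment" field in the schema.
class Sentiment(Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

# Hypothetical typed result; an IDE can autocomplete these fields.
@dataclass
class Ticket:
    customer_name: str
    sentiment: Sentiment
    entities: list[str]

REQUIRED = {"customer_name", "sentiment", "entities"}

def to_ticket(data: dict[str, Any]) -> Ticket:
    """Coerce a loosely shaped dict into a Ticket, or raise ValueError."""
    missing = REQUIRED - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    try:
        # Tolerate case and whitespace differences in the enum value.
        sentiment = Sentiment(str(data["sentiment"]).strip().lower())
    except ValueError:
        raise ValueError(f"unknown sentiment: {data['sentiment']!r}") from None
    entities = data["entities"]
    if not isinstance(entities, list):
        raise ValueError("entities must be a list")
    return Ticket(str(data["customer_name"]), sentiment, [str(e) for e in entities])

ticket = to_ticket({
    "customer_name": "Ada",
    "sentiment": "Negative ",
    "entities": ["invoice #12", "refund"],
})
print(ticket.sentiment, ticket.entities)
```

A missing field or an unknown enum value raises immediately, so a schema mismatch shows up at the boundary rather than deep inside business logic.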
BAML also works well in multi-language environments. One .baml spec generates client libraries for the Python backend and the TypeScript frontend simultaneously, keeping schema definitions in sync across both codebases.
Pro Tip: BAML’s parser handles responses even when the model mixes explanation text with the structured data. If you see models reasoning out loud before the JSON block, BAML still extracts the fields correctly — you don’t need to add instruction overhead to suppress that behavior.
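As a simplified illustration of pulling structured data out of a reply that mixes reasoning with JSON, the sketch below scans for the first decodable JSON object using only the Python standard library. BAML’s own parser is a separate implementation and is likely more lenient than this; `extract_first_object` and the sample reply are hypothetical.

```python
import json

def extract_first_object(text: str) -> dict:
    """Return the first JSON object embedded in free text, or raise ValueError."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pass  # this brace did not open valid JSON; try the next one
        else:
            if isinstance(value, dict):
                return value
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in response")

# Sample reply: reasoning text, a stray brace pair, then the actual payload.
reply = (
    "Let me think. The customer mentions a double charge {see ticket}, "
    "so this is a billing issue.\n"
    'Answer: {"category": "billing", "confidence": 0.9}\n'
    "Hope that helps!"
)

fields = extract_first_object(reply)
print(fields)
```

The stray `{see ticket}` is skipped because it does not decode as JSON, and the trailing sentence is ignored because decoding stops at the end of the object.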
When to Use / When Not
| Scenario | Use | Avoid |
|---|---|---|
| Production service making LLM calls that return structured data | ✅ | |
| Multi-language team sharing one schema definition across codebases | ✅ | |
| Models that inconsistently produce clean JSON despite JSON mode being on | ✅ | |
| Simple one-off script extracting a single value from an LLM | | ❌ |
| Stack already using constrained decoding (xGrammar, Outlines) at inference time | | ❌ |
| Schema so simple a regex or basic JSON parse handles it reliably | | ❌ |
Common Misconception
Myth: BAML only works when the model returns valid JSON.
Reality: BAML’s parser is designed to extract structured data from free-text, markdown, and mixed-format responses. Schema enforcement happens inside the generated client, not by requiring the model to produce syntactically valid JSON. This makes BAML useful precisely for the cases where JSON mode is unreliable or unavailable.
One Sentence to Remember
BAML shifts structured output reliability from runtime hope to compile-time contract — write the schema once, and the generated client handles parsing, validation, and retries across whatever format the model returns.
FAQ
Q: What languages does BAML generate clients for?
A: BAML generates clients for Python, TypeScript, Ruby, and other languages from a single .baml definition file. The core spec is language-agnostic; the compiler handles per-language output without requiring changes to the schema.
Q: Does BAML work with any LLM provider?
A: BAML supports multiple providers including OpenAI, Anthropic, Google, and open-source models. The client configuration in the .baml file specifies the provider and model, making it straightforward to switch.
Q: How does BAML differ from using JSON schema validation on the API response?
A: JSON schema validation checks output after parsing and fails when the model returns invalid JSON. BAML builds extraction from non-JSON responses and automatic retries into the generated client, handling failure modes that schema validation alone cannot.
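The difference can be seen with a response that is almost valid JSON. In the sketch below, a strict parse-then-validate pipeline never reaches the validation step, while a tolerant step repairs the text first. The trailing-comma failure, the naive regex repair, and the function names are assumptions chosen for illustration; this is not how BAML’s parser is implemented.

```python
import json
import re

def validate(data: dict) -> dict:
    """Stand-in for a schema check: both fields present with expected types."""
    if not isinstance(data.get("category"), str):
        raise ValueError("category must be a string")
    if not isinstance(data.get("confidence"), (int, float)):
        raise ValueError("confidence must be a number")
    return data

def strict(raw: str) -> dict:
    # Validation only runs if json.loads succeeds.
    return validate(json.loads(raw))

def tolerant(raw: str) -> dict:
    # Naive repair: drop a comma that directly precedes } or ].
    # A real parser would need to avoid touching commas inside strings.
    repaired = re.sub(r",\s*([}\]])", r"\1", raw)
    return validate(json.loads(repaired))

raw = '{"category": "billing", "confidence": 0.9,}'  # trailing comma

try:
    strict(raw)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False  # the schema check never got a chance to run

recovered = tolerant(raw)
print(strict_ok, recovered)
```

Both paths apply the same schema check; the only difference is how much malformed input is tolerated before that check runs.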
Expert Takes
BAML separates three concerns that most LLM integrations conflate: schema definition, prompt construction, and response parsing. The compiler generates a typed client that applies an extraction approach — it tries to pull structured fields from the raw response string rather than requiring the model to produce syntactically valid JSON. The practical effect is a wider set of acceptable model outputs that still map to a valid typed result. That is a meaningful reliability gain for structured output use cases.
If your team has Python and TypeScript services calling the same LLM endpoint, keeping the response schema in sync across two codebases is the first thing that breaks. BAML solves this at the source: one .baml file is the schema contract, and the compiler emits typed clients for both languages at once. When the schema changes, both clients regenerate together. That prevents the “we updated Python but forgot TypeScript” class of bugs in production.
Every production team using LLMs eventually builds the same brittle JSON parser: a try/catch around JSON.parse() that fails in the middle of the night. BAML removes that from the backlog before you write it. The schema lives in source control, the client is generated, and the retry logic is shared across services. That is the kind of infrastructure that turns “we call an LLM” into “we have a reliable data extraction layer”, which is the business outcome that actually matters.
BAML makes an appealing promise: move the schema contract to source control and let the compiler handle the compliance gap between what you declare and what the model returns. The promise rests on a parser layer that extracts fields from whatever the model sends back. When that output drifts far enough from the declared schema, the extractor fails and the retry cost falls on you. The abstraction buys reliability at the center of the distribution. It doesn’t eliminate the tail.