Structured Logging
Also known as: JSON logging, log structuring, machine-readable logging
Structured logging records log events as machine-parseable key-value pairs, typically JSON, rather than free-form text strings. In LLM systems, each record captures prompt text, completion, token usage, latency, and trace identifiers in a consistent schema, enabling aggregation, alerting, PII redaction gating, and cost analysis at scale.
Structured logging records each LLM request and response as machine-parseable key-value pairs — typically JSON — rather than free-form text, enabling downstream PII filtering, cost tracking, and trace correlation.
What It Is
Every time your application calls an LLM, something records the event. Text logs write a line like “Request completed in 1243ms.” That line tells you very little — you cannot filter it by model, sum it for cost analysis, or connect it to a distributed trace when you are debugging a production incident two weeks later.
Structured logging writes the same event as a consistent set of named fields: timestamp, model name, provider, latency, input tokens, output tokens, trace identifier, span identifier, retrieved context, and more. Because every field has a known name and type, you can query the records with standard tools, aggregate them into dashboards, and fire alerts when specific values cross a threshold.
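As a minimal sketch of what such a record could look like, the Python below builds one LLM call as a flat dictionary and serializes it to a single JSON line. The function name `build_llm_log_record`, the field names, and the model and provider values are assumptions chosen for illustration; they are not taken from any particular specification.

```python
import json
import time
import uuid


def build_llm_log_record(model, provider, latency_ms, input_tokens, output_tokens,
                         trace_id=None, span_id=None):
    """Return one LLM call as a flat dict of named fields (hypothetical schema)."""
    return {
        "timestamp_ms": int(time.time() * 1000),       # millisecond-precision timestamp
        "trace_id": trace_id or uuid.uuid4().hex,      # 32 hex characters
        "span_id": span_id or uuid.uuid4().hex[:16],   # 16 hex characters
        "model": model,
        "provider": provider,
        "latency_ms": latency_ms,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }


# Placeholder values; a real caller would pass what the provider returned.
record = build_llm_log_record("example-model", "example-provider", 1243, 512, 128)
line = json.dumps(record, sort_keys=True)  # one JSON object per log line
print(line)
```

Compared with the text line "Request completed in 1243ms", the same latency now sits in a named numeric field next to the model, token counts, and identifiers, so a query tool can filter or sum it directly.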
Think of it this way: a text log is a note scrawled in a margin. A structured log is a row in a spreadsheet where every column has a label. You can filter and sort a spreadsheet. You cannot do that reliably with margin notes.
For LLM applications, three things depend on structured logging being consistent and schema-driven. First, cost tracking: token usage per call is a number that lives in a field — you can sum it by user, by product feature, or by department without writing a log parser. Second, PII redaction gates: according to Portkey, raw prompt and completion text carries the highest PII and PHI risk in the entire LLM stack, and the log schema must be designed so that this text passes through a redaction filter before the record is committed — not after. Third, trace correlation: when an LLM call is one step in a multi-agent workflow, the structured log record must carry a trace identifier and span identifier so it can be joined to the broader distributed trace during incident reconstruction.
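To illustrate the trace correlation point, here is a small sketch that groups log records by trace identifier and orders each group by timestamp, which is roughly what incident reconstruction needs. The field names (`trace_id`, `span_id`, `timestamp_ms`, `step`) and the sample records are hypothetical.

```python
from collections import defaultdict


def group_by_trace(records):
    """Group records by trace_id and order each group by timestamp."""
    traces = defaultdict(list)
    for rec in records:
        traces[rec["trace_id"]].append(rec)
    return {
        trace_id: sorted(spans, key=lambda r: r["timestamp_ms"])
        for trace_id, spans in traces.items()
    }


# Hypothetical records from two workflows, arriving out of order.
records = [
    {"trace_id": "t1", "span_id": "s2", "timestamp_ms": 2000, "step": "generate"},
    {"trace_id": "t2", "span_id": "s9", "timestamp_ms": 1500, "step": "classify"},
    {"trace_id": "t1", "span_id": "s1", "timestamp_ms": 1000, "step": "retrieve"},
]

traces = group_by_trace(records)
steps_t1 = [r["step"] for r in traces["t1"]]
print(steps_t1)
```

A record that lacks a trace identifier cannot be placed in any group, which is why the identifier has to be part of the schema rather than an optional extra.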
As of 2026, two open specifications define which fields a structured LLM log should carry. According to the OTel GenAI Repo, the OpenTelemetry GenAI semantic conventions establish the standard attribute set; they are currently experimental, under active development, and have no published release tags. According to OpenObserve, OpenInference (Apache 2.0, originated from Arize) provides a cross-ecosystem standard that most major observability platforms have adopted. According to Portkey, a conforming log record must include: trace_id, span_id, model name, provider, latency, token usage, retrieved context for RAG calls, tool arguments for agent calls, and a millisecond-precision timestamp.
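One way to enforce a field list like the one above is a small validation step before the record is written. The sketch below is an assumption-heavy illustration: the field names, the type choices, and the `call_type` values are hypothetical and are not the attribute names defined by the OpenTelemetry GenAI conventions or OpenInference.

```python
# Hypothetical field names and types; real attribute names depend on the spec you adopt.
REQUIRED_FIELDS = {
    "timestamp_ms": int,
    "trace_id": str,
    "span_id": str,
    "model": str,
    "provider": str,
    "latency_ms": (int, float),
    "input_tokens": int,
    "output_tokens": int,
}

# Extra field expected only for certain call types.
CONDITIONAL_FIELDS = {"rag": "retrieved_context", "agent": "tool_arguments"}


def schema_problems(record, call_type="plain"):
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type: {name}")
    extra = CONDITIONAL_FIELDS.get(call_type)
    if extra and extra not in record:
        problems.append(f"missing: {extra}")
    return problems


good = {
    "timestamp_ms": 1700000000000, "trace_id": "a" * 32, "span_id": "b" * 16,
    "model": "example-model", "provider": "example-provider",
    "latency_ms": 1243, "input_tokens": 512, "output_tokens": 128,
}
print(schema_problems(good))          # plain call
print(schema_problems(good, "rag"))   # RAG call without retrieved context
```

Returning a list of problems rather than raising lets the caller decide whether a non-conforming record is dropped, repaired, or routed to a separate stream.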
How It’s Used in Practice
The most common scenario is a product team that has shipped an LLM feature and then needs to understand what is happening in production. Someone asks: “Which users are sending the longest prompts? Which calls are slowest? Where is the cost coming from?” Without structured logging, answering those questions means writing text parsers against unstructured log lines — a fragile process. Structured logging puts the answer in queryable fields from the start.
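The questions above can be answered with a few lines once every record carries named fields. This sketch parses JSON log lines and aggregates them; the fields `user_id`, `feature`, `input_tokens`, `output_tokens`, and `latency_ms`, along with the sample data, are illustrative assumptions.

```python
import json
from collections import defaultdict

# Hypothetical log lines, one JSON object per line.
log_lines = [
    '{"user_id": "u1", "feature": "search", "input_tokens": 900, "output_tokens": 100, "latency_ms": 800}',
    '{"user_id": "u2", "feature": "summarize", "input_tokens": 300, "output_tokens": 300, "latency_ms": 2100}',
    '{"user_id": "u1", "feature": "chat", "input_tokens": 200, "output_tokens": 50, "latency_ms": 400}',
]


def tokens_by(records, key):
    """Sum input and output tokens grouped by the given field."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[key]] += rec["input_tokens"] + rec["output_tokens"]
    return dict(totals)


records = [json.loads(line) for line in log_lines]

per_user = tokens_by(records, "user_id")        # where is the cost coming from?
per_feature = tokens_by(records, "feature")
slowest = max(records, key=lambda r: r["latency_ms"])           # which call is slowest?
longest_prompt = max(records, key=lambda r: r["input_tokens"])  # who sends the longest prompts?

print(per_user, per_feature, slowest["feature"], longest_prompt["user_id"])
```

Token totals are only a proxy for cost here; turning them into currency would need per-model pricing, which this sketch leaves out.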
PII redaction and trace sampling are the two central concerns of production LLM logging, and both operate on the structured log record. Redaction logic runs against the prompt and completion text fields before the record is written. Trace sampling decisions (whether to capture a call in full detail or record only summary metadata) are keyed on the trace_id field, which links each LLM call to the distributed trace of the workflow it belongs to.
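A sampling decision keyed on the trace identifier could be sketched as follows. Hashing the identifier makes the decision deterministic, so every call in the same trace gets the same answer. The function names, the `DETAIL_FIELDS` set, and the choice of SHA-256 are assumptions for illustration, not a description of how any particular platform samples.

```python
import hashlib

# Hypothetical names for the fields that carry full detail.
DETAIL_FIELDS = {"prompt", "completion", "retrieved_context", "tool_arguments"}


def keep_full_detail(trace_id, sample_rate):
    """Deterministically decide whether a trace is captured in full detail."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")   # integer in [0, 2**64)
    return bucket < int(sample_rate * 2**64)     # integer comparison avoids float rounding


def apply_sampling(record, sample_rate):
    """Return the full record if sampled, otherwise summary metadata only."""
    if keep_full_detail(record["trace_id"], sample_rate):
        return record
    return {k: v for k, v in record.items() if k not in DETAIL_FIELDS}


record = {"trace_id": "t1", "model": "example-model", "latency_ms": 1243,
          "prompt": "(already redacted text)"}
summary_only = apply_sampling(record, 0.0)   # rate 0.0: never keep detail
full = apply_sampling(record, 1.0)           # rate 1.0: always keep detail
print(summary_only, full)
```

Because the decision depends only on the trace identifier, separate services that log different steps of the same workflow agree on whether that trace is kept in full, without coordinating.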
Pro Tip: Configure your PII redaction pipeline to run before the structured log write, not as a cleanup step after. By the time you redact retrospectively, the raw text has already moved through your log buffer — and depending on your infrastructure, it may have landed in an unredacted downstream sink.
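The ordering described in the tip can be sketched with Python's standard `logging` module, where a filter attached to a logger runs before any of that logger's handlers receive the record. The email-only regex, the `llm` payload key, and the class names are simplifying assumptions; a real redaction policy would cover far more than email addresses.

```python
import io
import json
import logging
import re

# Illustrative pattern only: matches simple email addresses, nothing else.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TEXT_FIELDS = ("prompt", "completion")


def redact(text):
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


class RedactionFilter(logging.Filter):
    """Redact text fields before any handler sees the record."""

    def filter(self, record):
        payload = getattr(record, "llm", None)
        if isinstance(payload, dict):
            # Build a new dict so the caller's payload is not mutated.
            record.llm = {
                k: redact(v) if k in TEXT_FIELDS and isinstance(v, str) else v
                for k, v in payload.items()
            }
        return True  # keep the record; only its contents change


class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps(getattr(record, "llm", {}), sort_keys=True)


sink = io.StringIO()                      # stands in for a real log sink
handler = logging.StreamHandler(sink)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("llm_calls_example")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addFilter(RedactionFilter())       # logger-level filter: runs before handlers
logger.addHandler(handler)

logger.info("llm_call", extra={"llm": {
    "model": "example-model",
    "prompt": "Email jane@example.com the report",
}})
written = sink.getvalue()
print(written)
```

Attaching the filter to the logger rather than to a single handler means a second handler added later, for example one that ships to a remote sink, also receives only the redacted payload. Note that a logger-level filter applies only to records logged through that logger, not to records propagated up from child loggers.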
When to Use / When Not
| Scenario | Use | Avoid |
|---|---|---|
| LLM API in production with multiple users and cost visibility requirements | ✅ | |
| One-off local prompt experiments during development | | ❌ |
| PII-sensitive workflows requiring redaction gates before the log write | ✅ | |
| Quick internal prototype with no audit or compliance requirements | | ❌ |
| Multi-step agent workflows where trace correlation is needed for debugging | ✅ | |
| Single-user internal tool where raw prompt text must never leave the local machine | | ❌ |
Common Misconception
Myth: Structured logging means storing the full prompt and completion text in a searchable database so you can review what the model said.
Reality: In production LLM systems, the prompt and completion text fields in a structured log are typically the most restricted data in the record. They pass through a PII redaction layer before the log is written, and in many compliance contexts they are truncated, hashed, or omitted entirely. Structured logging is primarily about the metadata fields — latency, tokens, model name, trace identifiers — not about building a searchable archive of raw LLM conversations.
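The three treatments mentioned above (truncate, hash, omit) could be expressed as a small policy function applied to text that has already passed redaction. The function name `restrict_text`, the policy labels, and the default length are hypothetical.

```python
import hashlib


def restrict_text(text, policy, max_chars=80):
    """Apply a storage policy to already-redacted prompt or completion text."""
    if policy == "omit":
        return None                              # store nothing
    if policy == "hash":
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return "sha256:" + digest                # allows equality checks, not reading
    if policy == "truncate":
        return text[:max_chars]                  # keep only a short prefix
    raise ValueError(f"unknown policy: {policy}")


prompt = "Summarize the attached quarterly report in three bullet points."
truncated = restrict_text(prompt, "truncate", max_chars=20)
hashed = restrict_text(prompt, "hash")
omitted = restrict_text(prompt, "omit")
print(truncated, hashed, omitted)
```

A hash lets you detect repeated prompts without storing them, but it should not be treated as anonymization on its own, since short or predictable text can be guessed and re-hashed.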
One Sentence to Remember
Structured logging turns LLM calls from opaque events into queryable records — but only if PII redaction runs upstream of the log write, not downstream of it.
FAQ
Q: What fields does a structured LLM log record typically include? A: According to Portkey, core required fields include trace_id, span_id, model name, provider, latency, token usage, retrieved context, tool arguments, and a millisecond-precision timestamp. Prompt and completion text are included only after passing through PII redaction.
Q: How does structured logging differ from standard application logging? A: Standard logging writes human-readable lines you scan manually. Structured logging emits named key-value pairs — typically JSON — that you query, aggregate, and alert on programmatically without writing text parsers for each new analysis need.
Q: Do I need to build structured logging myself, or do LLM gateways handle it? A: Most LLM observability gateways emit structured logs automatically. You still control which fields are captured, which redaction policy runs first, and whether the log schema conforms to open standards like OpenInference or the OTel GenAI semantic conventions.
Sources
- OTel GenAI Repo: GitHub — open-telemetry/semantic-conventions-genai - OpenTelemetry GenAI semantic conventions for LLM observability (experimental status as of 2026)
- AgentTrace arXiv: AgentTrace: A Structured Logging Framework for Agent System Observability - First open standard for structured agent logging covering cognitive, operational, and contextual traces
Expert Takes
Structured logging is fundamentally a schema enforcement problem. Without a fixed field set, two microservices calling the same LLM write records that cannot be joined — you end up with event data that cannot be aggregated across the system. The OpenTelemetry GenAI semantic conventions address this by defining a common attribute vocabulary, though the specification remains experimental. The core constraint is simple: the log is only as useful as its least consistent field.
When you build a context-aware LLM workflow, the structured log record is the paper trail that proves the workflow ran correctly. The trace_id and span_id fields are what link a single LLM call to the orchestration layer around it — without them, you cannot reconstruct a failed agent run in production. Define your log schema before you write your first API call. Retrofitting it after the fact means touching every downstream consumer that already parses the existing format.
Every team that runs LLMs in production eventually hits a moment where someone asks “why did our AI bill spike this month?” If you don’t have structured logging in place before that question arrives, you are reading text files trying to parse costs manually. The teams that put structured logging in at deployment — not as an afterthought — are the ones that can actually answer questions about their AI spend without a two-week forensic exercise.
Structured logging creates a permanent, queryable record of what your system asked an LLM and what it received in return. That is a powerful audit capability — and also a significant data liability if the redaction layer ever fails silently. The question worth asking before you build this infrastructure is: what is the actual downstream access model for these logs? Who can query them, under what conditions, and for how long? The schema you design now determines what becomes discoverable later.