Circuit Breaker

Also known as: circuit breaker pattern, CB pattern, failure isolation switch

A circuit breaker monitors failure rates to a downstream service and automatically stops routing requests when failures exceed a threshold, preventing cascade failures. In LLM systems, it handles prolonged outages, rate limit storms, and quality degradation — not just binary up/down availability.

A circuit breaker watches failure rates to a downstream service and, once a threshold trips, blocks all further requests — letting the provider recover without taking down your application.

What It Is

Every call you make to an LLM API is a bet: bet the provider responds in time, bet the response is usable, bet the next 100 requests go equally well. Most of the time the bet pays off. But providers hit rate limits, experience degraded performance during high-traffic periods, or go partially down — returning responses slowly enough to back up your request queue while retries pile on top. A circuit breaker is the pattern that stops you from losing that bet catastrophically.

Think of a household circuit breaker. When current spikes, the breaker trips and cuts the circuit before anything overheats. The software pattern works the same way: it monitors the failure rate of calls to a downstream service and, when that rate crosses a threshold, it trips, stopping requests so the provider can recover instead of getting hammered.

The pattern uses three states. According to Markaicode, they work as follows:

  • Closed — the normal state. Traffic flows. The breaker tracks failures in a rolling window; successful calls clear the count, failed calls increment it.
  • Open — the tripped state. Failures crossed the threshold. All new requests fast-fail immediately with a 503, with no attempt to reach the provider. A cooldown timer starts.
  • Half-Open — the probe state. The cooldown expired. The breaker allows a graduated test: first 1 request, then 3, then 10, according to Markaicode. If probe traffic succeeds, the breaker closes. If it fails, it reopens.
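The three states above can be sketched as a small state machine. This is a minimal illustration, not the implementation used by any particular gateway: the class name, the parameters (threshold, window_s, cooldown_s, min_calls, probe_stages) and the choice to trip on a failure rate over a time window are all assumptions made for the example. The graduated probe is modeled as "1 success, then 3, then 10" before closing, which is one reading of the description, and the sketch does not cap how many probe requests are in flight at once.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the provider."""


class CircuitBreaker:
    def __init__(self, threshold=0.4, window_s=30.0, cooldown_s=60.0,
                 min_calls=5, probe_stages=(1, 3, 10), clock=time.monotonic):
        self.threshold = threshold        # failure rate that trips the breaker
        self.window_s = window_s          # length of the rolling window
        self.cooldown_s = cooldown_s      # time spent open before probing
        self.min_calls = min_calls        # do not trip on a tiny sample
        self.probe_stages = probe_stages  # successes required per half-open stage
        self.clock = clock
        self.state = "closed"
        self.events = deque()             # (timestamp, succeeded) pairs
        self.opened_at = 0.0
        self.stage = 0
        self.stage_successes = 0

    def allow(self):
        """Return True if a request may be sent to the provider right now."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.cooldown_s:
                return False
            # Cooldown expired: start probing.
            self.state, self.stage, self.stage_successes = "half_open", 0, 0
        return True

    def record(self, succeeded):
        """Feed the outcome of one call back into the breaker."""
        now = self.clock()
        if self.state == "half_open":
            if not succeeded:
                self._trip(now)           # a failed probe reopens the circuit
                return
            self.stage_successes += 1
            if self.stage_successes >= self.probe_stages[self.stage]:
                self.stage += 1
                self.stage_successes = 0
                if self.stage == len(self.probe_stages):
                    self.state = "closed"
                    self.events.clear()
            return
        if self.state == "open":
            return                        # late result from before the trip
        self.events.append((now, succeeded))
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        total = len(self.events)
        failures = sum(1 for _, ok in self.events if not ok)
        if total >= self.min_calls and failures / total >= self.threshold:
            self._trip(now)

    def _trip(self, now):
        self.state = "open"
        self.opened_at = now
        self.events.clear()

    def call(self, fn):
        """Run fn through the breaker, failing fast while the circuit is open."""
        if not self.allow():
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn()
        except Exception:
            self.record(False)
            raise
        self.record(True)
        return result
```

A caller would wrap each provider request in breaker.call(...) and translate CircuitOpenError into the fast-fail response (for example a 503) at the edge of the application.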

LLM providers fail differently from traditional web services. A database outage is usually binary: it either responds or it doesn’t. LLM providers fail in degrees: slow responses that stop short of a timeout, outputs that are technically 200 OK but garbled, rate limit errors mixed in with real responses. According to n1n.ai, LLM circuit breakers must handle partial failures and quality degradation, not just binary service outages. This means the failure signal is not just “did the request return a 429 or a 5xx?”; it can also be “did the response take more than 10 seconds?” or “did the model return an empty completion?”
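A multi-signal failure check might look like the sketch below. The CallResult type, the is_failure function and the specific cutoffs (429 or 5xx status, 10-second latency, empty completion) are illustrative assumptions that mirror the examples in the paragraph above; a real deployment would choose its own signals and limits.

```python
from dataclasses import dataclass


@dataclass
class CallResult:
    """Hypothetical summary of one provider call."""
    status: int        # HTTP status code
    latency_s: float   # wall-clock time for the call, in seconds
    text: str          # completion text returned by the model


def is_failure(result, max_latency_s=10.0):
    """Decide whether a call should count against the breaker."""
    if result.status == 429 or result.status >= 500:
        return True    # rate limited or provider-side error
    if result.latency_s > max_latency_s:
        return True    # responded, but too slowly to be useful
    if not result.text.strip():
        return True    # 200 OK with an empty completion
    return False
```

The boolean this returns is what the breaker records, so a slow or empty 200 OK response counts against the provider just as an HTTP error does.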

How It’s Used in Practice

The most common place to encounter circuit breakers in LLM development is inside an LLM gateway — middleware like Portkey or LiteLLM that sits between your application and multiple providers. When a primary provider starts returning errors above the configured threshold, the circuit opens and the gateway routes traffic to a fallback provider automatically. Your application calls one gateway URL and the reliability logic happens transparently.

In a typical setup: you configure a failure threshold (for example, 40% error rate over a 30-second window) and a cooldown period (for example, 60 seconds before the half-open probe begins). The gateway tracks each provider independently, so one provider can be in the Open state while another remains Closed — and requests keep flowing through the working path.
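To make the per-provider tracking concrete, here is a simplified sketch of gateway-style routing. It is not how Portkey or LiteLLM implement it; ProviderHealth, route and the send callback are hypothetical names, and the half-open probe is omitted (a provider simply becomes eligible again once its cooldown ends). The defaults reuse the example figures above: 40% errors over 30 seconds, 60-second cooldown.

```python
import time
from collections import deque


class ProviderHealth:
    """Rolling-window error tracker for one provider (no half-open stage)."""

    def __init__(self, threshold=0.4, window_s=30.0, cooldown_s=60.0,
                 min_calls=10, clock=time.monotonic):
        self.threshold = threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.min_calls = min_calls
        self.clock = clock
        self.events = deque()    # (timestamp, succeeded) pairs
        self.open_until = 0.0    # provider is skipped until this time

    def available(self):
        return self.clock() >= self.open_until

    def record(self, succeeded):
        now = self.clock()
        self.events.append((now, succeeded))
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        total = len(self.events)
        failures = sum(1 for _, ok in self.events if not ok)
        if total >= self.min_calls and failures / total >= self.threshold:
            self.open_until = now + self.cooldown_s   # trip this provider only
            self.events.clear()


def route(providers, health, send):
    """Try providers in priority order, skipping any whose circuit is open."""
    for name in providers:
        if not health[name].available():
            continue
        try:
            reply = send(name)
        except Exception:
            health[name].record(False)
            continue
        health[name].record(True)
        return name, reply
    raise RuntimeError("no provider available")
```

Because each provider has its own ProviderHealth, tripping the primary leaves the fallback untouched, and once the primary is open it is skipped without a network call.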

Pro Tip: Set your failure threshold lower than you expect. At a 40% threshold, two in five requests are already failing before the breaker trips. Starting at 20–25% catches degradation earlier and limits blast radius. Tune upward only if you see false trips during normal traffic variance.

When to Use / When Not

Scenario | Use or Avoid
Primary LLM provider hits sustained rate limits or outages | Use
Single intermittent timeout (one-off transient failure) | Avoid
Provider returns degraded responses for a sustained period | Use
You have only one provider configured with no fallback | Avoid
High-volume production app routing through an LLM gateway | Use
Low-volume batch jobs where a simple retry loop is sufficient | Avoid

Common Misconception

Myth: A circuit breaker is just smarter retry logic.

Reality: They solve different problems. Retries handle transient failures — a single request that failed for a one-off reason. Circuit breakers handle sustained failures — when a provider is consistently failing, retries make things worse by hammering an endpoint that’s already struggling. According to Markaicode, the two patterns are complementary: retries for temporary blips, circuit breakers for prolonged outages. Using retries alone during an outage turns a provider problem into a system problem.
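One way the two patterns can be layered is sketched below: retries with exponential backoff sit inside a breaker check, so a transient blip is retried while a sustained outage stops the retry loop. CountingBreaker and call_with_retries are hypothetical names, and the breaker here is deliberately the simplest variant (a consecutive-failure counter that a success clears), with no half-open stage.

```python
import time


class CountingBreaker:
    """A success clears the failure count; enough failures in a row open it."""

    def __init__(self, max_failures=5, cooldown_s=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.open_until = 0.0

    def allow(self):
        return self.clock() >= self.open_until

    def record(self, succeeded):
        if succeeded:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.open_until = self.clock() + self.cooldown_s
            self.failures = 0


def call_with_retries(fn, breaker, attempts=3, base_delay_s=0.5, sleep=time.sleep):
    """Retry transient failures, but stop as soon as the breaker opens."""
    last_error = None
    for attempt in range(attempts):
        if not breaker.allow():
            raise RuntimeError("circuit open; not retrying")
        try:
            result = fn()
        except Exception as exc:
            breaker.record(False)
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay_s * 2 ** attempt)   # exponential backoff
        else:
            breaker.record(True)
            return result
    raise last_error
```

The breaker is shared across calls, so failures accumulated by one request's retries count toward tripping the circuit for the next request.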

One Sentence to Remember

When a circuit breaker opens, it buys the failing provider time to recover — and protects your application from backing up while waiting for requests that have no chance of succeeding.

FAQ

Q: How is a circuit breaker different from a timeout? A: A timeout waits for a response and gives up on one request. A circuit breaker stops sending requests entirely once failures accumulate — each call fails immediately, without waiting for the provider to respond at all.

Q: Do I need to build circuit breaker logic myself? A: Not usually. Most LLM gateways — Portkey, LiteLLM, and similar tools — include circuit breaker logic you configure through settings rather than code. Building from scratch makes sense only when you need custom failure signals beyond standard HTTP errors.

Q: What happens to a request when the circuit is open? A: It fast-fails with a 503 immediately, without waiting for the provider. This keeps the system responsive and allows fallback logic — routing to another provider, returning a cached response, or surfacing a graceful error — to activate without delay.
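The fallback options in the last answer could be ordered roughly as in this sketch. Everything here is illustrative: CircuitOpenError, answer, and the call_primary / call_fallback callables are hypothetical, and the cache is a plain dict keyed by prompt.

```python
class CircuitOpenError(Exception):
    """Raised by a breaker that is rejecting calls; carries a retry hint."""

    def __init__(self, retry_after_s):
        super().__init__("circuit open")
        self.retry_after_s = retry_after_s


def answer(prompt, call_primary, call_fallback=None, cache=None):
    """Return (status, body), degrading gracefully when the circuit is open."""
    try:
        return 200, call_primary(prompt)
    except CircuitOpenError as exc:
        if call_fallback is not None:
            return 200, call_fallback(prompt)    # route to another provider
        if cache is not None and prompt in cache:
            return 200, cache[prompt]            # serve a cached response
        # Nothing else available: surface a graceful error right away.
        return 503, f"Service temporarily unavailable; retry in {exc.retry_after_s:.0f}s"
```

None of these branches waits on the failing provider, which is the point of failing fast.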

Expert Takes

The circuit breaker pattern maps cleanly onto statistical process control. The closed state maintains a rolling window of failure observations; crossing the threshold is a control signal, not a guess. What makes LLM-specific implementations harder is that the failure variable is multi-dimensional — latency, HTTP status, and response quality are all valid failure signals, and weighting them requires design choices the original electrical metaphor never anticipated.

In a context-driven workflow, an open circuit breaker is a recoverable state, not a failure. The spec for your LLM gateway should define which providers are valid fallbacks, which failure signals count, and what the cooldown window is. Without that spec, retry logic has no way to route around a provider that’s failing slowly rather than failing hard. Circuit breakers close that gap in a way retries alone cannot.

Teams shipping LLM features learn this lesson under pressure: latency spikes from a provider don’t show up in your monitoring as errors until users have already bounced. Circuit breakers make that failure mode visible and recoverable before it becomes a support ticket. Every production deployment routing real traffic through an LLM API needs one. If you’re still relying purely on timeouts and retries, you haven’t hit traffic yet.

The half-open state is the part worth sitting with. A circuit breaker that probes recovery with graduated traffic makes an implicit promise to the upstream provider: test gently before trusting again. That’s reasonable design. But at scale — if many circuit breakers open toward the same provider simultaneously — probe traffic concentrates on an endpoint already under stress. Who bears the cost of that gradual reentry? The provider did not agree to graduated demand.