Portkey

Also known as: AI gateway, LLM gateway, LLM proxy

Portkey: Portkey is an LLM gateway platform that centralizes model routing, caching, fallback strategies, cost tracking, and request-level observability between applications and AI providers behind a single API endpoint.

Portkey is an LLM gateway that sits between your application and AI providers, managing model routing, caching, fallback strategies, and observability through a single API endpoint.

What It Is

When a product team starts integrating AI models, the setup looks manageable at first: one provider, one API key, one endpoint. Over time the picture complicates. A cheaper model gets added for high-volume tasks. A more capable model gets reserved for complex queries. Backup providers get wired in for reliability. Cost tracking stays fragmented across separate dashboards, and every change to which provider handles what requires code changes and a fresh deployment. Portkey is the coordination layer that addresses that sprawl — an AI gateway designed for LLM inference that centralizes routing, reliability, and observability behind a single endpoint your application already knows how to call.

Portkey works as a proxy — think of it as a smart switchboard sitting in front of multiple provider lines. Your application dials one number; Portkey decides which provider picks up, based on the routing config you have defined. It evaluates each request and decides what happens: send this request type to a faster model, return a cached response if the same prompt came through recently, or try a backup provider if the primary returns an error. The response comes back in the same format the application expects, regardless of which model actually answered.
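The "return a cached response if the same prompt came through recently" step can be pictured as a small time-bounded cache. The sketch below is plain Python and does not call Portkey’s API. `TTLCache`, `answer`, and the injected clock are hypothetical names chosen for illustration, and the real gateway may key and expire entries differently.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical response cache keyed by a hash of model + prompt."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be shown without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def key(model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str) -> Optional[str]:
        k = self.key(model, prompt)
        entry = self._store.get(k)
        if entry is None:
            return None
        stored_at, response = entry
        if self.clock() - stored_at > self.ttl_seconds:
            del self._store[k]  # entry is too old, so treat it as a miss
            return None
        return response

    def put(self, model: str, prompt: str, response: str) -> None:
        self._store[self.key(model, prompt)] = (self.clock(), response)


def answer(cache: TTLCache, model: str, prompt: str,
           call_provider: Callable[[str, str], str]) -> Tuple[str, bool]:
    """Return (response, served_from_cache)."""
    cached = cache.get(model, prompt)
    if cached is not None:
        return cached, True
    response = call_provider(model, prompt)
    cache.put(model, prompt, response)
    return response, False
```

The provider is only called on a miss or after the entry has expired, which is the behavior the switchboard description implies.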

Three building blocks make this work in practice. Virtual keys abstract your real provider credentials: you register each provider’s API key in Portkey once and reference Portkey’s own opaque virtual key identifiers in application code. Rotating a provider key or revoking access becomes a Portkey admin action with no changes to application code. Configs define routing and fallback rules as named, reusable policy objects — a config might say “try Model A first, fall back to Model B after two failures, cache successful responses for one hour.” Observability closes the loop: every request gets logged with the model used, latency, token count, and cost, all in a single searchable view. For teams managing multiple models across different task types, that view is what makes cost and performance differences visible and comparable.
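To make the observability point concrete, here is a minimal sketch of what a per-request log and a per-model summary could look like. It is not Portkey’s log schema: `RequestLog`, `summarize_by_model`, and the field names are assumptions that mirror the four attributes named above (model, latency, token count, cost).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class RequestLog:
    """Hypothetical log record: one entry per request through the gateway."""
    model: str
    latency_ms: float
    tokens: int
    cost_usd: float


def summarize_by_model(logs: Iterable[RequestLog]) -> Dict[str, Dict[str, float]]:
    """Group log entries by model and compute comparable totals."""
    grouped: Dict[str, List[RequestLog]] = defaultdict(list)
    for log in logs:
        grouped[log.model].append(log)

    summary: Dict[str, Dict[str, float]] = {}
    for model, entries in grouped.items():
        count = len(entries)
        summary[model] = {
            "requests": count,
            "avg_latency_ms": sum(e.latency_ms for e in entries) / count,
            "total_tokens": sum(e.tokens for e in entries),
            "total_cost_usd": sum(e.cost_usd for e in entries),
        }
    return summary
```

Once every request lands in one log with the same fields, comparing models is a single grouping step rather than a reconciliation across provider dashboards.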

This architecture maps directly onto model routing decisions. Instead of encoding routing logic inside application code, you express it as a Portkey config: a policy that the gateway evaluates per request. The application makes one call without knowing which provider answered; the routing decision lives entirely in the config layer.

How It’s Used in Practice

The most common entry point is a team routing requests across two or three providers by task type. A support team might send short classification tasks — “is this message urgent?” — to a faster, cheaper model, while longer drafting or summarization tasks go to a higher-quality model. Instead of encoding that logic inside the application, the team defines a Portkey config that maps request attributes to providers. Changing the routing decision later becomes a config update, not a code change and deployment.
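The support-team example can be sketched as routing policy held in data. This is an illustration of the idea, not Portkey’s config format: `ROUTING_CONFIG`, `pick_target`, the task-type labels, and the model names are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical routing policy: task type -> target model.
# In the gateway pattern this lives outside the application and can be
# edited without redeploying the code that calls pick_target().
ROUTING_CONFIG: Dict[str, Any] = {
    "default": "fast-cheap-model",
    "routes": {
        "classification": "fast-cheap-model",
        "drafting": "high-quality-model",
        "summarization": "high-quality-model",
    },
}


def pick_target(config: Dict[str, Any], task_type: str) -> str:
    """Return the target for a task type, falling back to the default."""
    return config["routes"].get(task_type, config["default"])
```

Sending classification traffic to a different model later means changing one value in `ROUTING_CONFIG`; `pick_target` and its callers stay the same.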

A second frequent use is automatic failover. When a provider returns a rate-limit response or a server error, Portkey’s fallback config retries on the next provider in the chain, without any provider-specific error handling in application code. As long as one provider in the chain succeeds, the application sees a completed response; the retry happened behind the endpoint.
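The failover behavior can be sketched as a loop over an ordered provider chain. This is a simplified model of the pattern, not Portkey’s implementation: `ProviderError`, `call_with_fallback`, and the set of retryable status codes are assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: rate limits (429) and server errors (5xx) trigger a fallback.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class ProviderError(Exception):
    """Hypothetical error raised by a provider call, carrying the HTTP status."""

    def __init__(self, provider: str, status: int):
        super().__init__(f"{provider} returned HTTP {status}")
        self.provider = provider
        self.status = status


def call_with_fallback(
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[str, str, List[str]]:
    """Try providers in order; return (provider_name, response, failures)."""
    failures: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt), failures
        except ProviderError as err:
            if err.status not in RETRYABLE_STATUS:
                raise  # e.g. a malformed request would fail everywhere
            failures.append(str(err))
    raise RuntimeError("all providers failed: " + "; ".join(failures))
```

The caller gets a response whenever any provider in the chain succeeds, and the list of failures is what a gateway would record in its logs. If every provider fails, the error still surfaces to the application.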

Pro Tip: Set up virtual keys before building any routing logic. Once application code references Portkey virtual keys instead of raw provider credentials, every provider swap, key rotation, and routing change becomes a Portkey admin action with no application deployment required. That boundary pays back with each iteration.
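The boundary the tip describes can be modeled as a lookup table the gateway owns. The sketch below is an in-memory stand-in, not Portkey’s API: `VirtualKeyRegistry`, its methods, and the `vk_` prefix are hypothetical.

```python
import secrets
from typing import Dict


class VirtualKeyRegistry:
    """Hypothetical gateway-side mapping of virtual keys to real credentials."""

    def __init__(self) -> None:
        self._credentials: Dict[str, str] = {}

    def register(self, provider_api_key: str) -> str:
        """Store a real credential and hand back an opaque identifier."""
        virtual_key = "vk_" + secrets.token_hex(8)
        self._credentials[virtual_key] = provider_api_key
        return virtual_key

    def rotate(self, virtual_key: str, new_provider_api_key: str) -> None:
        """Swap the real credential; the virtual key itself does not change."""
        if virtual_key not in self._credentials:
            raise KeyError(virtual_key)
        self._credentials[virtual_key] = new_provider_api_key

    def revoke(self, virtual_key: str) -> None:
        """Remove access; later lookups for this virtual key fail."""
        self._credentials.pop(virtual_key, None)

    def resolve(self, virtual_key: str) -> str:
        """Used only inside the gateway when forwarding a request."""
        return self._credentials[virtual_key]
```

Application code only ever holds the value returned by `register`, so rotation and revocation happen entirely on the registry side.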

When to Use / When Not

Scenario	Use	Avoid
Routing requests across multiple AI providers by task type or quality	✅
Single-provider application with no plans to switch providers		❌
Need cost and latency visibility across models and teams	✅
Ultra-low latency requirement where even one extra proxy hop is unacceptable		❌
Evaluating which model performs best on a specific task type	✅
Environment with strict data-residency rules and no self-hosted gateway option		❌

Common Misconception

Myth: Portkey is primarily a monitoring and logging tool — something you add after the rest of the architecture is settled.

Reality: Observability is one part of what Portkey does. The gateway actively shapes requests: routing them by model, caching repeat calls, and recovering from provider failures without application-level handling. Treating it as a passive logger means skipping the routing and reliability features that motivate most adoption.

One Sentence to Remember

If you’re sending requests to more than one AI provider — or planning to — the cost of not having a gateway grows with every new integration. Portkey puts routing, fallback, and observability in one place, decoupling provider decisions from application code. Start with virtual keys; routing logic comes naturally after.

FAQ

Q: How does Portkey handle provider outages or rate limits?
A: When a provider returns an error or rate-limit response, Portkey’s fallback config automatically retries on the next provider in the chain. As long as one provider in the chain succeeds, the application receives a successful response without needing provider-specific error handling in its code.

Q: What is a virtual key in Portkey?
A: A virtual key is Portkey’s abstraction over a real provider API key. Your application references the virtual key; Portkey holds the actual credential. This lets you rotate or revoke provider access without changing application code.

Q: How is Portkey different from Helicone?
A: Both are LLM gateway and observability tools, but with different emphasis. Portkey centers on routing, fallback, and multi-provider management. Helicone focuses primarily on observability and logging. Some teams use both as complementary layers.

Expert Takes

MONA

A gateway like Portkey externalizes routing policy from application code. Routing becomes data — a named config you can inspect, version, and modify without touching the service that issues requests. This separation matters architecturally: the policy and the implementation no longer share a deployment lifecycle. When a routing decision needs to change, it changes in one place, and nothing needs redeploying.

MAX

The virtual key abstraction is the first thing to wire up. Raw provider keys scattered across environment files, CI secrets, and docker configs create a recurring problem: any key rotation touches multiple systems. With virtual keys, Portkey holds the actual credential and your application references only the virtual identifier. Credential rotation becomes a Portkey admin operation. Get that boundary in place before writing routing logic on top of it.

DAN

Teams shipping AI features fastest aren’t the ones with the cleverest prompts — they’re the ones whose infrastructure lets them swap models without touching application code. Portkey builds that swap point. The question isn’t whether you’ll end up using multiple providers; it’s whether you’ll wire the routing logic yourself or use something that already handles the edge cases.

ALAN

Centralizing all LLM calls through one gateway creates a natural control point. The same config that routes requests can log every prompt and response — and that log lives in Portkey’s infrastructure, not just yours. For teams handling sensitive data, the observability benefit and the data custody question are the same decision. Worth settling before the gateway becomes load-bearing.
