Model Routing
Model routing is the practice of dynamically directing each LLM request to the most appropriate model based on query complexity, cost constraints, or latency requirements.
Instead of sending every request to a single expensive model, a router evaluates each query and selects the best fit, balancing quality, speed, and spend.
Also known as: LLM Routing, Intelligent Model Selection
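As a rough illustration of the idea, the sketch below picks the cheapest model that satisfies a complexity estimate and a latency budget. Everything in it is an assumption made for the example: the model names, prices, latencies, the 1 to 3 capability scale, and the keyword heuristic in `estimate_complexity` are hypothetical, and a real router would more likely use a trained classifier or measured statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # illustrative price, not a real quote
    typical_latency_ms: int    # illustrative latency
    capability: int            # made-up scale: 1 (basic) to 3 (strongest)


# Hypothetical catalogue; the numbers are placeholders.
MODELS = [
    ModelProfile("small-fast", 0.10, 200, 1),
    ModelProfile("mid-balanced", 0.50, 600, 2),
    ModelProfile("large-accurate", 3.00, 1500, 3),
]

HARD_MARKERS = {"prove", "derive", "refactor", "analyze"}


def estimate_complexity(query: str) -> int:
    """Crude heuristic: long queries or reasoning keywords need more capability."""
    words = query.lower().split()
    if len(words) > 100 or HARD_MARKERS & set(words):
        return 3
    if len(words) > 20:
        return 2
    return 1


def route(query: str, max_latency_ms: int) -> ModelProfile:
    """Return the cheapest model that is capable enough and fast enough."""
    needed = estimate_complexity(query)
    candidates = [
        m for m in MODELS
        if m.capability >= needed and m.typical_latency_ms <= max_latency_ms
    ]
    if not candidates:
        # Nothing meets both constraints: keep the quality bar, relax latency.
        candidates = [m for m in MODELS if m.capability >= needed]
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


print(route("What is the capital of France?", max_latency_ms=1000).name)
print(route("Prove that the sum of two even numbers is even", max_latency_ms=2000).name)
```

When no model fits both constraints, this sketch keeps the quality bar and relaxes latency. That is one possible policy choice; a latency-critical product might prefer the opposite.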
What this topic covers
- Foundations: Model routing is less straightforward than it appears. Selecting the right model per request requires a policy that accounts for input complexity, expected output length, and acceptable latency, not cost alone.
- Implementation: The guides walk through routing policies, fallback chains, and cost-optimized dispatch using self-hosted and managed routers, with the tradeoffs that matter most in real deployments.
- What's changing: Model routing is shifting fast as new open-weight models challenge proprietary ones on quality, and teams that locked in a single-model strategy are revisiting their architecture.
- Risks & limits: Black-box routing decisions can silently downgrade response quality or expose request data to third-party providers without user awareness. Routing policy design is an ethical decision, not only a cost one.
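The fallback chains mentioned in the list above can be sketched in a few lines. This is a minimal example under stated assumptions, not a reference implementation: `ModelCallError`, `call_with_fallback`, and the two stub clients are hypothetical names invented here, standing in for whatever client library a deployment actually uses. The function returns the name of the model that answered, so a silent downgrade at least becomes visible to the caller.

```python
from typing import Callable, List, Sequence, Tuple


class ModelCallError(Exception):
    """Raised by a (hypothetical) model client when a call fails."""


def call_with_fallback(
    prompt: str,
    chain: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, client) pair in order; return (model_name, response)."""
    errors: List[str] = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except ModelCallError as exc:
            # Record the failure and move on to the next model in the chain.
            errors.append(f"{name}: {exc}")
    raise ModelCallError("all models failed: " + "; ".join(errors))


# Stub clients for illustration only; real clients would call a model API.
def flaky_primary(prompt: str) -> str:
    raise ModelCallError("rate limited")


def stable_backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


chain = [("primary", flaky_primary), ("backup", stable_backup)]
used_model, answer = call_with_fallback("Summarize this ticket", chain)
print(used_model, "->", answer)
```

Logging or surfacing `used_model` alongside each response is one way to keep routing decisions auditable instead of opaque.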
This topic is curated by our AI council — see how it works.