Hybrid Search
- Hybrid Search: A retrieval method that runs keyword search (typically BM25) and dense vector search in parallel, then fuses the ranked results — usually with Reciprocal Rank Fusion — to combine exact-term precision with semantic understanding.
Hybrid search runs keyword retrieval (BM25) and dense vector retrieval in parallel, then merges the rankings to return documents matched by both exact terms and semantic meaning.
What It Is
Most retrieval-augmented generation (RAG) systems started with pure dense vector search — embed the query, embed the documents, return whatever sits closest in vector space. That works for paraphrased questions but breaks the moment someone asks about “ISO 27001,” “GPT-4o,” or a specific customer order ID. The vector model treats those as hints, not literal targets, so it returns documents that are semantically nearby but lexically wrong. Hybrid search closes that gap: a keyword retriever runs alongside the vector retriever, so exact terms still hit while paraphrases still get found.
Under the hood, two retrievers run in parallel. The sparse side uses BM25 (Best Match 25), a keyword ranking function that scores documents by how often query terms appear and how rare those terms are across the corpus. The dense side uses an embedding model, such as OpenAI’s text-embedding-3 or a Cohere embed model, to turn each query and document into a vector, then ranks by similarity. Each retriever returns its own ranked list of candidates.
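The sparse side’s scoring function is compact enough to sketch. This is a minimal, illustrative BM25 — real engines use an inverted index rather than scanning every document, and the function name and tokenized inputs here are our own for the sketch:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequencies within this document
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue  # term never appears in the corpus
            # Rarer terms across the corpus get a higher IDF weight.
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # k1 saturates term frequency; b normalizes for document length.
            numer = tf[t] * (k1 + 1)
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * numer / denom
        scores.append(score)
    return scores
```

Note the two levers: term frequency saturates (a term appearing 50 times doesn’t score 50× one appearance), and rarity drives weight — which is exactly why BM25 nails queries built around rare identifiers.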
The two lists are then fused into one. The standard method is Reciprocal Rank Fusion (RRF), which scores each document by its position in each list rather than by raw retrieval scores — a document ranking well in both lists beats one ranking well in just one. According to Weaviate Blog, RRF is the default fusion algorithm in most managed vector databases, and several vendors expose a tunable weight so teams can dial keyword precision up for product catalogs or shift toward semantic recall for narrative content.
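RRF itself is only a few lines. With the conventional smoothing constant k = 60, each document’s fused score is the sum of 1/(k + rank) over every list it appears in — a minimal sketch (the function name is ours, not any library’s API):

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs by rank position, not raw scores.

    Each document earns 1 / (k + rank) from every list it appears in,
    so appearing in both lists beats a high rank in only one.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because only positions matter, there is no need to normalise BM25 scores against cosine similarities — the two retrievers’ incompatible score scales never meet.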
How It’s Used in Practice
The most common place to encounter hybrid search is inside a production RAG pipeline. A product manager evaluating a chat-with-your-docs feature, a developer building a customer support assistant, or a team rolling out an internal Q&A tool will all run into it because pure vector search misses documents that depend on specific names, codes, or jargon. Swapping in hybrid is usually the first retrieval upgrade once a demo hits real users.
The integration pattern is the same across stacks. Frameworks like LangChain and LlamaIndex expose hybrid retrievers as a one-line swap from their dense-only equivalents. According to Vertex AI Docs, managed vector databases — Vertex AI Vector Search, Weaviate, Milvus, and Pinecone — handle the BM25 index, the vector index, and the fusion step natively, so application code sees a single query() call. The team picks an embedding model, points the index at the corpus, and tunes the sparse-dense weight if early evaluations show the system leaning one way.
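The tunable sparse-dense weight can be pictured as score blending after normalisation. This is an illustrative sketch, not any vendor’s actual implementation — the `alpha` parameter here merely mirrors the knob some vendors expose (0 = pure keyword, 1 = pure vector):

```python
def weighted_fusion(sparse_scores, dense_scores, alpha=0.5):
    """Blend min-max-normalised sparse and dense scores per document.

    alpha=0.0 -> pure keyword ranking; alpha=1.0 -> pure vector ranking.
    Both inputs map doc ID -> raw retriever score.
    """
    def normalise(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid dividing by zero when all scores tie
        return {doc: (s - lo) / span for doc, s in scores.items()}

    s, d = normalise(sparse_scores), normalise(dense_scores)
    docs = set(s) | set(d)  # a doc missing from one list scores 0 there
    fused = {doc: (1 - alpha) * s.get(doc, 0.0) + alpha * d.get(doc, 0.0)
             for doc in docs}
    return sorted(fused, key=fused.get, reverse=True)
```

Sliding `alpha` toward 0 favours the product-catalog case; toward 1, narrative content.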
In production, hybrid search is usually paired with a reranker — a smaller model that re-scores the top fused candidates so the most relevant documents end up in the final context window.
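In pipeline terms the reranker is just one more stage: take the top fused candidates, re-score each (query, document) pair with a stronger model, keep the best few. A minimal sketch, with `score_fn` standing in for a real cross-encoder:

```python
def rerank(query, candidates, score_fn, top_n=5):
    """Re-score fused candidates pairwise against the query; keep the best top_n."""
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)[:top_n]
```

The point of the split is cost: the cross-encoder reads every candidate in full, so it only ever sees the few dozen documents hybrid retrieval already shortlisted.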
Pro Tip: Don’t tune the sparse-dense weight in the abstract. Write twenty representative queries, run them through pure dense and hybrid retrieval, and see where each fails. Most corpora have either heavy keyword anchors (product IDs, error codes) or heavy semantic drift (support narratives), and the right weighting becomes obvious.
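That comparison is a short script. Assuming each retriever is a callable returning ranked doc IDs and each eval query has a known target document (all names here are hypothetical):

```python
def compare_retrievers(eval_set, retrievers, k=5):
    """Find which queries each retriever fails on.

    eval_set: list of (query, expected_doc_id) pairs.
    retrievers: dict mapping a name to a callable query -> ranked doc IDs.
    Returns, per retriever, the queries whose expected doc missed the top k.
    """
    misses = {name: [] for name in retrievers}
    for query, expected in eval_set:
        for name, retrieve in retrievers.items():
            if expected not in retrieve(query)[:k]:
                misses[name].append(query)
    return misses
```

Reading the two miss lists side by side shows the corpus’s lean directly: dense-only misses cluster on IDs and codes, hybrid misses (if any) point at the embedding model or the weight.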
When to Use / When Not
| Scenario | Use | Avoid |
|---|---|---|
| RAG over technical documentation with specific product names, error codes, or version numbers | ✅ | |
| Pure semantic Q&A over conversational text where users only ask paraphrased questions | | ❌ |
| Customer support knowledge base mixing tickets, articles, and policies | ✅ | |
| Real-time recommendation engine where latency matters more than recall | | ❌ |
| Legal or compliance search where missing an exact citation is unacceptable | ✅ | |
| Tiny corpus of a few hundred short documents where BM25 alone already ranks well | | ❌ |
Common Misconception
Myth: Hybrid search is just dense vector search with keyword search bolted on as a fallback when the vector retriever returns nothing.
Reality: Both retrievers run in parallel on every query, not as a fallback chain. Hybrid means simultaneous retrieval and rank fusion — a document appearing in both lists is promoted ahead of one appearing in only one. The fusion step is the actual mechanism, not the retrieval order.
One Sentence to Remember
If a RAG system needs to handle both “tell me about returns” and “what’s the refund policy for order #4F-9821” in the same question stream, hybrid search is the baseline retrieval stack — and a reranker is the next upgrade.
FAQ
Q: When should I switch from dense vector search to hybrid search? A: As soon as your RAG system handles queries with specific names, IDs, codes, or rare technical terms. Hybrid becomes the default once a demo meets real users.
Q: Does hybrid search replace the need for a reranker? A: No. Hybrid handles initial retrieval and recall; a reranker re-scores the top fused candidates for relevance. Production RAG stacks typically use both — hybrid first, then a reranker.
Q: What is Reciprocal Rank Fusion and why is it the default? A: RRF combines ranked lists by scoring documents on their position rather than raw scores. It avoids normalising incompatible scores from BM25 and vector similarity, which is why most vendors ship it.
Sources
- Weaviate Blog: “Hybrid Search Explained” — vendor overview of BM25 + dense retrieval and Reciprocal Rank Fusion
- Vertex AI Docs: “About hybrid search” — Google Cloud reference for native hybrid retrieval
Expert Takes
Not magic. Two retrievers running in parallel. Dense embeddings compress semantic similarity into a smooth manifold but flatten rare tokens — the exact ones that disambiguate a query. Sparse retrieval keeps that lexical sharpness but ignores synonymy. Running them together and fusing by rank position recovers the best of each without forcing a single representation to do the work of two.
Hybrid search is a context-driven retrieval problem. The diagnosis usually shows up in eval traces — exact-match queries failing point to a missing BM25 layer; paraphrase queries failing point to a weak embedding model. The fix isn’t another layer; it’s a clearer spec. Decide what mix of literal precision and semantic recall the corpus needs, write that into your retrieval config, and tune the weight against a real eval set.
Pure vector search had its moment. That moment ended. Every serious RAG vendor now ships hybrid by default, and reranking is the standard follow-up. Teams still shipping dense-only stacks today aren’t ahead — they’re behind. The retrieval layer is the shortest path to a usable RAG product, and hybrid is the line below which you’re not in the conversation. Build hybrid first or watch a competitor’s chatbot find the answer yours missed.
Hybrid search makes RAG more reliable, which sounds unambiguously good. But what does “more reliable” mean if both retrievers surface the same flawed source? Fusion amplifies its rank rather than questioning it. Every retrieval improvement also makes a corpus easier to deliver — including its biases, its outdated policy, its quietly wrong answers. Who audits the corpus the retriever just made faster to find?