MONA explainer 11 min read April 29, 2026

BM25, SPLADE, and Reciprocal Rank Fusion: The Building Blocks of Production Hybrid Search

Diagram of hybrid search: BM25 lexical index and dense vector index merged by reciprocal rank fusion into one ranked list

ELI5

Hybrid search combines a keyword index (BM25) with a vector or sparse-neural index (SPLADE), then merges their ranked lists with a fusion step — usually Reciprocal Rank Fusion. Two retrievers see different signals; fusion picks the consensus.

A team I worked with ran a Retrieval-Augmented Generation pipeline backed entirely by vector search. It found the right paragraph on kidney function, and it also confidently surfaced a paragraph about kidney beans when the query was about renal disease. The remedy was not a better embedding model. The remedy was admitting that two retrievers, each watching for different signals, beat one retriever trying to do both jobs.

Two Retrievers, Two Different Failure Modes

Hybrid Search is not a single algorithm. It is a deliberate pairing of two retrievers chosen because they fail in opposite directions. One reads the surface form of language; the other reads the geometry.

What are the main components of a hybrid search system?

A production hybrid search system has three moving parts. A lexical retriever, almost always BM25 over an inverted index. A semantic retriever, either a dense embedding model paired with a vector database or a learned sparse model like SPLADE. And a fusion layer that merges the two ranked lists into a single ordered output.

BM25, formalized in Robertson and Zaragoza’s probabilistic relevance framework (City University paper), scores a document by how often the query terms occur in it. Each term’s contribution is weighted by inverse document frequency, saturates as term frequency grows (controlled by k1), and is normalized by document length (controlled by b). The default parameters most engines ship with, k1 between 1.2 and 2.0 and b near 0.75 per the Stanford IR Book, are not theoretical optima. They are values that work well empirically across diverse English corpora.
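To make the roles of k1 and b concrete, here is a minimal sketch of BM25 scoring over pre-tokenized documents. It assumes the smoothed IDF variant with a "+1" inside the logarithm; engines differ in the exact IDF form, and the function name and toy corpus are illustrative only.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF; the "+1" keeps the value non-negative (an assumption).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "renal disease reduces kidney function".split(),
    "kidney beans are a good source of protein".split(),
    "chronic renal disease and renal failure".split(),
]
scores = bm25_scores(["renal", "disease"], docs)
```

The kidney-beans document scores zero because it shares no term with the query, which is exactly the behavior that protects BM25 from the failure in the opening anecdote.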

The semantic side is where the architecture branches. Dense retrieval embeds query and document into the same vector space and ranks by cosine similarity — fast at runtime, opaque about why any given document ranked high. SPLADE, introduced by Formal, Piwowarski, and Clinchant in 2021 (arXiv 2107.05720), takes a different bet: it learns sparse term-weight vectors over BERT’s vocabulary, expanding the query with related tokens the original wording never used. The output is still sparse, still indexable in an inverted index, but the weights are neural.
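As a rough illustration of the dense side, the sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors are hypothetical stand-ins for real embeddings, which typically have hundreds of dimensions, and production systems use an approximate nearest-neighbor index rather than this brute-force scan.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_cosine(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar."""
    return sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )

# Hypothetical embeddings, chosen by hand for illustration.
doc_vecs = {
    "kidney_function": [0.9, 0.1, 0.0],
    "kidney_beans": [0.2, 0.9, 0.1],
    "tax_law": [0.0, 0.1, 0.9],
}
ranking = rank_by_cosine([0.8, 0.3, 0.0], doc_vecs)
```

Note that the ranking tells you which document is closest, but nothing in the numbers explains which words drove the match.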

The two retrievers fail in opposite directions. BM25 misses paraphrases. Dense retrieval misses exact identifiers — model numbers, error codes, named entities the embedding never learned to discriminate. Hybrid search exists because that asymmetry is real and predictable.

What do I need to understand before learning hybrid search?

You can read this section without any of the prerequisites and still follow the mechanism. But the architecture clicks when you can name three things.

First, an inverted index: a map from each term to the list of documents containing that term. BM25 uses it because lexical retrieval reduces to looking up the postings lists for the query terms and merging them, and an inverted index makes that lookup fast even on large corpora.
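A minimal sketch of that term-to-documents map, assuming lowercasing and whitespace tokenization (real engines use proper analyzers). The function names are illustrative. The lookup here collects every document containing at least one query term; an AND-style query would intersect the postings lists instead.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def candidates(index, query):
    """Every document containing at least one query term."""
    found = set()
    for term in query.lower().split():
        found.update(index.get(term, []))
    return sorted(found)

docs = {
    1: "renal disease reduces kidney function",
    2: "kidney beans are rich in protein",
    3: "error code E1234 on model X200",
}
index = build_inverted_index(docs)
```

Looking up `candidates(index, "E1234")` touches a single postings list, which is why exact identifiers are cheap and reliable on the lexical side.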

Second, dense embeddings: fixed-length vectors produced by a transformer where similar meanings sit close in space. They power vector retrieval in any modern vector database. The trade-off is geometric, not interpretable. You cannot eyeball a dense vector and say “this captured the word ‘renal’ but lost ‘beans’.”

Third, ranking. Both BM25 and dense retrieval return ranked lists, not binary verdicts. That is what makes fusion possible: rank-based methods such as RRF combine positions in those lists and never need the raw scores to be comparable.

How the Fusion Layer Decides Who Wins

Two ranked lists arrive at the fusion step. They disagree. Some documents rank well in one list and are absent from the other. The fusion algorithm reconciles that disagreement without seeing the raw scores — or, in some variants, while explicitly using them. The choice of fusion method is where most production decisions actually live.

What fusion algorithms are used in hybrid search and how do they differ?

Three families dominate.

Reciprocal Rank Fusion (RRF), introduced by Cormack, Clarke, and Büttcher in 2009 (University of Waterloo PDF), ignores raw scores entirely. It assigns each document a fused score equal to Σ_i 1 / (k + rank_i(d)), where rank_i(d) is the document’s position in the i-th retriever’s list and k is a smoothing constant that defaults to 60 across Elasticsearch, Azure AI Search, and most other implementations (Elasticsearch RRF reference). The constant exists to dampen the influence of the very top ranks. With k = 0, rank 1 would contribute 1 and rank 2 only 0.5, so a single first place could outweigh broad agreement; with k = 60, those ranks contribute 1/61 and 1/62, and a document placed second in both lists comfortably beats one placed first in only one.
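The formula is short enough to sketch directly. This version assumes 1-based ranks and that a document missing from a list contributes nothing for that list; the function name and the tie-break by document id are illustrative choices, not part of the original definition.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; ranks are 1-based."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))

bm25_list = ["d1", "d2", "d3"]
dense_list = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_list, dense_list])
```

Here d1 (ranks 1 and 2) edges out d3 (ranks 3 and 1), and both beat the documents that appear in only one list.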

Score-based fusion takes the opposite stance. Weaviate’s relativeScoreFusion, the default since v1.24 (Weaviate Docs), normalizes BM25 and vector scores into a comparable range and combines them with a weighted sum. Qdrant’s DBSF (Distribution-Based Score Fusion, available alongside RRF) takes a different route to normalization: it rescales each retriever’s scores using limits derived from the mean and standard deviation of that retriever’s score distribution, then sums the normalized scores.
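A minimal sketch of the normalize-then-weight idea, assuming min-max normalization and a single weight `alpha` on the dense side. This is a generic illustration, not Weaviate's or Qdrant's actual implementation; the function and parameter names are hypothetical, and the inputs are assumed non-empty.

```python
def min_max_normalize(scores):
    """Rescale a non-empty {doc_id: score} map into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def relative_score_fusion(lexical, dense, alpha=0.5):
    """Weighted sum of normalized scores; alpha is the weight on the dense side."""
    lex_n, dense_n = min_max_normalize(lexical), min_max_normalize(dense)
    doc_ids = set(lex_n) | set(dense_n)
    fused = {
        d: (1 - alpha) * lex_n.get(d, 0.0) + alpha * dense_n.get(d, 0.0)
        for d in doc_ids
    }
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))

bm25_scores = {"d1": 12.4, "d2": 7.1, "d3": 3.0}
dense_scores = {"d3": 0.91, "d1": 0.80, "d4": 0.62}
fused = relative_score_fusion(bm25_scores, dense_scores, alpha=0.5)
```

Unlike RRF, the gap between scores matters here: a single outlier score stretches the normalized range and compresses everything else, which is one reason score-based fusion depends on well-behaved score distributions.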

Weighted RRF is the recent compromise. It keeps the rank-only logic but lets you bias the contribution of each retriever — useful when one retriever is empirically stronger on your domain. Elasticsearch added it in 9.2 (Elastic Docs); Qdrant added it in v1.17 in early 2026.
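One way to read "weighted RRF" is that each retriever's reciprocal-rank contribution is multiplied by a per-retriever weight. The sketch below assumes that interpretation; how Elasticsearch and Qdrant expose and apply weights is engine-specific, so treat the function and its parameters as illustrative.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, k=60):
    """RRF where each retriever's contribution is scaled by its weight."""
    if len(rankings) != len(weights):
        raise ValueError("need one weight per ranking")
    fused = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (k + rank)
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))

bm25_list = ["d1", "d2", "d3"]
dense_list = ["d3", "d1", "d4"]
equal = weighted_rrf([bm25_list, dense_list], weights=[1.0, 1.0])
dense_heavy = weighted_rrf([bm25_list, dense_list], weights=[1.0, 3.0])
```

With equal weights d1 wins; tripling the dense weight moves d3, the dense retriever's top hit, into first place without touching any raw score.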

Not synonyms. Counterweights.

RRF is robust because it ignores score scale entirely. Score-based fusion is sharper when both retrievers produce calibrated scores. Weighted variants are tuning knobs for when one retriever clearly outperforms the other. The Weaviate API exposes “rankedFusion” (its earlier RRF-style default) and “relativeScoreFusion” but does not call either one “RRF” in its documentation — a small naming detail that matters when migrating between engines.

Do I need a vector database to implement hybrid search or can I use Elasticsearch?

You do not need a dedicated vector database. As of 2026, Elasticsearch 9.1+ exposes RRF as a generally available retriever (Elastic Docs), and its HNSW index handles dense vectors directly alongside native BM25. OpenSearch ships a hybrid query and an RRF processor. Azure AI Search uses RRF with k=60 as the default for hybrid queries.
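For a sense of what this looks like in practice, the sketch below builds a search request body that combines a BM25 match query and a kNN query under an RRF retriever. The index field names `body` and `body_vector` are hypothetical, and the parameter names reflect the Elasticsearch retriever syntax as best understood here; verify them against the reference for your version before relying on them.

```python
import json

def build_rrf_search_body(query_text, query_vector, text_field="body",
                          vector_field="body_vector", rank_constant=60,
                          rank_window_size=50):
    """Build a request body combining a BM25 query and a kNN query via RRF."""
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    # Lexical side: a standard match query scored with BM25.
                    {"standard": {"query": {"match": {text_field: query_text}}}},
                    # Dense side: approximate kNN over the vector field.
                    {"knn": {
                        "field": vector_field,
                        "query_vector": query_vector,
                        "k": rank_window_size,
                        "num_candidates": rank_window_size * 2,
                    }},
                ],
                "rank_constant": rank_constant,
                "rank_window_size": rank_window_size,
            }
        }
    }

body = build_rrf_search_body("renal disease", [0.12, -0.03, 0.88])
print(json.dumps(body, indent=2))
```

The point of the example is structural: both retrievers and the fusion step live in one request against one engine, with no second system to operate.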

The choice between an existing search engine and a vector-native database like Qdrant or Weaviate comes down to operational fit, not capability. Elasticsearch and OpenSearch already run in many enterprise stacks; bolting on dense retrieval avoids a new system. Vector-native engines started from the dense side and have moved toward lexical: Qdrant added native BM25 in v1.15.2 and now supports BM25, SPLADE++, and miniCOIL as sparse vectors in a single Query API (Qdrant Article).

Postgres extensions — pgvector, ParadeDB, Tiger Data — are bringing the same primitives into relational stores, useful when retrieval queries already need to join against transactional data.

The architectural question is rarely “which engine has hybrid search.” It is “which engine handles the workloads around retrieval, and how much operational surface does adding a second one add.”

Figure: The three moving parts of a hybrid search system: lexical retriever, semantic retriever, and the fusion layer that reconciles their disagreements.

What the Architecture Predicts (and Where It Quietly Breaks)

The mechanism turns into useful intuition once you treat each retriever as a hypothesis about your queries. Each prediction below is testable on your own evaluation set in an afternoon.

If your queries contain specific identifiers — SKUs, error codes, person names — expect BM25 to do most of the work. A small relative contribution from the vector retriever is healthy, not a failure.
If your queries are paraphrastic or cross-domain, expect the dense retriever to pull ahead. BM25 still keeps it honest by catching rare exact terms.
If your fused list looks worse than either retriever alone, the fusion algorithm is rarely the cause. One of the retrievers is producing many low-quality near-misses that fusion is upweighting.
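Testing these predictions mostly means computing the same metric for each retriever separately and for the fused output. A minimal sketch using recall@k follows; the query ids, judgments, and runs are hypothetical, and the same `mean_recall` call can be applied to a fused run.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def mean_recall(runs, judgments, k=10):
    """Average recall@k over all judged queries for one retriever's output."""
    values = [recall_at_k(runs.get(q, []), rel, k) for q, rel in judgments.items()]
    return sum(values) / len(values)

# Hypothetical judgments and retriever outputs for two queries.
judgments = {"q_sku": {"d7"}, "q_paraphrase": {"d2", "d9"}}
bm25_runs = {"q_sku": ["d7", "d1"], "q_paraphrase": ["d4", "d2"]}
dense_runs = {"q_sku": ["d3", "d5"], "q_paraphrase": ["d9", "d2"]}

report = {
    "bm25": mean_recall(bm25_runs, judgments, k=2),
    "dense": mean_recall(dense_runs, judgments, k=2),
}
```

In this toy data BM25 wins the identifier query and the dense retriever wins the paraphrase query, which is the split the first two predictions describe.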

Rule of thumb: Start with RRF and k=60. Move to weighted RRF once evaluation data shows one retriever is consistently stronger on your domain. Move to score-based fusion only if your retrievers produce well-calibrated, comparable scores — they usually do not.

When it breaks: Hybrid search degrades when both retrievers fail on the same query. RRF cannot recover signal that neither list contains. The fix is upstream — chunking strategy, query rewriting, or replacing one retriever entirely — not a different fusion method.
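A quick way to find these cases is to list the queries where no retriever returns any relevant document in its top k. The sketch below assumes the same kind of judgment and run dictionaries used in a typical evaluation set; all names and data are hypothetical.

```python
def unrecoverable_queries(judgments, runs_by_retriever, k=10):
    """Queries where no retriever returns any relevant document in its top k."""
    missed = []
    for query_id, relevant in judgments.items():
        found = any(
            set(runs.get(query_id, [])[:k]) & set(relevant)
            for runs in runs_by_retriever.values()
        )
        if not found:
            missed.append(query_id)
    return missed

judgments = {"q1": {"d1"}, "q2": {"d5"}}
runs_by_retriever = {
    "bm25": {"q1": ["d1", "d8"], "q2": ["d2", "d3"]},
    "dense": {"q1": ["d3", "d6"], "q2": ["d4", "d7"]},
}
missed = unrecoverable_queries(judgments, runs_by_retriever, k=2)
```

Every query in `missed` is one that no fusion method can fix, so the size of that list is a rough ceiling on what tuning the fusion layer can buy.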

Security & compatibility notes:
SPLADE license (BREAKING): The naver/splade weights are CC-BY-NC-SA 4.0 — non-commercial only (naver/splade GitHub). For production, use derivative open-licensed sparse encoders (e.g., naver/efficient-splade-VI-BT-large-doc) or an engine-provided sparse model (Qdrant miniCOIL).
SPLADE maintenance (WARNING): The naver/splade repo’s last release was October 2023. Treat the official repo as research-grade, not as an actively maintained production library.
Qdrant fastembed BM25 (WARNING): Deprecated in favor of core BM25 since v1.15.2. Old client paths are slated for removal in v1.18.x.
Qdrant storage upgrade (WARNING): RocksDB was removed in v1.17 in favor of Gridstore — direct upgrade from v1.15 to v1.17 is not supported. Stage through an intermediate version.

Why Sparse Neural Models Quietly Erased the Old Boundary

The interesting move in the last few years was not a better fusion algorithm. It was the realization that sparse and dense retrieval are not opposites. SPLADE produces a sparse vector that lives in an inverted index — the same data structure BM25 uses — but with weights learned by a neural network. The retrieval engine cannot tell the difference at the index level.
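The sketch below illustrates why the index cannot tell the difference: scoring is a dot product over weighted postings, regardless of where the weights came from. The weights here are hypothetical, hand-picked to mimic a learned sparse model that adds the expansion term "kidney" to a query about renal disease.

```python
from collections import defaultdict

def build_weighted_index(doc_vectors):
    """Invert {doc_id: {term: weight}} into {term: [(doc_id, weight), ...]}."""
    index = defaultdict(list)
    for doc_id, vector in doc_vectors.items():
        for term, weight in vector.items():
            index[term].append((doc_id, weight))
    return index

def sparse_search(index, query_vector):
    """Score documents by the dot product of sparse query and document vectors."""
    scores = defaultdict(float)
    for term, q_weight in query_vector.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical learned weights, not output from a real SPLADE model.
doc_vectors = {
    "d1": {"renal": 1.8, "kidney": 1.1, "disease": 0.9},
    "d2": {"kidney": 1.2, "beans": 2.0, "protein": 0.7},
}
index = build_weighted_index(doc_vectors)
results = sparse_search(index, {"renal": 1.5, "disease": 1.0, "kidney": 0.4})
```

Swap the learned weights for BM25-derived ones and the same two functions still work, which is the sense in which the old boundary has blurred.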

That collapses the architectural distinction. A hybrid search system with BM25 plus SPLADE is technically two inverted-index retrievers, not “lexical plus neural.” Qdrant’s BM42, introduced as a BM25 alternative tuned for short RAG-style queries (SD Times), sits in the same neighborhood: rank-friendly, sparse, and engine-native. Independent benchmarks for BM42 are mixed; treat it as a Qdrant-specific option, not an industry default.

The implication for Agentic RAG systems is structural. As agents start issuing many short, intent-shaped queries, the retrieval mix shifts toward sparse-neural models that handle short queries gracefully — and away from dense vectors originally tuned on longer paragraphs. The fusion algorithm stays the same. The retrievers feeding it quietly change underneath.

The Data Says

The decision that matters in production hybrid search is rarely BM25 versus dense vectors — both have measurable contributions on most corpora. The decisions that matter are the fusion layer, the licensing of any neural sparse model you adopt, and whether your existing engine can host both retrievers without doubling your operational surface area. Pick the engine that fits the workloads around retrieval; the algorithms come along for the ride.

Sources

City University paper: The Probabilistic Relevance Framework: BM25 and Beyond - Robertson & Zaragoza’s formalization of BM25 and the probabilistic relevance framework
Stanford IR Book: Okapi BM25: a non-binary model - Default BM25 parameter ranges (k1, b) and the reasoning behind them
arXiv: SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking - Original SPLADE paper, SIGIR 2021
naver/splade GitHub: naver/splade - Official SPLADE implementation, CC-BY-NC-SA 4.0 license, maintenance status
University of Waterloo PDF: Reciprocal Rank Fusion outperforms Condorcet and individual rank learning methods - Cormack, Clarke, Büttcher’s original RRF paper
Elasticsearch RRF reference: Reciprocal Rank Fusion API reference - RRF formula and the k=60 default
Elastic Docs: RRF retriever (Elasticsearch) - RRF retriever GA in 9.1, weighted RRF in 9.2
Weaviate Docs: Hybrid search concepts - rankedFusion vs. relativeScoreFusion, default change in v1.24
Qdrant Article: Hybrid Search Revamped — Building with Qdrant’s Query API - Native BM25, SPLADE++, miniCOIL, Query API design
Qdrant GitHub releases: qdrant/qdrant releases - Native BM25 in v1.15.2, RocksDB removal in v1.17, weighted RRF in v1.17
SD Times: Qdrant introduces alternative to BM25 search tailored to improving RAG retrieval - BM42 framing as a Qdrant-specific option

Aha Moments

MAX

Mona names the failure modes. I’ll name the spec gaps that produce them. The mistake teams keep making is treating fusion as a configuration choice — pick RRF, push it live, see what happens. Fusion is downstream of the retrievers, and the retrievers are downstream of how documents got chunked and how queries get normalized. If you do not write down what the retriever is supposed to optimize for — exact-match recall, paraphrase recall, latency budget — you have no way to evaluate which fusion method fits. The contract for a hybrid search system has to start at the chunking layer and end at the eval set, not in the middle. Without that, switching from RRF to weighted RRF is theater.

DAN

Max wants the contract written before the experiment. The market is doing the opposite — engines compete on how many fusion options they ship out of the box, and buyers are confusing optionality with capability. Three things are converging right now: dense vector engines are adding lexical, lexical engines are adding dense, and Postgres extensions are bringing both into relational stores. The strategic question for any team building retrieval has shifted. It is no longer “which engine has hybrid search.” It is whether the retrieval workload deserves a dedicated system at all, or whether the engine the platform team already runs can do the job. The teams that get this right buy less infrastructure and move faster. The ones that pick a vector database first will spend the next year explaining why.

ALAN

Mona explains the mechanism, Max wants the contract, Dan tracks the market. Each frame skips a question retrieval systems quietly carry. A hybrid search system retrieves what it has been told is relevant. It does not retrieve what is missing — the document never indexed, the perspective the corpus excluded, the language the embedding model never learned to discriminate. Fusion combines two views of the same corpus. It does nothing about the corpus itself. When a RAG pipeline is the front end of a medical assistant or a legal research tool, the fusion algorithm is not the consequential decision. The consequential decision is whose documents got into the index. Who decides what gets retrieved when the answer the reader needs is the one nobody indexed?

AI-assisted content, human-reviewed. Images AI-generated.