Voyage Rerank

Voyage Rerank is a family of cross-encoder reranking models from Voyage AI (now part of MongoDB) that re-scores retrieved passages against a query, with the rerank-2.5 generation adding instruction-following so relevance criteria can be steered at query time.

What It Is

When a search system returns 50 candidate passages for a user question, only the top 3-5 usually fit into the LLM prompt. Pick the wrong ones and the answer suffers, even if the right document was in your candidate pool. Voyage Rerank exists to make that selection step accurate, scoring each candidate against the query with far more precision than the first-stage retriever can afford. It’s the part of the pipeline that decides whether your RAG system feels sharp or vaguely off-topic, and it’s where the precision advantage of cross-encoder reranking over bi-encoder retrieval pays off most visibly.

The mechanism is a cross-encoder. Think of it as a careful editor who reads the question and a candidate answer side by side, instead of comparing their summaries from a distance. Where a typical embedding model encodes the query and the document separately and compares the two vectors, a cross-encoder reads them together as a single input. That joint reading lets attention flow between query terms and document terms directly, producing a relevance score that reflects how the query and the document actually relate. The trade-off is cost: cross-encoders are slower per pair than embedding lookups, which is why they run on a small candidate set rather than the entire corpus.
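
To make the contrast concrete, here is a minimal sketch using open-source models from the sentence-transformers library (not Voyage’s weights, which are only served through the API): the bi-encoder compares vectors produced independently, while the cross-encoder scores each query-document pair in a single joint forward pass.

```python
# Conceptual contrast: bi-encoder similarity vs. cross-encoder scoring.
# Open-source models are used purely for illustration; Voyage Rerank itself
# is a hosted cross-encoder, not these weights.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

query = "What does the 2024 regulation say about data retention?"
docs = [
    "The 2019 guidance required retention for five years.",
    "Under the 2024 regulation, retention is capped at two years.",
]

# Bi-encoder: query and documents are encoded separately, then compared.
bi = SentenceTransformer("all-MiniLM-L6-v2")
q_vec, d_vecs = bi.encode(query), bi.encode(docs)
print(util.cos_sim(q_vec, d_vecs))  # similarity of independently built vectors

# Cross-encoder: each (query, document) pair is read jointly and scored.
ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
print(ce.predict([(query, d) for d in docs]))  # one relevance score per pair
```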

According to Voyage AI Docs, the current generation — rerank-2.5 and the smaller rerank-2.5-lite — adds an instruction field. You pass natural-language guidance like “prefer recent regulations” or “prefer official documentation” alongside the query, and the model conditions its scoring on that intent. According to Voyage AI Blog, the 2.5 generation also extended the per-document context window by roughly 8× compared to the prior generation, making it practical to rerank longer passages without truncation. Both variants are multilingual and accept up to 1,000 candidate documents per request, according to Voyage AI Docs.
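
A minimal sketch of the call shape with the official voyageai Python client follows. The candidate list, model name, and top_k mirror the documented rerank endpoint; how exactly a steering instruction is attached for rerank-2.5 depends on your client version, so check the Voyage docs before relying on a particular convention.

```python
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

query = "Which data-retention period applies after the 2024 update?"
candidates = [
    "The 2019 guidance required retention for five years.",
    "Under the 2024 regulation, retention is capped at two years.",
    # ... up to 1,000 candidate documents per request
]

# One relevance score per (query, candidate) pair, highest first.
# Steering instructions for rerank-2.5 (e.g. "prefer recent regulations")
# are supplied alongside the query; see the Voyage docs for the exact
# convention in your client version.
reranking = vo.rerank(
    query=query,
    documents=candidates,
    model="rerank-2.5",
    top_k=2,
)

for r in reranking.results:
    print(r.index, round(r.relevance_score, 3), candidates[r.index])
```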

How It’s Used in Practice

Most teams meet Voyage Rerank as the second stage of a retrieval-augmented generation pipeline. A vector database — Atlas Vector Search, Pinecone, Qdrant, or pgvector — returns a candidate set of, say, 50 to 100 passages for a user query. Those candidates go to Voyage Rerank along with the original query and an optional steering instruction. The reranker returns relevance scores; you keep the top 5 to 10, drop the rest, and pass only those into the prompt sent to the LLM.
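
Sketched end to end, the pipeline looks roughly like this. Here vector_store, its search() method, and llm.generate() are hypothetical placeholders for whatever retrieval layer and LLM client you already run; only the rerank call uses the voyageai client.

```python
import voyageai

vo = voyageai.Client()

def answer(question: str, vector_store, llm) -> str:
    # Stage 1: broad, cheap recall. vector_store.search() stands in for
    # Atlas Vector Search, Pinecone, Qdrant, pgvector, etc. (hypothetical API).
    candidates = vector_store.search(question, limit=100)
    texts = [c.text for c in candidates]

    # Stage 2: precise re-scoring of the small candidate set.
    reranking = vo.rerank(query=question, documents=texts,
                          model="rerank-2.5", top_k=8)
    top_passages = [texts[r.index] for r in reranking.results]

    # Stage 3: only the reranked winners enter the LLM prompt.
    context = "\n\n".join(top_passages)
    return llm.generate(f"Answer using only this context:\n\n{context}\n\nQ: {question}")
```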

The instruction field is where the 2.5 generation earns its keep. A customer-support bot can ask the reranker to “prefer answers from the most recent product version.” A legal research tool can ask for “the most authoritative primary source.” A multilingual help desk can specify “prefer answers in the user’s language.” Each of these used to require either a custom fine-tune or a downstream prompt that the LLM might ignore. Now the steering happens before the LLM ever sees the candidates.
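
One way to wire that up is to keep the steering text per use case and swap it at query time. In the sketch below the instruction is folded into the query text; whether your client version expects that or a dedicated instruction field is an assumption to verify against the Voyage docs.

```python
import voyageai

vo = voyageai.Client()

def rerank_steered(query: str, candidates: list[str],
                   instruction: str, top_k: int = 5):
    # ASSUMPTION: the steering instruction is prepended to the query here.
    # Check the Voyage docs in case your client version exposes a dedicated
    # instruction field instead.
    steered = f"{instruction}\n{query}" if instruction else query
    return vo.rerank(query=steered, documents=candidates,
                     model="rerank-2.5", top_k=top_k)

docs = [
    "v2.3 release notes: data export lives under Settings > Data.",
    "v4.1 release notes: data export moved to the Admin console.",
]

# Same query, different steering, potentially different top result.
for instruction in ("Prefer answers from the most recent product version.",
                    "Prefer answers that apply to version 2.x."):
    result = rerank_steered("Where is the data export option?", docs,
                            instruction, top_k=1)
    print(instruction, "->", docs[result.results[0].index])
```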

Pro Tip: Don’t skip the instruction field once you’re past prototyping. The default behavior is fine for generic Q&A, but adding a one-line steering instruction often beats appending the same constraint to the LLM system prompt — the reranker filters before the LLM sees the noise, and the instruction costs nothing extra in output tokens.

When to Use / When Not

| Scenario | Use | Avoid |
| --- | --- | --- |
| Multilingual RAG over mixed-language corpora | ✓ | |
| Latency-critical autocomplete under tens of milliseconds | | ✓ |
| Domain-specific relevance (recency, jurisdiction, audience) | ✓ | |
| Tiny corpora where first-stage retrieval already returns the right top-3 | | ✓ |
| Search over MongoDB Atlas Vector Search results | ✓ | |
| Air-gapped or offline deployments without external API egress | | ✓ |

Common Misconception

Myth: A reranker is just another search engine you can use instead of your vector database. Reality: A reranker doesn’t replace your retrieval layer — it re-orders what that layer returned. Voyage Rerank sees one query and a small candidate list at a time, then assigns precise relevance scores using a cross-encoder that reads query and document jointly. Without a first-stage retriever supplying candidates, there’s nothing for it to score.

One Sentence to Remember

Voyage Rerank turns a coarse first-stage retrieval into a precise, instruction-aware ranking — a small layer that often makes the largest visible difference in RAG quality.

FAQ

Q: Is Voyage Rerank a vector database? A: No. It’s a cross-encoder service that re-orders results from a vector database or hybrid search. You still need a separate retrieval layer to produce the candidates Voyage Rerank scores.

Q: What’s the difference between rerank-2.5 and rerank-2.5-lite? A: According to Voyage AI Docs, both are cross-encoders with instruction-following. The lite variant is smaller and faster, intended for latency-sensitive workloads; the full model trades latency for higher relevance.

Q: Does Voyage Rerank work with non-MongoDB vector databases? A: Yes. The API is database-agnostic. You can pair it with Pinecone, Weaviate, Qdrant, pgvector, or your own search system — Voyage Rerank only needs the query and the candidate documents.

Expert Takes

Not a better embedding. A conditional one. The cross-encoder now reads a steering instruction alongside the query, which moves reranking from a frozen scoring function to one that bends to context. The same query under “prefer recent” and “prefer authoritative” produces different orderings. The math underneath is unchanged transformer attention; what changed is what gets to enter the prompt that conditions it.

Treat the instruction field as a contract, not a gimmick. If your spec says “answers must cite the most recent guidance,” the instruction passed to the reranker should mirror that wording. When relevance criteria live in your context file alongside coding rules and data shapes, they survive prompt edits and team handoffs. Bury the steering rule inside an LLM system prompt and it drifts the moment someone refactors the message.

The reranker market just consolidated under a new dynamic: whoever owns the vector layer ships the reranker too. MongoDB acquired Voyage; the rest of the database vendors are watching. For buyers, the question is no longer “which standalone reranker scores highest” but “does my data platform already include one good enough?” The vendor with the integrated stack wins the procurement meeting before benchmarks enter the room.

Instruction-following rerankers solve a real problem and create a quieter one. The same lever that lets a developer say “prefer recent” lets a buyer say “prefer our company’s whitepapers.” Whose instructions does the user see, and whose are baked into the system before they ever type a question? When relevance becomes steerable, neutrality becomes a deliberate choice — and most of the time, no one’s deliberating.