Qdrant

An open-source vector database written in Rust that runs hybrid search natively, combining dense embeddings, sparse models like BM25 and SPLADE, and fusion methods such as Reciprocal Rank Fusion inside a single Query API call.

What It Is

Production RAG systems need fast, relevant retrieval at scale. Pure dense vector search misses keyword precision — exact product codes, function names, or rare terms slip through. Pure keyword search misses semantic intent — paraphrased queries return nothing. Qdrant exists because teams building RAG pipelines kept stitching together separate engines for these two retrieval modes, then writing glue code to merge results. Qdrant collapses that into one engine: dense vectors, sparse vectors, and result fusion all run inside the same database, behind one query call.

Architecturally, Qdrant is a vector database first — collections store points with dense embeddings, payload metadata, and optional sparse vectors. According to a Qdrant article, the unified Query API introduced in v1.10 lets a single request describe a multistage retrieval pipeline: run dense search, run sparse search such as BM25 or SPLADE++, then fuse the ranked lists server-side. According to Qdrant's GitHub release notes, native BM25 inference shipped in v1.15.2 — earlier versions required an external embedder to compute sparse vectors, which added a moving part to every deployment.
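A sketch of what that single multistage request can look like over the REST API (POST /collections/{name}/points/query). The vector names ("dense", "sparse"), the query values, and the limits below are illustrative assumptions, not taken from any real deployment:

```python
# Illustrative hybrid Query API request body. The named vectors "dense" and
# "sparse" are assumed to exist in the collection's configuration.
dense_query = [0.12, -0.53, 0.07, 0.91]            # dense embedding of the user query
sparse_query = {"indices": [17, 842, 1029],        # sparse (BM25/SPLADE-style) term ids
                "values": [1.2, 0.4, 0.9]}         # and their weights

request_body = {
    "prefetch": [
        {"query": dense_query,  "using": "dense",  "limit": 50},   # dense leg
        {"query": sparse_query, "using": "sparse", "limit": 50},   # sparse leg
    ],
    "query": {"fusion": "rrf"},  # fuse the two ranked lists server-side
    "limit": 10,                 # final fused result count
}
```

The application sends one request; Qdrant runs both prefetch legs and returns a single fused list.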

Fusion happens inside Qdrant, not in your application code. According to Qdrant's GitHub release notes, supported methods include Reciprocal Rank Fusion (the default), Distribution-Based Score Fusion, and a Weighted RRF variant added in v1.17. The same release line replaced the older RocksDB storage engine with Gridstore, which is why direct upgrades from v1.15 straight to v1.17 are not supported. Qdrant Cloud wraps the open-source engine in a managed service with a permanently free single-node tier and usage-based paid plans for dedicated production clusters.
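For intuition, Reciprocal Rank Fusion itself is simple enough to sketch client-side: each document scores the sum of 1/(k + rank) across the ranked lists that contain it, with k = 60 in the original RRF paper. This is an illustrative reimplementation of the idea, not Qdrant's internal code, and Qdrant's constant may differ:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of document ids.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with 1-based ranks. k=60 follows the original RRF paper; this is a
    client-side sketch of the method Qdrant applies server-side.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits  = ["a", "b", "c"]   # ids from dense search, best first
sparse_hits = ["b", "d", "a"]   # ids from sparse search, best first
fused = rrf([dense_hits, sparse_hits])  # "b" wins: ranked high in both lists
```

Note how "b" outranks "a" even though "a" topped the dense list: appearing near the top of both lists beats topping one.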

How It’s Used in Practice

The mainstream use case for Qdrant is the retrieval layer of a production RAG application. A team building a chatbot over an internal knowledge base writes documents into a Qdrant collection, attaches both a dense embedding (from a model like OpenAI’s or a local sentence-transformer) and a sparse BM25 vector, and then queries with both at once. The Query API runs both retrievals, applies Reciprocal Rank Fusion server-side, and returns one ranked list — no glue code required.
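Concretely, one stored point carrying both vector types can be sketched as a plain request payload. The named-vector keys ("dense", "sparse") and the payload fields below are illustrative assumptions, not a fixed schema:

```python
# Illustrative shape of one point as written to a collection configured with
# a named dense vector and a named sparse vector. All names are hypothetical.
point = {
    "id": 1,
    "vector": {
        "dense": [0.12, -0.53, 0.07, 0.91],                      # from an embedding model
        "sparse": {"indices": [17, 842], "values": [1.3, 0.8]},  # BM25-style term weights
    },
    "payload": {"title": "VPN setup guide", "team": "it-support"},  # filterable metadata
}
```

At query time, the same two named vectors back the dense and sparse prefetch legs of the hybrid query.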

A common stack: a Python or TypeScript backend talks to Qdrant via the official client. The application generates the dense embedding, sends the user query and a prefetch block describing the sparse retrieval, and Qdrant handles the rest. For agentic workflows, the same collection backs multiple retrieval strategies — strict keyword for code identifiers, hybrid for natural-language questions, pure semantic for “find similar” lookups — without separate indexes.

Pro Tip: If you are wiring Qdrant into a hybrid pipeline for the first time, start with prebuilt RRF fusion before reaching for Distribution-Based Score Fusion or the new Weighted RRF. Tuning custom weights without baseline metrics in place usually makes recall worse, not better. Get evaluation queries and ranking metrics shipping first, then experiment with fusion variants once you can measure the difference.
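A baseline metric is easy to ship before any fusion tuning. The sketch below computes mean recall@k over a hand-labeled set of queries; the query ids and relevance judgments are invented for illustration:

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant ids that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# Hypothetical evaluation set: query id -> (top ids returned, ids judged relevant)
runs = {
    "q1": (["a", "b", "c"], {"a", "c"}),   # both relevant docs retrieved
    "q2": (["d", "e", "f"], {"x"}),        # relevant doc missed entirely
}
mean_recall = sum(recall_at_k(r, rel, k=3) for r, rel in runs.values()) / len(runs)
```

With even a few dozen labeled queries, a number like this turns "does Weighted RRF help?" from a guess into a measurement.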

When to Use / When Not

| Scenario | Use | Avoid |
| --- | :-: | :-: |
| Production RAG pipeline that needs hybrid search out of the box | ✓ | |
| Self-hosted vector store with strict data residency requirements | ✓ | |
| Tiny prototype with under ten thousand documents and no search complexity | | ✓ |
| Team without operational capacity to run any database service | | ✓ |
| Application needs server-side fusion of dense and sparse results | ✓ | |
| Pure SQL-style filtering with no vector or semantic component | | ✓ |

Common Misconception

Myth: Qdrant requires an external embedding service like fastembed to compute BM25 sparse vectors. Reality: That used to be true. According to Qdrant's GitHub release notes, native BM25 inference moved into Qdrant core in v1.15.2 (2025), so the database now generates sparse term vectors itself — no separate embedder process to deploy, monitor, or scale alongside the database.
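Assuming an inference-style request shape, where the query carries raw text plus a model name in place of a precomputed sparse vector, the sparse leg of a hybrid query might look like the sketch below. The exact field names are an assumption here, so check the release notes for your version:

```python
# Assumed shape for a sparse prefetch leg using native BM25 inference
# (v1.15.2+): raw text and a model name stand in for a precomputed
# sparse vector, and Qdrant derives the term vector itself.
prefetch_with_inference = {
    "query": {"text": "reset VPN certificate", "model": "Qdrant/bm25"},
    "using": "sparse",   # assumed named sparse vector in the collection
    "limit": 50,
}
```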

One Sentence to Remember

Qdrant is the engine that lets you ship production hybrid search — dense, sparse, and fused — from one collection and one query call, instead of duct-taping two retrieval systems together with custom merge code.

FAQ

Q: Is Qdrant free to use? A: Yes. The open-source core is Apache-2.0 licensed and runs anywhere. Qdrant Cloud also offers a permanently free single-node tier for prototypes; production clusters move to usage-based paid plans on dedicated hardware.

Q: How does Qdrant compare to Pinecone or Weaviate? A: Qdrant is open-source and Rust-native, with server-side hybrid fusion built into one Query API. Pinecone is closed-source SaaS only; Weaviate is open-source but has a different module-based architecture. Pick based on hosting needs and ecosystem fit.

Q: Does Qdrant support BM25 natively now? A: Yes. According to Qdrant's GitHub release notes, native BM25 inference shipped in v1.15.2 in 2025. Earlier versions required an external embedder. Qdrant also developed BM42, an attention-based successor to BM25 aimed at the short documents typical of RAG pipelines.


Expert Takes

Qdrant’s design captures a deeper truth about retrieval: dense and sparse vectors measure different things. Dense embeddings encode semantic similarity; sparse vectors encode lexical overlap. Neither is sufficient on its own for natural-language questions over technical content. By fusing both inside a single index, Qdrant operationalizes the principle that retrieval quality is a multi-dimensional signal — not a contest between approaches but a calibration of complementary ones. Architecture, not magic.

The interesting move is the Query API. Instead of forcing the application to orchestrate dense retrieval, sparse retrieval, and fusion in separate calls, Qdrant exposes the whole pipeline declaratively. You describe the retrieval graph; the database executes it. That is a context boundary done right — the database owns ranking logic, the application owns business logic. Less glue code, fewer integration bugs, easier evaluation harnesses to wire up.

Hybrid search is becoming table stakes for production RAG, and Qdrant positioned itself as the open-source default. Rust-native performance, a permissive open-source license, and a managed cloud option together cover the procurement objections that closed-source SaaS struggles with. Teams that started on closed-source vector services a few years back are reevaluating now — and Qdrant’s hybrid-first story is exactly the migration narrative those teams need.

A vector database is also a memory system, and memory systems shape what AI applications can recall about people. Qdrant’s open-source license is genuinely meaningful here — teams can audit storage, control where embeddings live, and delete records on request. That matters in regulated domains. The harder question is governance discipline: open-source does not enforce data minimization. Easy infrastructure makes it tempting to embed everything, including data nobody asked you to keep.