DiskANN
Also known as: Disk-based ANN, Vamana index, DiskANN vector index
DiskANN is Microsoft's open-source approximate nearest neighbor search library that enables billion-scale vector similarity search on a single machine: it combines a Vamana graph index with product quantization, keeping compressed representations in RAM and full-precision vectors on SSD to hold hardware costs down while maintaining high recall.
What It Is
When you need to search through billions of vectors — say, finding the most similar product image in a catalog of a billion items — you run into a hard constraint: that many vectors don’t fit in memory. Traditional in-memory approaches — graph-based indexes like HNSW (Hierarchical Navigable Small World) or partition-based methods like IVF (Inverted File Index) — work well at smaller scales, but at the billion-point mark, the hardware costs become steep. DiskANN solves this by splitting the work between RAM and disk in a way that keeps searches fast without requiring expensive high-memory servers.
Think of it like a library with a massive book collection. Instead of keeping every book on open shelves (expensive, needs a huge room), DiskANN keeps a compressed card catalog in the reading room (RAM) and stores the actual books in a nearby warehouse (SSD). When you request a book, the librarian first narrows down candidates using the card catalog, then retrieves only the few books you actually need from the warehouse.
The technical machinery behind this is a graph-based index called Vamana. According to Microsoft Research, the Vamana graph connects data points in a bounded-degree directed structure where each node links to neighbors at multiple distance scales — short-range connections for precision and long-range connections for fast traversal across the graph. This design ensures that any search query reaches its approximate nearest neighbors in a small number of hops, typically logarithmic in the dataset size. The Vamana algorithm was introduced at NeurIPS 2019 in the paper “DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node.”
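The traversal can be sketched as a best-first search over an adjacency list with a bounded frontier. Everything here — `graph`, `dist`, `beam_width` — is an illustrative stand-in, not DiskANN's actual API:

```python
import heapq

def greedy_search(graph, dist, start, query, beam_width=4, k=2):
    """Best-first search over a bounded-degree graph, Vamana-style.

    graph: node id -> list of neighbor ids (adjacency list).
    dist(node, query): distance from that node's vector to the query.
    Both are hypothetical helpers for illustration only.
    """
    visited = set()
    frontier = [(dist(start, query), start)]   # min-heap of (distance, node)
    found = []                                 # every node expanded so far
    while frontier:
        d, node = heapq.heappop(frontier)
        if node in visited:
            continue
        visited.add(node)
        found.append((d, node))
        for nbr in graph[node]:
            if nbr not in visited:
                heapq.heappush(frontier, (dist(nbr, query), nbr))
        # Prune: keep only the beam_width closest open candidates.
        # (nsmallest returns a sorted list, which is a valid min-heap.)
        frontier = heapq.nsmallest(beam_width, frontier)
    found.sort()
    return [node for _, node in found[:k]]

# Toy example: four points on a line, with short- and long-range edges.
vecs = {0: 0.0, 1: 1.0, 2: 2.0, 3: 10.0}
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
nearest = greedy_search(graph, lambda n, q: abs(vecs[n] - q), start=0, query=1.9)
print(nearest)  # [2, 1]
```

The long-range edge from node 0 to node 3 is what lets a real Vamana graph cross the dataset quickly; here the beam prunes it away once closer candidates appear.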
During index construction, DiskANN builds this Vamana graph and then applies product quantization (PQ) to create compressed vector representations that fit in RAM. The full-precision vectors stay on SSD. At query time, the system traverses the graph using the compressed vectors to identify a shortlist of candidates, then fetches the full-precision vectors from disk for only those candidates to compute exact distances and produce the final ranking.
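A minimal sketch of this two-stage pipeline follows, with random centroid sampling standing in for the k-means training that real product quantization uses, and an in-memory array standing in for the SSD. All names and parameters are hypothetical, not DiskANN's API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 1000 vectors of dim 8, split into M=4 subspaces of 2 dims,
# K=16 centroids per subspace (so each vector compresses to 4 bytes).
N, D, M, K = 1000, 8, 4, 16
data = rng.normal(size=(N, D)).astype(np.float32)   # stand-in for the SSD
sub = D // M

# "Train" codebooks by sampling K points per subspace (real PQ uses k-means).
codebooks = np.stack([data[rng.choice(N, K, replace=False), m*sub:(m+1)*sub]
                      for m in range(M)])            # shape (M, K, sub)

# Encode: each vector becomes M one-byte codes (the RAM-resident index).
codes = np.empty((N, M), dtype=np.uint8)
for m in range(M):
    d = np.linalg.norm(data[:, None, m*sub:(m+1)*sub] - codebooks[m][None], axis=2)
    codes[:, m] = d.argmin(axis=1)

def query(q, shortlist=32, k=5):
    # Stage 1 (RAM): approximate distances via per-subspace lookup tables.
    tables = np.stack([np.linalg.norm(codebooks[m] - q[m*sub:(m+1)*sub], axis=1)**2
                       for m in range(M)])           # shape (M, K)
    approx = tables[np.arange(M)[:, None], codes.T].sum(axis=0)
    cand = np.argsort(approx)[:shortlist]
    # Stage 2 ("SSD"): exact distances on the shortlist only.
    exact = np.linalg.norm(data[cand] - q, axis=1)
    return cand[np.argsort(exact)[:k]]

top = query(data[0], shortlist=N, k=5)
print(top[0])  # 0 — the query vector itself ranks first after exact re-ranking
```

In DiskANN the stage-1 distances guide the graph traversal rather than a full scan, but the division of labor is the same: cheap compressed comparisons in RAM, exact comparisons only for the survivors.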
According to DiskANN GitHub, the project was originally written in C++ but has been fully rewritten in Rust, with the current main branch containing the Rust implementation. The legacy C++ code is preserved on a separate branch. The library is released under the MIT license.
How It’s Used in Practice
The most common scenario where teams encounter DiskANN is when building similarity search for datasets that have outgrown in-memory solutions. A product team running semantic search over a document archive with hundreds of millions of embeddings can use DiskANN to serve queries on a single commodity server instead of provisioning a cluster with terabytes of RAM.
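The arithmetic behind that claim is easy to sketch. For a hypothetical archive of 500 million 768-dimensional float32 embeddings compressed to 64-byte PQ codes (illustrative numbers, not measured DiskANN figures):

```python
# Back-of-envelope memory budget: full-precision vectors vs. PQ codes.
# 500M embeddings, 768 dims, float32, with a hypothetical 64-byte code size.
n, dim = 500_000_000, 768
full_gb = n * dim * 4 / 1e9   # full-precision vectors (live on SSD)
pq_gb = n * 64 / 1e9          # compressed codes (live in RAM)
print(full_gb)  # 1536.0 -> ~1.5 TB, far beyond commodity RAM
print(pq_gb)    # 32.0   -> fits in a single server's memory
```

The roughly 1.5 TB of full-precision data sits cheaply on SSD, while the 32 GB of codes fits in an ordinary server, which is the whole cost argument.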
According to Microsoft SQL Blog, DiskANN is also natively integrated into SQL Server 2025 as a vector index type. This means enterprise teams already running SQL Server can add vector search capabilities without deploying a separate vector database — they create a DiskANN index on an embedding column and query it with standard SQL.
Pro Tip: Start by benchmarking your recall requirements. DiskANN trades a small amount of recall for large memory savings. If your application tolerates 95-98% recall instead of 99.9%, DiskANN on SSD will cost a fraction of what an all-RAM solution would, with query latencies still in the low milliseconds.
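Recall against a brute-force baseline takes only a few lines to measure; `recall_at_k` below is a hypothetical helper for illustration, not part of DiskANN:

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbors the ANN index actually returned.

    approx_ids, exact_ids: (num_queries, k) arrays of neighbor ids from the
    ANN index and from brute-force search respectively.
    """
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids))
    return hits / (len(exact_ids) * k)

# Example: the index missed one of ten true neighbors on each of two queries.
exact = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                  [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
approx = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 99],
                   [10, 11, 12, 13, 14, 15, 16, 17, 18, 98]])
print(recall_at_k(approx, exact, k=10))  # 0.9
```

Run this over a held-out query sample before committing to index parameters: if measured recall already clears your application's bar, the remaining tuning is purely about latency and cost.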
When to Use / When Not
| Scenario | Use | Avoid |
|---|---|---|
| Dataset exceeds available RAM (hundreds of millions to billions of vectors) | ✅ | |
| Small dataset (under a few million vectors) that fits comfortably in memory | | ❌ |
| Single-machine deployment preferred over distributed clusters | ✅ | |
| Sub-millisecond latency required with zero tolerance for disk I/O variance | | ❌ |
| Cost-sensitive search where hardware budget is constrained | ✅ | |
| Streaming index with high-frequency real-time inserts and deletes | | ❌ |
Common Misconception
Myth: DiskANN is slow because it reads from disk instead of RAM. Reality: distance computations during graph traversal use the compressed vectors held in RAM, so the SSD is touched only for small, batched reads — fetching neighborhoods along the search path and full-precision vectors for the final re-ranking of a short candidate list. These reads number in the dozens per query, and modern NVMe SSDs serve each in tens of microseconds.
One Sentence to Remember
DiskANN lets you search billions of vectors on a single machine by keeping a compressed index in RAM and the full-precision data on SSD, trading a small slice of recall for a large reduction in hardware costs; if you’re working with vector indexes and hitting memory limits, it’s the first architecture to study.
FAQ
Q: How does DiskANN compare to FAISS for large-scale search? A: FAISS is optimized for in-memory and GPU-accelerated search. DiskANN targets datasets too large for RAM by storing full vectors on SSD, making it more cost-effective at billion scale.
Q: Can I use DiskANN with Python? A: Yes. The diskannpy package provides Python bindings for building and querying DiskANN indexes, though the core library runs natively in Rust for performance.
Q: Does DiskANN only work with Microsoft products? A: No. DiskANN is open-source under the MIT license and runs independently. Its SQL Server integration is an additional deployment option, not a requirement.
Sources
- Microsoft Research: DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node - Original research paper introducing the Vamana graph index and SSD-based search approach
- DiskANN GitHub: microsoft/DiskANN - Official repository with Rust implementation and documentation
Expert Takes
DiskANN’s Vamana graph is a bounded-degree directed graph where every node connects to neighbors across multiple distance scales — short-range for precision, long-range for fast traversal. This structure lets any query reach its approximate nearest neighbors in a logarithmic number of hops. The real contribution is demonstrating that disk-resident full-precision vectors can match the recall of purely in-memory indexes while fitting on commodity hardware.
If you’re building a retrieval pipeline and your dataset outgrows what fits in RAM, DiskANN changes the cost equation. Instead of scaling up to machines with massive memory, you build the Vamana index, compress vectors with product quantization for fast candidate filtering in memory, then fetch full-precision vectors from SSD only for the final ranking step. The workflow is: build, compress, query, rank.
Every company with a large product catalog, document archive, or recommendation engine eventually hits the memory wall. DiskANN is Microsoft’s answer: billion-scale search on hardware that costs a fraction of all-RAM alternatives. Its native integration into SQL Server means enterprise teams can run vector search inside their existing database without bolting on a separate system. That simplification wins adoption.
When search systems scale to billions of vectors, every design decision about which neighbors to keep or discard carries consequences. Product quantization compresses vectors, but compression always loses information — and those losses are not evenly distributed across all data points. If certain types of queries systematically receive lower recall because their vectors compress poorly, the system introduces a quiet bias that no one audits.