MAX guide 15 min read May 6, 2026

How to Build a GraphRAG Pipeline with Neo4j and LightRAG in 2026

Specification blueprint linking entities, relationships, and vector embeddings across a graph and vector database for GraphRAG.

Table of Contents

TL;DR

Vector RAG retrieves chunks. Knowledge Graphs For RAG retrieves chunks plus the relationships between them — that’s the only reason to take on the extra moving parts.
Treat the pipeline as five separable contracts: ingest, extraction, dual storage (graph + vector), retrieval, and synthesis. Specify each one before you let the AI write a single line.
Microsoft GraphRAG, Neo4j, LightRAG, and Qdrant are not interchangeable boxes. Pick by query pattern and budget — global summarization, multi-hop QA, or low-cost incremental ingest each prefer a different stack.

A bank’s compliance team wires standard vector RAG over five years of policy documents. They ask: “Which counterparties of Subsidiary X are also exposed to Sanctioned Entity Y through any chain of ownership?” The system returns the three chunks that contain the most semantically similar sentences. None of them have the answer, because the answer lives in the edges between documents, not in any single chunk. That’s the failure mode you’re here to fix.

Before You Start

You’ll need:

An AI coding tool — Claude Code, Cursor, or Codex
Working knowledge of Entity Extraction and Multi-Hop Reasoning
A real corpus you want to query — not a toy dataset, because graph value only shows on connected data
An LLM provider account (the indexing step is LLM-heavy)

This guide teaches you: how to decompose a Knowledge Graph RAG pipeline into five contracts so your AI tool can build each piece without guessing the interfaces.

Vector Search Doesn’t Walk Edges

Here’s what I see every quarter. A team ships a working vector RAG. Quality plateaus. They add a re-ranker. They tune chunk size. They swap the embedding model. The chart barely moves.

Then someone asks a question that needs two hops — “find the supplier of the supplier” — and the whole system goes blank. That’s not a tuning problem. That’s a missing data structure.

It worked on Friday. On Monday, accuracy collapsed because the new corpus contained relationships the chunker silently destroyed.

Vector retrieval is local. Graph retrieval is structural. You don’t fix one with more of the other — you compose them. And if you skip the decomposition, your AI tool will dump everything into one orchestrator class and you’ll be debugging it for a week.

Step 1: Identify the Pipeline Components

A Microsoft GraphRAG pipeline is not one system. It’s five, and each one fails differently.

Your system has these parts:

Ingest — pulls source documents, normalizes text, chunks them. Owns: document IDs, chunk boundaries, source metadata. Failure mode: chunks split entities across boundaries, killing extraction recall.
Entity & Relationship Extraction — runs an LLM over chunks to pull (entity, relationship, entity) triples and short summaries. Owns: schema of entity types, relationship types, summary length. This is where most of your token spend goes.
Dual Storage — a graph store for entities and edges, a vector store for the embeddings of chunks (and optionally entity summaries). Owns: ID synchronization between the two. Failure mode: drift between the graph node ID and the vector point ID, and your retriever returns orphans.
Retriever — given a query, decides what to fetch: vector neighbors, graph traversals, or a hybrid. Owns: search strategy per query type (local, global, drift, hybrid). Failure mode: one strategy used for all questions; some answers go missing.
Synthesis — composes the retrieved context into the LLM prompt and produces the final answer. Owns: token budget, citation format, hallucination guards.

The reason this decomposition matters: the same five parts appear whether you use Microsoft GraphRAG, the neo4j-graphrag library, or LightRAG. The boundaries are stable. The implementations swap.

The Architect’s Rule: If you can’t draw the data flow on a napkin in five labelled boxes, the AI can’t build it either.

Step 2: Lock Down the Contract

This is where most pipelines die. The AI generates plausible-looking glue code that pins the wrong vector dimension, or uses a deprecated package, or assumes consumption-priced infrastructure when you booked capacity.

Context checklist — every item must be specified before any code is written:

Graph database choice and version — Neo4j 2026.01+ if you want the new Cypher SEARCH clause for vector queries (Neo4j Cypher Manual). Older db.index.vector.queryNodes syntax still works but is no longer the recommended pattern.
Graph driver package — neo4j-graphrag v1.15.0, released April 23, 2026 (neo4j-graphrag on PyPI). The older neo4j-genai package is deprecated and will not be maintained — pin the new one or migrate before you start.
Vector store and dimension — Neo4j’s native vector index caps at 4096 dimensions per index (Neo4j Cypher Manual). If your embedding model exceeds that, you need a separate vector store like Qdrant.
Hybrid wiring contract — if you split graph and vectors across two databases, write the cross-reference rule explicitly: every Qdrant point ID equals the corresponding Neo4j node ID. The official Qdrant + Neo4j recipe uses QdrantNeo4jRetriever for exactly this (Qdrant Docs).
Extraction LLM and budget cap — name the model and the per-document ceiling. Indexing dollar cost varies by roughly 10× between GPT-4o and GPT-4o-mini for the same corpus, per the market scan.
Query types you actually need — local lookup, global summarization, multi-hop traversal, or all three. The retriever spec follows from this; don’t let the AI guess.
Hosting capacity — Neo4j AuraDB Free covers 200,000 nodes and 400,000 relationships (Neo4j’s pricing page). AuraDB Professional starts at $65 per GB / month and AuraDB Business Critical at $146 per GB / month, both capacity-based on reserved RAM, not per-query.

The Spec Test: If your context file doesn’t pin the vector dimension and the graph driver package version, the AI will pick whatever showed up most in its training set — and that is almost certainly the deprecated neo4j-genai, with a 1536-dimension assumption that breaks when you swap embedding models.

Step 3: Wire the Components in the Right Order

The build order matters because each layer depends on the contract of the previous one. Skip the order, and you debug a stack of half-built pieces instead of one piece at a time.

Build order:

Ingest first — chunking, document IDs, source metadata. No dependencies. Validate by counting chunks and spot-checking that no entity (a person name, a contract number) is split across a chunk boundary.
Extraction next — feeds on the ingest output. Lock the entity-type whitelist before running. Extraction drift is the silent killer: if you don’t constrain the schema, a second indexing run will produce different node types from the same data and your graph becomes inconsistent.
Storage layer — graph store and vector store written in the same transaction-like step. Both write keyed by the chunk ID and entity ID from Step 2. If you split this into two separate runs, you get drift.
Retriever — built last, because it depends on what’s actually in the stores. Pick the search strategy per query type. Microsoft GraphRAG uses local search for entity-centric questions and global search for Community Detection-summarized questions; the official docs split the Indexer and Query packages for this reason (Microsoft GraphRAG Docs).
Synthesis — wraps the retriever. This is the only layer that talks to the user. Token budget and citation rules belong here, not in the retriever.

For each component, your context must specify:

What it receives (input shape, IDs, types)
What it returns (output shape, persisted IDs, error envelope)
What it must NOT do (no swallowing extraction failures silently, no ID generation outside the ingest layer)
How to handle failure (retry policy, partial-success semantics, dead-letter for malformed chunks)

If you’re cost-constrained, this is also the layer where you choose between full GraphRAG and LightRAG. LightRAG (HKUDS, EMNLP 2025; v1.4.10 released Feb 27, 2026, per the LightRAG GitHub repository) does incremental graph updates without a full rebuild and uses a simpler dual-level retrieval. The LightRAG paper reports roughly 100 tokens per query versus roughly 610,000 tokens per query for GraphRAG on the UltraDomain benchmark — about 6,000× lower — and an average 80 ms response versus 120 ms for baseline RAG (LightRAG paper, ACL Anthology). Treat both numbers as author-reported on their own benchmark, not independent verification.

Step 4: Prove It Walks the Edges

Vector RAG validation is “did we find the right chunk?” Graph RAG validation has to test what the graph adds — the multi-hop and global queries vector alone can’t answer.

Validation checklist:

Single-hop sanity — failure looks like: questions about one entity return wrong or missing facts. Means your extraction missed entities or the vector index is broken. Fix extraction before the graph.
Multi-hop traversal — failure looks like: “find X connected to Y through Z” returns nothing. Means the relationship type wasn’t in your extraction schema, or the retriever isn’t doing graph traversal. Inspect the Cypher generated by your retriever.
Global summarization — failure looks like: “what are the main themes in the corpus?” returns three random chunks. Means you skipped the community-detection step during indexing. That is the step that lets GraphRAG answer corpus-level questions.
ID drift between stores — failure looks like: retriever returns a graph node with no associated chunk text, or a vector hit with no graph context. Means your dual-storage write step is split. Pull a sample, join by ID, count the orphans.
Cost-per-query ceiling — failure looks like: indexing one document costs more than your monthly budget for the whole corpus. Switch to LazyGraphRAG (currently shipped inside Microsoft Discovery and Azure Local public preview; mainline microsoft/graphrag integration is pending as of Q2 2026) or to LightRAG.

For production scale reference: Lettria reported running over 100M embeddings on the Qdrant + Neo4j hybrid with sub-200 ms latency and a 20–25% accuracy uplift versus traditional RAG (Neo4j Blog). That’s a real-world ceiling, not a theoretical one — and it works only because the team owned all five contracts above.

Five-stage GraphRAG pipeline showing ingest, extraction, dual storage, retriever, and synthesis, with explicit contracts between each stage. — The five contracts of a GraphRAG pipeline: lock each one before you let the AI generate code.

Common Pitfalls

What You Did	Why AI Failed	The Fix
One-shot “build me a GraphRAG pipeline”	AI dumps ingest, extraction, and retrieval into one class	Decompose into five contracts before prompting
Didn’t pin the graph driver	AI picks the deprecated `neo4j-genai` package	Specify `neo4j-graphrag` 1.15.0+ in your context
Used 1536-dim embeddings without checking the index	Neo4j vector index works, until you try a model that exceeds 4096 dims	Pin embedding model and dimension; add Qdrant if you exceed the cap
Skipped the entity-type whitelist	Re-indexing produces different node types, graph drifts	Lock entity and relationship types in the extraction prompt
Mixed local and global queries through one retriever	Global questions return three random chunks	Split retriever by query type — local search vs. community-summary search
Treated GraphRAG OSS as “Microsoft-supported”	Production team blocked on a bug with no SLA	The OSS repo is explicitly tagged a demonstration, not a supported Microsoft product (Microsoft GraphRAG Docs); use Azure offerings for SLAs

Pro Tip

Every contract you lock down before generation saves a day of debugging after. The five contracts in Step 2 are not optional documentation. They are the specification the AI is reading from. Vague specs produce vague pipelines, and a vague pipeline always becomes a rewrite.

Security & compatibility notes:
neo4j-genai deprecated: Migrate to neo4j-graphrag v1.15.0+. The old package will not receive maintenance updates. Action: pin neo4j-graphrag>=1.15.0,<2.0.0.
Microsoft GraphRAG config refresh: Run graphrag init --root [path] --force between minor version bumps; major bumps may need a migration notebook to avoid full re-indexing. Action: add the init step to your version-bump checklist.
LazyGraphRAG OSS gap: LazyGraphRAG ships inside Microsoft Discovery and Azure Local public preview, but the mainline microsoft/graphrag repo has not absorbed it as of Q2 2026 (Microsoft GraphRAG Discussions). Action: if you need the lazy variant outside Azure, plan a custom integration.
Microsoft GraphRAG OSS support status: Tagged as a demonstration, not a supported Microsoft product. Action: do not assume enterprise SLAs from the OSS lib alone.

Frequently Asked Questions

Q: When should you choose GraphRAG over vector-only RAG for enterprise knowledge bases?

A: Choose GraphRAG when real questions need multi-hop traversal or corpus-wide summarization — answers vector retrieval cannot reach because the data lives in relationships. Pilot with twenty real questions; if more than a third need two or more hops, build the graph.

Q: Which use cases benefit most from knowledge graph RAG: legal discovery, medical Q&A, or financial compliance?

A: Financial compliance wins biggest because ownership chains are explicitly graph-shaped. Legal discovery benefits from entity-centric local search across case law. Medical Q&A is the trickiest — clinical relationships are graph-rich, but extraction errors carry patient-safety risk; add a human review step.

Q: How do you build a GraphRAG pipeline step by step using Microsoft GraphRAG, Neo4j, and a hybrid Qdrant vector layer in 2026?

A: Follow the five-contract decomposition: ingest, extraction, dual storage, retriever, synthesis. Use microsoft/graphrag for the indexer, neo4j-graphrag 1.15.0+ for the driver, and Qdrant when embeddings exceed Neo4j’s 4096-dimension cap. Wire the two via QdrantNeo4jRetriever with shared IDs.

Your Spec Artifact

By the end of this guide, you should have:

A five-component map of your pipeline — ingest, extraction, dual storage, retriever, synthesis — with the inputs, outputs, and forbidden behaviors for each.
A locked contract document covering graph database version, driver package, embedding dimension, hybrid wiring rule, extraction LLM and budget cap, query types, and hosting capacity tier.
A validation checklist with at least one test per failure mode in Step 4 — single-hop, multi-hop, global, ID drift, and cost ceiling.

Your Implementation Prompt

Paste this into Claude Code, Cursor, or Codex when you’re ready to scaffold. It mirrors the five contracts from Step 1 and the constraint categories from Step 2. Fill the brackets with your actual values — every bracket is a decision you owe the AI before it writes code.

You are building a knowledge-graph RAG pipeline. Follow this specification exactly.
Do not invent components, packages, or interfaces beyond what is listed.

COMPONENTS (build in this order, no shortcuts):
1. Ingest — input: [source location]; output: chunks with stable IDs and metadata.
2. Entity & Relationship Extraction — input: chunks; output: triples + summaries.
   Entity types allowed: [list]. Relationship types allowed: [list].
3. Dual Storage — write graph nodes/edges and vector points in one step,
   keyed by shared chunk and entity IDs.
4. Retriever — implement these search strategies: [local | global | hybrid].
5. Synthesis — token budget [N], citation format [format], refusal rule on empty context.

CONTRACT (do not deviate):
- Graph DB: Neo4j [version, e.g. 2026.01]; use the SEARCH Cypher clause for vectors.
- Graph driver: neo4j-graphrag [>=1.15.0]. Do NOT use neo4j-genai (deprecated).
- Vector store: [Neo4j native if dim <= 4096, else Qdrant].
- Embedding model: [name]; dimension: [N].
- Hybrid wiring (if split): every Qdrant point ID == matching Neo4j node ID.
- Extraction LLM: [model]; per-document budget cap: [USD].
- Query types in scope: [list].
- Hosting tier: [AuraDB Free | Professional | Business Critical] sized for [N nodes, N relationships].

BUILD ORDER: ingest → extraction → storage → retriever → synthesis.
Do not start a layer until the previous layer's contract is testable.

VALIDATION (must pass before merge):
- Single-hop: [N] sample questions return correct facts.
- Multi-hop: [N] sample questions traverse [K] hops and return correct chains.
- ID drift: zero orphan graph nodes and zero orphan vector points on a [N]-row sample.
- Cost-per-doc indexing: under [USD].

OUTPUT: code per component, with the contract restated as a docstring. No more.

Ship It

You now have a way to think about KG-RAG that doesn’t collapse the second a stakeholder asks a multi-hop question. Five contracts. One build order. One validation checklist. The next time someone says “let’s add a knowledge graph to our RAG,” you can answer with a spec instead of a shrug — and the AI tool can build to that spec instead of guessing.

Sources

Microsoft’s GraphRAG repository: Releases — microsoft/graphrag - Version history and release notes for the GraphRAG OSS library
Microsoft GraphRAG Docs: Welcome — GraphRAG - Official Indexer and Query package documentation
Microsoft GraphRAG Discussions: Timeline for LazyGraphRAG Integration - Status of LazyGraphRAG merging into the OSS repo
neo4j-graphrag on PyPI: neo4j-graphrag · PyPI - Package metadata, version, deprecation note for neo4j-genai
Neo4j Cypher Manual: Vector indexes - Vector index dimension cap and Cypher SEARCH syntax
Neo4j’s pricing page: Cloud & Self-Hosted Graph Database Platform Pricing - AuraDB tier limits and per-GB pricing
Neo4j Blog: Integrate Qdrant and Neo4j to Enhance Your RAG Pipeline - Lettria production-scale numbers and hybrid pattern
LightRAG GitHub repository: HKUDS/LightRAG - Source, release tags, OpenSearch backend
LightRAG paper (ACL Anthology): LightRAG: Simple and Fast RAG (EMNLP 2025 Findings) - Token-cost and latency comparisons
Qdrant Docs: GraphRAG with Qdrant and Neo4j - Official hybrid recipe and QdrantNeo4jRetriever usage

Aha Moments

MONA

The reason vector retrieval plateaus on enterprise corpora is that embeddings collapse a document into a point in a space where similarity is local. That works beautifully for paraphrase matching and breaks immediately for transitive reasoning. A graph adds an explicit relational topology over the same content — the embedding tells you which nodes are close in meaning, the edges tell you which nodes are close in structure. GraphRAG composes both. The interesting research question is no longer “graph or vector” — it’s how to learn the right relational schema directly from the corpus, instead of asking an LLM to extract it with a fixed prompt every time.

DAN

Mona’s right that the schema problem is the next frontier — and that’s precisely where the market opportunity sits. Most teams shipping GraphRAG today are paying the indexing tax twice: once at build time, once every time the corpus updates. LazyGraphRAG and LightRAG just changed the unit economics enough that what used to be a research demo is now a buildable product line. The teams that lock down their five-contract spec early are the ones that will ship in months, not quarters. The teams that wait for “the perfect framework” will be debugging glue code while their competitor takes the customer.

ALAN

Both perspectives miss the part that keeps me up. Once you put a knowledge graph between the user and the corpus, the extraction prompt is making editorial decisions at scale — what counts as an entity, what counts as a relationship, what gets summarized away. Those decisions are invisible to the end user and almost invisible to the team running the pipeline. Multiply that across financial compliance, legal discovery, and clinical Q&A, and you have a layer of soft policy nobody voted on. So I’ll leave you with this: when your GraphRAG decides two parties are not connected, who is responsible for the answer it never let the user reach?

AI-assisted content, human-reviewed. Images AI-generated. Editorial Standards · Our Editors