Neo4j

Also known as: Neo4j Graph Database, Neo4j Property Graph, neo4j

Neo4j is a native graph database that stores data as nodes and relationships in the property graph model and is queried with Cypher. It is the most common backing store for GraphRAG-style systems that pair knowledge graphs with large language models.


What It Is

Most teams hit Neo4j the same way: they tried to answer a question like “which suppliers are two hops away from a sanctioned entity” using a SQL database, watched the joins explode, and went looking for something built for relationships. Where a relational database treats relationships as foreign keys you reconstruct at query time, Neo4j treats them as first-class records that already know what they connect.

The data model is called the property graph. Two ingredients: nodes (the things — a person, a document, a product, a concept) and relationships (the verbs between them — WROTE, MENTIONS, REPORTS_TO, CITES). Both nodes and relationships can carry properties (key-value pairs like name: "Ada", since: 2019, weight: 0.8). Nodes get one or more labels (:Person, :Document) so you can filter by type. That’s the whole conceptual surface — the rest is tooling.
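A minimal Cypher sketch of that model, with names and values invented for illustration:

```cypher
// Two labeled nodes, each carrying properties
CREATE (a:Person {name: "Ada", since: 2019})
CREATE (d:Document {title: "Graph Notes"})
// A relationship with a property of its own
CREATE (a)-[:WROTE {weight: 0.8}]->(d)
```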

Queries are written in Cypher, an ASCII-art language where the pattern (p:Person)-[:WROTE]->(d:Document) looks like the relationship it describes. Cypher started inside Neo4j, was opened up through the openCypher project, and is now formally aligning with GQL (ISO/IEC 39075:2024) — the first new ISO database language since SQL. According to the Neo4j Blog (Cypher and GQL), Cypher was a major input into that standard.
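As a sketch, the two-hop supplier question from earlier might read like this, assuming invented :Supplier and :Entity labels and a SUPPLIES relationship type:

```cypher
// Suppliers within one or two hops of a sanctioned entity
MATCH (s:Supplier)-[:SUPPLIES*1..2]->(e:Entity {sanctioned: true})
RETURN DISTINCT s.name
```

The `*1..2` is Cypher's variable-length pattern; raising the upper bound deepens the traversal without restructuring the query.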

Under the hood, Neo4j uses index-free adjacency — each node holds direct pointers to its neighboring relationships, so following an edge is a constant-cost pointer hop, not an indexed lookup or a join. That makes deep traversals (five, ten, twenty hops) practical in a way they aren’t in a relational store. According to Neo4j Release Notes, the current line is calendar-versioned (e.g., 2026.04.0), with a 5.26 LTS branch maintained alongside it.
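The cost profile is easy to sketch outside the database. In this toy Python model, a plain dict stands in for Neo4j's pointer structure (the real on-disk format is far more involved): a k-hop reachability query touches only the neighborhoods it visits, never the whole dataset.

```python
# Toy model of index-free adjacency: each node holds direct references
# to its neighbors, so each hop is a lookup on that node alone -- no
# global index scan or join per hop. (Invented mini-graph.)
adjacency = {
    "acme": ["parts_co"],
    "parts_co": ["shell_ltd"],
    "shell_ltd": [],
}

def within_hops(start, max_hops):
    """Collect every node reachable from `start` in <= max_hops edges."""
    frontier, seen = {start}, set()
    for _ in range(max_hops):
        frontier = {nbr for node in frontier for nbr in adjacency[node]} - seen
        seen |= frontier
    return seen

print(sorted(within_hops("acme", 2)))  # -> ['parts_co', 'shell_ltd']
```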

How It’s Used in Practice

The biggest reason a product or AI team encounters Neo4j today is GraphRAG. Vector search alone retrieves chunks that look similar to a question; it cannot tell you that two chunks describe the same person, or that one document refutes another. So teams build a knowledge graph alongside the vector index — extracting entities and relationships from their corpus, storing them in Neo4j, and letting the LLM walk the graph to assemble a grounded answer. Microsoft’s GraphRAG popularized the pattern, and many LLM/knowledge-graph frameworks and tutorials (LightRAG among them) target Neo4j or a compatible store.

The typical loop looks like this: an extraction pipeline reads source documents, an LLM pulls out entities and relationships, those land in Neo4j as nodes and edges, and at query time the application retrieves a relevant subgraph (a query entity plus its neighborhood) and hands it to the LLM as context. Cypher does the retrieval; the graph supplies structure the prose alone never made explicit.
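A hedged sketch of that retrieval step, with hard-coded triples standing in for LLM extraction and a Python list standing in for Neo4j; every entity and relation name here is invented:

```python
# Minimal GraphRAG retrieval: find the subgraph around a query entity
# and serialize it into lines an LLM prompt can include as context.
triples = [
    ("Ada", "WROTE", "Graph Notes"),
    ("Graph Notes", "MENTIONS", "Neo4j"),
    ("Bob", "CITES", "Graph Notes"),
]

def neighborhood(entity, hops=1):
    """Return every triple within `hops` edges of the query entity."""
    keep, frontier = [], {entity}
    for _ in range(hops):
        step = [t for t in triples if t[0] in frontier or t[2] in frontier]
        keep.extend(t for t in step if t not in keep)
        frontier = {n for t in step for n in (t[0], t[2])}
    return keep

def as_context(entity):
    """Flatten the subgraph into prompt-ready text."""
    return "\n".join(f"{s} -{r}-> {o}" for s, r, o in neighborhood(entity))

print(as_context("Graph Notes"))
```

In production the `neighborhood` call becomes a Cypher query against Neo4j; the shape of the loop stays the same.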

Pro Tip: Prototype your node labels and relationship types on a few hundred documents before you scale extraction up. Run the queries you actually plan to run in production against the prototype graph first. A bad relationship name reused across millions of edges is much harder to rename than to get right the first time.
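One reason the rename is expensive: core Cypher has no in-place rename for relationship types, so changing one means rewriting every edge. A sketch, assuming a hypothetical MENTIONS_ENTITY type being renamed to MENTIONS:

```cypher
// Every edge must be recreated under the new type, then deleted
MATCH (a)-[old:MENTIONS_ENTITY]->(b)
CREATE (a)-[:MENTIONS]->(b)
DELETE old;
```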

When to Use / When Not

| Use Neo4j | Avoid Neo4j |
| --- | --- |
| GraphRAG, LLM memory, or any system that needs multi-hop entity reasoning | Storing flat documents with no meaningful relationships |
| Fraud rings, recommendations, or supply-chain queries with deep traversals | High-volume time-series telemetry or simple key-value lookups |
| Knowledge graphs that combine structured and unstructured sources | Replacing a transactional OLTP database for generic CRUD workloads |

Common Misconception

Myth: Neo4j is just a NoSQL document store with extra steps — you could replicate it with a few join tables in Postgres. Reality: A relational join recomputes relationships at query time using indexes. Neo4j stores relationships as first-class records with direct pointers between connected nodes (index-free adjacency), so a deep traversal stays cheap as the graph grows. You can model graph-shaped data in Postgres, but the cost profile of multi-hop queries is fundamentally different.

One Sentence to Remember

If your problem is “which things are connected, and how” rather than “give me one row by id”, a graph database changes the cost of the answer — and Neo4j is the default place teams reach when that question comes from a GraphRAG pipeline.

FAQ

Q: Is Neo4j a SQL database? A: No. Neo4j is a native graph database that uses the Cypher query language, not SQL. Cypher is now formally aligning with the ISO GQL standard published in 2024.

Q: Do I need Neo4j to build a RAG system? A: No, plain vector search works for many use cases. You add Neo4j when you need multi-hop reasoning across entities and relationships — that’s the GraphRAG pattern.

Q: Is Neo4j open source? A: Neo4j has a Community Edition under an open-source license alongside commercial Enterprise and AuraDB cloud offerings. Cypher itself is open through the openCypher project and the ISO GQL standard.


Expert Takes

The interesting property of a native graph database is index-free adjacency. Following a relationship is a constant-cost pointer hop, not a join recomputed at query time. That changes the algorithmic profile of pathfinding, community detection, and centrality measures. For GraphRAG, this matters because retrieval often means traversing many hops outward from a query entity — exactly the operation a graph engine is tuned for, and exactly the operation a relational store struggles with as depth grows.

Treat Neo4j like any other backing store: write a clear schema spec before you load data. Document node labels, relationship types, and required properties in a context file your agents can read. When an LLM later writes Cypher against your graph, it pulls from that spec instead of guessing. The graph is only as useful as the contract you wrote down — an undocumented graph degrades into a worse document pile.
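Part of that contract can also be enforced by the database itself. A sketch using Neo4j 5 constraint syntax, with example labels rather than a prescribed schema (note that property existence constraints require Enterprise Edition):

```cypher
// Uniqueness: one :Person node per id
CREATE CONSTRAINT person_id IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE;

// Required property: every :Document must carry a title
CREATE CONSTRAINT doc_title IF NOT EXISTS
FOR (d:Document) REQUIRE d.title IS NOT NULL;
```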

Graph databases were a niche before LLMs. GraphRAG flipped that. The moment teams realized vector search alone hallucinates relationships it doesn’t actually have, knowledge graphs became the missing scaffolding — and Neo4j became the default substrate. If your roadmap includes anything labeled “agent”, “memory”, or “enterprise RAG”, you’re going to brush against a graph database. Pretending otherwise just delays the architectural conversation your team eventually has to have anyway.

A graph makes relationships explicit — and that cuts both ways. The structure that helps an LLM trace a clean answer also lets anyone with read access reconstruct who knows whom, who reports to whom, who interacted when. Before you load production data into a knowledge graph, ask: who can query this, what can they infer from the shape of the neighborhood, and would the people in those nodes be comfortable with what the graph reveals?