Multi-Hop Reasoning

Also known as: multi-hop QA, multi-step reasoning, chained reasoning

Multi-hop reasoning is the process of answering a question by chaining together multiple pieces of evidence, where each step (hop) traverses a relationship between entities or facts. It contrasts with single-shot retrieval and is the central motivating use case for GraphRAG and knowledge-graph-augmented language models.

What It Is

Most questions worth asking don’t have answers sitting in one paragraph. “Which authors influenced the writers who taught at Iowa Writers’ Workshop in the 1980s?” first needs you to identify those teachers, then trace their literary influences. That’s two hops. Multi-hop reasoning is the term for any question whose answer requires stitching together at least two distinct facts, where the second fact only becomes relevant once you know the first.

For people building on top of large language models — especially anyone integrating retrieval-augmented generation (RAG) into a product — multi-hop questions are where vector search alone starts to break. A single embedding lookup finds chunks that look like the question. It does not find the chunk containing the bridge fact you need to reach the actual answer.

Mechanically, a hop is a step across a relationship. In a knowledge graph, that relationship is an explicit typed edge — WROTE, INFLUENCED_BY, EMPLOYED_AT. The system answers a multi-hop query by traversing these edges in sequence: find node A, follow an edge to node B, follow another edge to node C, return what’s attached to C. The reasoning path is visible and auditable.
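That traversal can be sketched with nothing more than a dictionary of typed edges. The entities, relations, and the two-hop query below are illustrative stand-ins, not a real dataset:

```python
# Minimal sketch: a knowledge graph as typed edges, traversed hop by hop.
# All names and relations here are illustrative placeholders.
from collections import defaultdict

# (source node, relation) -> set of target nodes
graph = defaultdict(set)

def add_edge(src, relation, dst):
    graph[(src, relation)].add(dst)

add_edge("Frank Conroy", "TAUGHT_AT", "Iowa Writers' Workshop")
add_edge("Frank Conroy", "INFLUENCED_BY", "J. D. Salinger")

def hop(nodes, relation):
    """One hop: follow a typed edge out of every node in the frontier."""
    result = set()
    for node in nodes:
        result |= graph[(node, relation)]
    return result

# Hop 1 (inverse lookup): who taught at the Workshop?
teachers = {src for (src, rel) in graph
            if rel == "TAUGHT_AT"
            and "Iowa Writers' Workshop" in graph[(src, rel)]}

# Hop 2: follow INFLUENCED_BY edges from those teachers.
influences = hop(teachers, "INFLUENCED_BY")
```

Every intermediate frontier (`teachers`, then `influences`) is a concrete set of nodes, which is exactly what makes the reasoning path auditable: you can log each frontier and replay the walk.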

In a vector-only RAG system, the equivalent of “hopping” is iterative retrieval — query the index, read the result, generate a follow-up query, repeat. According to StepChain GraphRAG (arXiv), this approach often drifts off the actual reasoning path because each follow-up query is shaped by what the model just read, rather than by the structure of the underlying facts. The drift compounds with each step. Graph traversal, by contrast, follows relationships that exist in the data — not relationships the model guessed from prose.
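A minimal sketch of that iterative loop, with both the retriever and the follow-up generator stubbed out (word overlap in place of embeddings, string concatenation in place of an LLM call) so the control flow is visible:

```python
# Sketch of iterative ("multi-hop") retrieval over a flat index.
# `search` and `generate_followup` are toy stand-ins, not real APIs.

def search(query, chunks):
    """Toy retriever: return the chunk sharing the most words with the query."""
    words = set(query.lower().split())
    return max(chunks, key=lambda c: len(words & set(c.lower().split())))

def generate_followup(question, evidence):
    """Stand-in for an LLM rewriting the query around what it just read.
    The drift risk lives here: the next query is shaped by this chunk,
    not by the structure of the underlying facts."""
    return f"{question} given that {evidence}"

def iterative_retrieve(question, corpus, hops=2):
    context, query = [], question
    for _ in range(hops):
        candidates = [c for c in corpus if c not in context]
        chunk = search(query, candidates)
        context.append(chunk)
        query = generate_followup(question, chunk)
    return context

corpus = [
    "Alice manages the Orion project",
    "the Orion project is funded by the research grant",
]
evidence = iterative_retrieve("who funds the project Alice manages", corpus)
```

On this tiny corpus the loop happens to land the bridge fact on hop two; in practice each rewritten query is a guess, and a bad guess on an early hop poisons every hop after it.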

How It’s Used in Practice

The mainstream encounter with multi-hop reasoning in 2026 is through a GraphRAG-style retrieval layer sitting behind a chat product or internal Q&A tool. A user asks a question that touches more than one entity — a customer’s account history plus the product they bought plus the warranty terms attached to that product — and the system walks the graph from “customer” to “purchase” to “warranty” to assemble the context the language model uses to draft its answer.
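The customer → purchase → warranty walk might look like the sketch below; the record IDs, field names, and edge names are invented for illustration:

```python
# Hypothetical customer -> purchase -> warranty walk.
# Record IDs, fields, and edge names are invented for this example.

records = {
    "cust-17": {"type": "customer", "name": "Dana", "PURCHASED": "order-9"},
    "order-9": {"type": "purchase", "product": "X200 laptop", "COVERED_BY": "warr-3"},
    "warr-3":  {"type": "warranty", "terms": "24 months, parts and labour"},
}

def walk(start, *relations):
    """Follow a chain of typed edges, collecting every record on the path."""
    path, node = [records[start]], start
    for rel in relations:
        node = records[node][rel]
        path.append(records[node])
    return path

# The context handed to the language model: all three records, in hop order.
context = walk("cust-17", "PURCHASED", "COVERED_BY")
```

The point of the sketch is the return value: the retrieval layer hands the model the whole chain, not just the chunks that resembled the question.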

Product teams typically reach for this when their RAG pilot stalls on questions employees actually ask. According to Neo4j Blog, replacing flat similarity search with graph traversal materially improves accuracy on multi-hop questions because the graph encodes which facts are connected to which, rather than relying on prose proximity.

Pro Tip: Before assuming you need GraphRAG, write down five real user questions and count the hops in each. If most are single-hop (“what is our refund policy?”), a vector index plus good chunking is faster to build and cheaper to run. If three or more of them cross two or more entities, that’s your case for a graph.

When to Use / When Not

Use it when:
- Questions span entities (customer → order → policy)
- Auditing the reasoning path matters (compliance, legal)
- Investigative or research workflows follow citations

Avoid it when:
- A single-document FAQ has no entity relationships
- The latency budget is under a hundred milliseconds with a tiny corpus
- The task is ad-hoc summarization of one long document

Common Misconception

Myth: Adding a re-ranker to vector search solves multi-hop reasoning. Reality: Re-ranking improves which chunks rise to the top, but it cannot invent the bridge fact. If the second hop lives in a chunk that doesn’t share surface vocabulary with the original question, no amount of re-ranking pulls it in. Multi-hop reasoning needs explicit graph structure or carefully engineered iterative retrieval — re-ranking alone is not the fix.

One Sentence to Remember

If a question can’t be answered without an “and then” — and then which department, and then which contract, and then which clause — you are looking at a multi-hop problem, and a flat vector index will quietly fail it.

FAQ

Q: What’s the difference between multi-hop reasoning and chain-of-thought? A: Chain-of-thought is the model thinking step by step inside its own output. Multi-hop reasoning is fetching evidence in steps from outside the model, then reasoning over what came back. They often combine, but they’re different layers.

Q: Can vector RAG do multi-hop reasoning at all? A: Yes, with iterative retrieval and re-ranking, but accuracy degrades quickly past two hops. According to StepChain GraphRAG (arXiv), each iteration risks drifting from the original reasoning path, especially on chains that span several entities.

Q: Do I need a knowledge graph to do multi-hop reasoning? A: Not strictly — agentic RAG approaches use planner LLMs to fake graph traversal over text. But for production reliability and auditability, an explicit graph remains the most predictable substrate today.

Expert Takes

Multi-hop reasoning is not a new capability of language models — it’s a property of the retrieval substrate underneath them. The model still composes the final answer one token after another. What changes is whether the context window contains the connecting fact at all. Graph traversal makes that connection explicit and inspectable; iterative vector search reconstructs it probabilistically and often imperfectly. Same generator, different evidence supply chain.

The mistake teams make is treating multi-hop as a model problem when it’s a context problem. Your retrieval layer is responsible for assembling the right facts into the prompt; the model is responsible for reasoning over them. If the prompt is missing hop two, no amount of better prompting saves the answer. Specify the relationships your domain actually has, encode them, and let traversal — not vibes — decide what enters the context window.

Vector-only RAG had its moment. The next tier of enterprise AI products will be judged by whether they can answer questions that cross systems — not just retrieve passages. Procurement, customer success, internal search: every team owns data that lives across entities. Vendors who get multi-hop reasoning right own those workflows. The ones who don’t are selling a smarter search bar to companies that already have one.

Auditability is the part nobody talks about. When a model produces an answer that crossed several systems, who can prove which records it actually used? Graph traversal leaves a path you can replay. Iterative vector search leaves a trail of rephrased queries and ranked snippets that may not even be reproducible across runs. If the answer affects a person — a denied claim, a flagged transaction — opacity at the retrieval layer is not a small problem.