Knowledge Graph

Also known as: KG, entity graph, semantic knowledge graph

Knowledge Graph
A structured representation of entities and their relationships stored as labeled nodes and edges, enabling machines to reason about connections between concepts. Commonly used in search engines, recommendation systems, and as structured input for graph neural networks.

Put simply, a knowledge graph is a web of real-world entities linked by labeled relationships, and that web is precisely the structured input that graph neural networks are designed to learn from.

What It Is

If you’ve ever asked Google a question and gotten a neat fact box instead of a list of blue links, you’ve already seen a knowledge graph in action. That box exists because Google organized billions of facts — people, places, companies, concepts — into a web of connected nodes. Knowledge graphs exist because flat tables and keyword searches break down when you need to answer questions that span multiple relationships. “Which drugs interact with medication X for patients who also have condition Y?” — a traditional database can answer that, but the query gets ugly fast. A knowledge graph makes it natural.

Think of it like a mind map, but machine-readable. Each circle (node) represents an entity — a person, a product, a disease, a city. Each line between circles (edge) represents a relationship — “works at,” “is located in,” “treats.” The labels on those connections matter as much as the nodes themselves, because they carry the meaning.

According to IBM, a knowledge graph consists of three core building blocks: nodes (entities), edges (relationships between entities), and labels that describe the type of each connection. This structure follows a subject–predicate–object pattern, sometimes called a “triple.” For example: (PyTorch Geometric)–(is a library for)–(graph neural networks).
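
To make the triple pattern concrete, here is a minimal sketch representing triples as plain Python tuples. The entity and relation names are illustrative, not drawn from any real dataset:

```python
# Subject-predicate-object triples as Python tuples (illustrative names).
triples = [
    ("PyTorch Geometric", "is_a_library_for", "graph neural networks"),
    ("GNN", "operates_on", "graph structure"),
    ("Google", "launched", "Knowledge Graph"),
]

# Nodes are everything that appears as a subject or an object;
# the edge labels (predicates) carry the semantics.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
edge_labels = {p for _, p, _ in triples}
```

The same three-part shape underlies RDF stores and most graph database models, even when the storage format differs.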

What makes knowledge graphs particularly relevant when building graph neural networks is that the graph is the data. A GNN doesn’t take a flat CSV as input — it operates directly on graph structure. The adjacency matrix defines which nodes connect to which, and node embeddings capture each entity’s features. When you train a GNN on a knowledge graph, you’re teaching the network to predict missing edges, classify nodes, or generate embeddings that preserve the graph’s relational logic.
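
As a rough sketch of that idea, the snippet below turns a handful of triples into the two inputs a GNN consumes: an adjacency structure and per-node feature vectors. The drug names are invented, and one-hot vectors stand in for learned embeddings:

```python
from collections import defaultdict

# Invented medical triples for illustration only.
triples = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("warfarin", "treats", "thrombosis"),
]

# Index every entity so nodes map to rows of a feature matrix.
entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
index = {e: i for i, e in enumerate(entities)}

# Adjacency list keyed by node index; message passing follows these edges.
adjacency = defaultdict(list)
for s, _, o in triples:
    adjacency[index[s]].append(index[o])

# One-hot features as a stand-in for learned node embeddings.
features = [[1 if i == j else 0 for j in range(len(entities))]
            for i in range(len(entities))]
```

In a real pipeline these structures become an edge index tensor and a feature matrix, but the mapping from triples to graph inputs is the same.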

According to Wikipedia, the term gained mainstream recognition in 2012 when Google launched its Knowledge Graph, built on DBpedia and Freebase. Since then, knowledge graphs have spread into healthcare (mapping drug interactions), finance (fraud detection), and recommendation systems — anywhere questions require traversing multiple relationship hops.

How It’s Used in Practice

The most common place you’ll encounter knowledge graphs today is in retrieval-augmented generation (RAG) workflows. Traditional RAG retrieves documents by vector similarity — you embed a query, find the closest document chunks, and feed them to an LLM. GraphRAG adds a structural layer: instead of just matching text, the system traverses a knowledge graph to find connected facts. According to PuppyGraph Blog, this combination lets an LLM answer multi-hop questions that span several relationships, something flat retrieval struggles with.
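
The core of that traversal step can be sketched in a few lines. This is an illustrative toy, not any particular GraphRAG implementation: the tiny graph is invented, and real systems would also handle entity linking, cycles, and ranking of retrieved facts:

```python
# Invented triples standing in for a domain knowledge graph.
triples = [
    ("drug_X", "interacts_with", "drug_Y"),
    ("drug_Y", "contraindicated_for", "condition_Z"),
    ("drug_X", "treats", "migraine"),
]

def neighbors(entity):
    """All triples whose subject is `entity`."""
    return [(s, p, o) for s, p, o in triples if s == entity]

def collect_context(seed, hops=2):
    """Gather every triple reachable from `seed` within `hops` edges.
    No visited-set; fine for a tiny acyclic sketch like this one."""
    context, frontier = [], [seed]
    for _ in range(hops):
        next_frontier = []
        for entity in frontier:
            for s, p, o in neighbors(entity):
                context.append((s, p, o))
                next_frontier.append(o)
        frontier = next_frontier
    return context

# Two hops from drug_X reaches the contraindication via drug_Y --
# the multi-hop fact that flat vector retrieval would likely miss.
facts = collect_context("drug_X")
```

The collected triples would then be serialized into the LLM prompt as retrieved context.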

For teams building GNNs, knowledge graphs serve as both training data and evaluation benchmarks. Libraries like PyTorch Geometric and DGL include standard knowledge graph datasets (FB15k and WN18, along with their harder filtered successors FB15k-237 and WN18RR) specifically for link prediction tasks — training a GNN to predict which edges are missing from an incomplete graph.
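
To show what the link prediction task itself asks, here is a deliberately simple non-ML sketch: scoring a candidate edge by how many neighbors its endpoints share. The benchmarks above use learned embeddings rather than this heuristic, and the graph below is invented:

```python
# A tiny undirected graph, stored as a set of edge pairs (invented).
edges = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")}

def neighbor_set(node):
    """Neighbors of `node` in either edge direction."""
    return ({y for x, y in edges if x == node} |
            {x for x, y in edges if y == node})

def common_neighbor_score(u, v):
    """More shared neighbors suggests (u, v) is a missing edge, not a true absence."""
    return len(neighbor_set(u) & neighbor_set(v))

# (a, d) is not in the graph, yet a and d share two neighbors (b and c),
# so a link predictor would rank it as a likely missing edge.
score = common_neighbor_score("a", "d")
```

A trained GNN replaces the hand-written score with one computed from learned node embeddings, but the evaluation setup — rank candidate edges, check against held-out true edges — is the same.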

Pro Tip: Before building a knowledge graph from scratch, check if a public one already covers your domain. Wikidata, DBpedia, and domain-specific ontologies (SNOMED for healthcare, FIBO for finance) save months of manual entity extraction and relationship modeling.

When to Use / When Not

| Scenario | Use | Avoid |
| --- | --- | --- |
| Multi-hop questions across connected entities | ✅ | |
| Simple keyword lookup or single-table queries | | ✅ |
| Training GNNs for link prediction or node classification | ✅ | |
| Small datasets with few entity relationships | | ✅ |
| Fraud detection tracing transaction chains | ✅ | |
| Storing time-series metrics or numerical logs | | ✅ |

Common Misconception

Myth: A knowledge graph is just a fancy database — you can do the same thing with a relational database and enough JOIN statements.

Reality: Relational databases store data in fixed tables with rigid schemas. Knowledge graphs store data as flexible triples (entity–relationship–entity), making it straightforward to add new relationship types without schema migrations. The real difference shows up at depth — what takes a five-table JOIN in SQL is a simple traversal in a graph. That traversal efficiency is also why GNNs operate on graph structure rather than flattened tabular data.

One Sentence to Remember

A knowledge graph turns disconnected facts into a connected web of meaning — and that web is exactly the kind of structured input that graph neural networks are built to learn from.

FAQ

Q: What is the difference between a knowledge graph and a regular graph database? A: A graph database is the storage engine. A knowledge graph is the data model — entities with typed, labeled relationships that carry semantic meaning. You typically store a knowledge graph inside a graph database.

Q: Do I need a knowledge graph to train a graph neural network? A: Not always. GNNs work on any graph-structured data — social networks, molecular structures, citation networks. Knowledge graphs are one common input, especially for link prediction and entity classification tasks.

Q: How does GraphRAG differ from standard RAG? A: Standard RAG retrieves text chunks by vector similarity. GraphRAG traverses a knowledge graph to find structurally connected facts first, then passes those to the LLM — handling multi-hop reasoning that pure vector search misses.

Expert Takes

Knowledge graphs encode first-order relational logic as triples — subject, predicate, object. When a GNN operates on this structure, message passing propagates information along typed edges. The network learns relationship-specific transformations rather than treating all connections identically. The typing of edges is what separates knowledge graph reasoning from generic graph convolution on unlabeled adjacency matrices.

If your project spec says “the model needs to understand entity relationships,” the first question is whether those relationships are explicit or need to be inferred. A knowledge graph makes relationships first-class objects in your data model. That means your GNN pipeline starts with cleaner signal — less time on feature engineering, more time on the actual learning task. Define your ontology in the spec before writing a single line of training code.

Every major cloud provider now offers a managed graph database, and the GraphRAG pattern is becoming the default for enterprise AI retrieval. Organizations that invested early in building domain knowledge graphs have a structural advantage — their data is already organized for the multi-hop reasoning that LLMs struggle to do with flat text alone. The build-versus-buy decision on graph infrastructure is happening right now.

The edges in a knowledge graph encode someone’s decisions about what relationships matter and how to categorize them. “Works at” seems obvious, but “is related to” or “influences” carry subjective judgment baked into the schema. When a GNN trains on these graphs, it inherits those editorial choices as ground truth. Who defines the ontology is a governance question, not just a technical one — and it rarely gets asked until something goes wrong.