Zep

Also known as: Graphiti, Zep AI, Zep memory

Zep is an agent-memory and context-engineering platform that stores facts as edges in a temporal knowledge graph called Graphiti. Each edge tracks when a fact became true and when it was superseded, so memory updates over time without overwriting prior history.

What It Is

Most AI agents have a short memory. They hold a conversation only as long as it fits inside the model’s prompt window, and once it spills over, earlier turns are dropped or compressed into a lossy summary. For an agent that works with a user across days or weeks — a coding assistant on a project, a support bot tracking past tickets, a research agent on a long investigation — that limitation breaks the experience. Zep exists to close the gap.

Zep is a managed memory layer that sits between an agent and its language model. Instead of replaying the entire conversation history on every call, the agent writes new information to Zep and queries it for the facts that matter right now. The retrieved block is small, current, and shaped to fit inside the prompt without crowding out the actual instruction.

The engine underneath is Graphiti, an open-source temporal knowledge graph. In a regular vector database, a document is a single embedded blob. In Graphiti, the same document is parsed into entities (people, products, projects) and relationships between them. Each relationship is an edge that carries two timestamps — when the fact became true, and when it was superseded. If a user changes their preferred email address, the old edge is closed, the new one is opened, and a query for “current email” returns the right answer without losing the history.
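The validity-interval mechanic can be illustrated with a minimal sketch. This is plain Python, not the Graphiti API; the class and field names are invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Edge:
    """A fact modeled as a relationship with a validity interval."""
    subject: str
    relation: str
    obj: str
    valid_at: int                      # when the fact became true
    invalid_at: Optional[int] = None   # when it was superseded (None = current)

class TemporalGraph:
    def __init__(self):
        self.edges: list[Edge] = []

    def assert_fact(self, subject, relation, obj, at):
        # Close any still-open edge for the same subject/relation,
        # then open a new one. Nothing is overwritten or deleted.
        for e in self.edges:
            if e.subject == subject and e.relation == relation and e.invalid_at is None:
                e.invalid_at = at
        self.edges.append(Edge(subject, relation, obj, at))

    def current(self, subject, relation):
        # The "current" answer is the edge whose interval is still open.
        for e in self.edges:
            if e.subject == subject and e.relation == relation and e.invalid_at is None:
                return e.obj
        return None

    def history(self, subject, relation):
        return [(e.obj, e.valid_at, e.invalid_at)
                for e in self.edges
                if e.subject == subject and e.relation == relation]

g = TemporalGraph()
g.assert_fact("alice", "preferred_email", "a@old.example", at=1)
g.assert_fact("alice", "preferred_email", "a@new.example", at=5)
print(g.current("alice", "preferred_email"))   # the new address
print(g.history("alice", "preferred_email"))   # old edge retained, closed at t=5
```

A query for the current value returns the open edge, while the superseded edge survives with its closed interval, which is exactly the "what was true when" property the graph is built around.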

According to the Zep paper, the architecture combines this graph with hybrid retrieval — semantic search, BM25 keyword matching, and graph traversal — so a query can use whichever signal fits best. The same paper reports a Deep Memory Retrieval score of 94.8%, slightly above the MemGPT baseline of 93.4%, while the Zep website cites a P95 retrieval latency of around 200 milliseconds for hosted accounts.
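The paper does not detail how the three signals are merged, but a common way to combine ranked lists from heterogeneous retrievers is reciprocal rank fusion. The sketch below is a generic illustration of that idea, not Zep's actual fusion step:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists into one.

    Each ranking is a list of item ids, best first. The constant k
    dampens the influence of any single retriever's top hits.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the three retrieval signals:
semantic = ["fact_3", "fact_1", "fact_7"]   # vector-similarity order
bm25     = ["fact_1", "fact_3", "fact_9"]   # keyword-match order
graph    = ["fact_1", "fact_7", "fact_3"]   # graph-traversal order

print(reciprocal_rank_fusion([semantic, bm25, graph]))
# → ['fact_1', 'fact_3', 'fact_7', 'fact_9']
```

An item that ranks well across several signals (here `fact_1`) beats one that tops only a single list, which is the behavior a hybrid retriever wants.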

For developers, the surface is small — a handful of API calls to write turns and fetch facts. Graphiti is also available as a self-hosted library for teams that need full control over their data.

How It’s Used in Practice

Most developers reach for Zep when they need long-term memory for a chatbot or agent built on OpenAI, Anthropic, or an open-weight model. The integration is small: at the end of each turn, the agent writes the user message and assistant reply to Zep. Before the next turn, it queries Zep for relevant facts and prepends them to the system prompt.
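The write-then-query loop can be sketched as follows. The `MemoryClient` here is an in-memory stand-in, not the real Zep SDK; the hosted service exposes comparable write and search calls, but the method names and the toy keyword-overlap scoring below are illustrative only:

```python
class MemoryClient:
    """In-memory stand-in for a hosted memory service."""
    def __init__(self):
        self.store = []

    def add_turn(self, session_id, role, content):
        self.store.append((session_id, role, content))

    def search(self, session_id, query, limit=3):
        # Toy relevance: keyword overlap with the query. A real service
        # would use semantic, keyword, and graph retrieval here.
        words = set(query.lower().split())
        scored = [(len(words & set(c.lower().split())), c)
                  for sid, _, c in self.store if sid == session_id]
        return [c for score, c in sorted(scored, reverse=True)[:limit] if score > 0]

def build_prompt(memory, session_id, user_message, instruction):
    """Fetch relevant facts and prepend them to the system prompt."""
    facts = memory.search(session_id, user_message)
    context = "\n".join(f"- {f}" for f in facts)
    return f"{instruction}\n\nRelevant facts:\n{context}\n\nUser: {user_message}"

memory = MemoryClient()
# End of a turn: write both sides of the exchange.
memory.add_turn("s1", "user", "I prefer concise answers in Python")
memory.add_turn("s1", "assistant", "Noted: concise Python answers")

# Start of the next turn: retrieve facts and shape the prompt.
prompt = build_prompt(memory, "s1", "show me a Python example",
                      "You are a helpful coding assistant.")
print(prompt)
```

The point of the shape is that the model call receives a small, relevant context block rather than the full transcript.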

The result is an agent that knows what was said sessions ago without the developer managing transcripts or vector indexing. A support agent can recall that a customer already tried two fixes on a previous ticket. A coding assistant can remember which tech stack a team uses. A learning app can track which concepts a student has mastered.

Teams typically start on the free tier and move to paid tiers as traffic grows. According to the Zep website, the hosted service includes SOC2 Type II compliance, which removes a hurdle that used to block memory features in regulated industries.

Pro Tip: Don’t dump every assistant turn into Zep. Most replies are filler. Write the user’s stated preferences, decisions, and corrections — the things you’d want a teammate to remember after vacation. Retrieval quality is bounded by what you choose to record.
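One way to act on this tip is a write filter that gates what reaches memory. The keyword list below is a crude illustrative heuristic, not a recommended production filter:

```python
# Phrases that tend to mark preferences, decisions, or corrections.
# This list is illustrative; tune it for your own domain.
SIGNALS = ("i prefer", "i want", "actually", "instead", "always", "never",
           "my email", "we use", "we decided")

def worth_remembering(role: str, text: str) -> bool:
    """Heuristic write filter: record user turns that state a preference,
    decision, or correction; skip assistant filler."""
    if role != "user":
        return False
    t = text.lower()
    return any(signal in t for signal in SIGNALS)

print(worth_remembering("user", "Actually, we use Postgres, not MySQL"))  # True
print(worth_remembering("assistant", "Sure, here's an example!"))         # False
print(worth_remembering("user", "thanks!"))                               # False
```

Even a rough gate like this keeps filler out of the graph, and retrieval quality is bounded by what you choose to record.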

When to Use / When Not

| Use | Avoid |
| --- | --- |
| Multi-session agent that needs to remember user preferences and history | Single-shot Q&A bot where each call is fully independent |
| Support agent that should recall prior tickets across weeks | Internal prototype that cannot send data to a managed service |
| Coding or research assistant tracking an evolving project | Use case dominated by document retrieval, not conversational facts |

Common Misconception

Myth: Zep is just a vector database with conversation logs in it. Reality: Zep is a temporal knowledge graph. Facts are stored as edges between entities with explicit validity intervals, so the system can answer “what was true when” — not just “what text is similar to my query.” Vector search is one of three retrieval signals, not the whole engine.

One Sentence to Remember

Zep gives agents a memory that updates over time instead of accumulating, which is the difference between an assistant that grows with you and one that gets buried in its own transcript.

FAQ

Q: Is Zep open-source? A: The Graphiti core is open-source on GitHub. The hosted Zep service — with managed retrieval, SOC2 compliance, and dashboards — is a commercial product with a free tier for development.

Q: How is Zep different from a vector database like Pinecone or Weaviate? A: Vector databases find similar text. Zep models facts as graph edges with validity intervals and combines vector search with keyword and graph queries, so it can track how facts change rather than just retrieve passages.

Q: Do I need to use a specific LLM with Zep? A: No. Zep is model-agnostic. It exposes an API that any agent built on OpenAI, Anthropic, Google, or a self-hosted model can call before and after each turn.

Expert Takes

Most memory systems treat the past as a flat blob — embed it, retrieve it, hope the right chunk surfaces. Zep takes a stricter view: a fact is a relationship between entities, and relationships have lifespans. By modeling validity intervals on graph edges, the system can reason about what was true when, not just what was said most often.

Memory is part of the spec, not an afterthought. If you tell an agent “remember the user prefers concise answers” and the next session ignores it, the failure isn’t intelligence — it’s that the context window forgot. A temporal graph turns that preference into a persistent, queryable edge instead of a fragile summary you have to keep re-injecting.

Memory is becoming the moat. Anyone can wire up a chat completion. The teams winning are the ones whose agents remember the customer’s last complaint, the spec written months ago, the policy change from last quarter. Zep is betting that managed memory beats DIY vector hacks the same way managed payments beat custom checkout stacks. If your agent forgets, your competitor’s doesn’t.

Persistent memory raises a question worth asking before you turn it on. What happens when an agent remembers something the user said in frustration, or a fact that was true once and is no longer fair to surface? Temporal graphs solve the technical problem — facts can be marked superseded. But who decides which memories an agent should forget on request, and which it should hold?