MemGPT
Also known as: Memory-GPT, MemGPT pattern, Letta MemGPT
MemGPT is an agent architecture that gives a large language model two memory tiers, a fast in-context working memory and a slower external store, with the model itself paging information between them through tool calls, much like an operating system.
What It Is
Agent developers hit a wall when conversations or task histories outgrow the context window. Once important details fall off the back of the prompt, the agent forgets them — even if it referenced them five minutes ago. MemGPT, introduced in a 2023 paper by Charles Packer and colleagues, was the first design that asked: what if the language model managed its own memory like a small operating system?
The pattern splits memory into two tiers. The main context is what the model sees right now — system instructions, recent messages, and a working scratchpad. The external context lives outside the prompt: long-term notes, archived conversations, and reference documents stored in databases or files. The model never reads the external context directly. Instead, it emits tool calls — small function invocations — to page relevant information into the main context when needed and write less-relevant material out.
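The paging mechanics can be sketched in a few lines of Python. Everything below is illustrative: the class and method names (`TwoTierMemory`, `page_in`, `page_out`) are hypothetical stand-ins for the tool calls the model would emit, not the Letta API.

```python
# Hypothetical sketch of MemGPT's two memory tiers: a bounded main context
# (what the model sees in its prompt) plus an external store, with explicit
# "tool calls" for moving notes between them. Names are invented.
class TwoTierMemory:
    def __init__(self, max_main_items=4):
        self.main_context = []       # notes currently in the prompt
        self.external_store = {}     # long-term notes, keyed by topic
        self.max_main_items = max_main_items

    def page_out(self, topic, text):
        """Tool call: move a note out of the prompt into external storage."""
        self.external_store[topic] = text
        self.main_context = [m for m in self.main_context if m != text]

    def page_in(self, topic):
        """Tool call: load a stored note back into the prompt."""
        text = self.external_store.get(topic)
        if text is not None:
            self.main_context.append(text)
            # Keep the prompt bounded: evict the oldest note if over budget.
            while len(self.main_context) > self.max_main_items:
                self.main_context.pop(0)
        return text

mem = TwoTierMemory()
mem.page_out("user_name", "The user's name is Ada.")
assert "The user's name is Ada." not in mem.main_context  # out of the prompt
mem.page_in("user_name")
assert "The user's name is Ada." in mem.main_context      # paged back in
```

The key property is the bound on `main_context`: the prompt never grows without limit, so anything the agent wants to keep long-term must be written out explicitly.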
The LLM itself decides what to page in and what to evict, much as an operating system pages memory in and out of a process's working set. The agent runs an inner loop where it can keep working, search its own archives, edit its own notes, and respond to the user, all without a human telling it which memories matter. That self-editing behavior is what made the paper influential: it turned the prompt from a static input into something the agent actively curates. In the 2026 benchmark race on LoCoMo and LongMemEval, MemGPT is the architectural ancestor that most newer agent memory systems either extend or react against.
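That inner loop can be sketched as follows, with a stub standing in for a real LLM. All names here (`run_agent`, `fake_model`, `archival_search`) are hypothetical; the point is the dispatch logic: the model keeps emitting tool calls until it produces a user-facing reply.

```python
# Simplified MemGPT-style inner loop (illustrative, not a real framework):
# the model either emits a tool call, whose result re-enters the context,
# or a reply, which ends the loop.
def run_agent(model, tools, user_message, max_steps=10):
    transcript = [("user", user_message)]
    for _ in range(max_steps):
        action = model(transcript)               # model decides the next step
        if action["type"] == "tool_call":
            result = tools[action["name"]](**action["args"])
            transcript.append(("tool", result))  # tool output re-enters context
        else:                                    # "reply": loop ends
            return action["text"]
    return "(step budget exhausted)"

# Toy model: first search the archive, then answer with what it found.
def fake_model(transcript):
    if transcript[-1][0] == "user":
        return {"type": "tool_call", "name": "archival_search",
                "args": {"query": "favorite color"}}
    return {"type": "reply", "text": f"Found: {transcript[-1][1]}"}

tools = {"archival_search": lambda query: "user's favorite color is green"}
print(run_agent(fake_model, tools, "What's my favorite color?"))
# -> Found: user's favorite color is green
```

In a real system the model's judgment replaces `fake_model`, which is exactly why, as the paper notes, the quality of those paging decisions becomes part of the system's correctness.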
How It’s Used in Practice
Most readers encounter MemGPT through Letta, the open-source agent framework that now houses the project. According to the Letta Blog, MemGPT was folded into Letta in September 2024, and active development — SDK, documentation, benchmark submissions — happens under the Letta name. Practitioners building stateful chat agents or research assistants pull in Letta to get the OS-style memory loop without rebuilding it from scratch.
Inside Letta, an agent maintains three working pieces: a small core that always sits in the prompt (persona, user facts), a recall log of recent interactions, and an archival store the agent searches by similarity. When a question arrives that needs old context, the model issues a search call, pulls matching entries into the prompt, then answers. The same pattern shows up in benchmarks like LoCoMo and LongMemEval, where filesystem-style memory architectures repeatedly land in the top tier.
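The three working pieces described above can be mocked up in a short toy. This is illustrative only, not Letta's actual data model: word overlap stands in for embedding similarity, and every name is invented.

```python
# Toy version of the three working pieces: core memory always in the prompt,
# a recall log of recent turns, and an archive searched on demand.
class AgentMemory:
    def __init__(self, core):
        self.core = core        # persona + user facts, always in the prompt
        self.recall = []        # recent interactions
        self.archive = []       # long-term entries, searched by similarity

    def archive_search(self, query, top_k=2):
        """Rank archive entries by shared words (stand-in for embeddings)."""
        q = set(query.lower().split())
        scored = sorted(self.archive,
                        key=lambda e: len(q & set(e.lower().split())),
                        reverse=True)
        return scored[:top_k]

    def build_prompt(self, question):
        hits = self.archive_search(question)   # page matching entries in
        return "\n".join([self.core, *self.recall[-3:], *hits, question])

mem = AgentMemory(core="Persona: helpful assistant. User: Ada.")
mem.archive.append("Ada prefers metric units in all answers.")
mem.archive.append("Project deadline is March 3.")
prompt = mem.build_prompt("When is the project deadline?")
assert "March 3" in prompt   # the relevant archive entry was paged in
```

The shape matches the flow in the paragraph above: a question arrives, the agent issues a search call, matching entries land in the prompt, and only then does the model answer.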
Pro Tip: When evaluating whether MemGPT-style memory fits your use case, check whether your agent actually needs to remember across sessions. If a single long context window covers the whole conversation, you are paying for paging machinery you will never call. Reserve OS-style memory for agents that live for days or weeks.
When to Use / When Not
| Scenario | Use | Avoid |
|---|---|---|
| Long-running personal assistant that learns user preferences over weeks | ✅ | |
| Single-shot Q&A where the prompt fits inside the context window | | ❌ |
| Customer support agent that needs to recall earlier tickets | ✅ | |
| Stateless code completion in an IDE | | ❌ |
| Research agent reading hundreds of papers across a project | ✅ | |
| Quick prototype where adding a memory layer slows iteration | | ❌ |
Common Misconception
Myth: MemGPT gives the language model an unlimited context window. Reality: The model still works inside its native context window. MemGPT does not enlarge that window — it teaches the agent to swap relevant content in and out, so older information stays accessible without being permanently loaded into the prompt.
One Sentence to Remember
Treat MemGPT not as a bigger brain but as a smarter desk: the same workspace, with the agent constantly pulling the right notes from drawers it manages itself. If your agent’s life is longer than its prompt, this pattern is worth studying through Letta’s documentation before picking a memory backend.
FAQ
Q: Is MemGPT still maintained as a separate project? A: No. According to the Letta Blog, MemGPT became part of Letta in September 2024, and ongoing engineering, SDK releases, and documentation now live under the Letta name. The original architectural ideas and paper remain the reference.
Q: How is MemGPT different from a vector database? A: A vector database stores and retrieves embeddings on request. MemGPT is the agent loop around such a store: the model itself decides when to query, what to write back, and which memories to keep in the prompt right now.
Q: What benchmarks evaluate MemGPT-style memory? A: LoCoMo and LongMemEval test long-horizon recall across many conversation turns. According to the Letta Blog, the Letta filesystem implementation derived from MemGPT ranks near the top of LoCoMo in the 2026 race.
Sources
- MemGPT paper on arXiv: MemGPT: Towards LLMs as Operating Systems - Original paper introducing the two-tier memory pattern and the OS metaphor.
- Letta Blog: Benchmarking AI Agent Memory: Is a Filesystem All You Need? - Maintainer post covering Letta’s MemGPT-derived architecture and current LoCoMo results.
Expert Takes
Not a bigger context window. A control loop. MemGPT borrows the operating-system idea that processes do not see all of memory at once — they see a working set, and a kernel pages the rest. Replacing the kernel with the model itself makes the architecture interesting and fragile at the same time. The agent’s judgment about what to page becomes part of the system’s correctness, not an afterthought.
MemGPT is best read as a specification of the agent’s responsibilities. The model is not just answering — it is also told, in the system prompt, when it owes the user a memory write, when it should query its archive, and what counts as fresh. Once those rules are explicit, the same agent can run on different backbones and produce comparable behavior. Vague memory contracts produce vague agents.
The vendors selling agent memory platforms are racing to commoditize the MemGPT pattern. You either ship an agent loop that lives longer than the prompt, or you ship a chatbot that forgets you between sessions. Buyers stopped accepting the second answer about a year ago. Whoever wins the LoCoMo and LongMemEval leaderboards inherits the default position in roadmaps from companies that do not want to build memory plumbing themselves.
An agent that edits its own memory writes its own past. Who decides what it remembers about you between sessions? The agent? The vendor whose system prompt told it which interactions count as important? You, hopefully, but only if the interface lets you read and revise the archive. MemGPT made self-editing memory a real engineering pattern. It also made memory governance a real product question, and most teams have not answered it yet.