Model Registry
Also known as: ML model catalog, model store, model repository
A model registry is a centralized system that stores, versions, and tracks machine learning models throughout their lifecycle, from experiment to production deployment and eventual retirement. It records metadata about each version (training data, performance metrics, and deployment status) and serves as the governance layer between experimentation and production in MLOps workflows.
What It Is
When an ML team has trained a dozen model versions, the question “which one is running in production right now?” sounds simple until nobody can answer it. Model files end up scattered across shared drives, S3 buckets, and local laptops. Metrics live in spreadsheets. Approvals exist only in Slack threads. A model registry solves this by giving every trained model a stable address, a version number, and a lifecycle state—all in one place.
Think of it as Git for trained models, extended to cover not just the artifact but its context: what training data was used, what evaluation scores it achieved, who approved it for production, and when. Unlike a plain file store, the registry understands that a model passes through stages—experimental, staging, production, archived—and records each transition with an audit trail. That history is what makes rollback possible in minutes rather than days.
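To make the idea of stages and recorded transitions concrete, here is a minimal sketch in plain Python. It is not any particular product's API: the `ModelVersion` class, the `ALLOWED_TRANSITIONS` table, and the rule that an archived version may be re-staged are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical stage names and allowed moves, based on the stages named above.
ALLOWED_TRANSITIONS = {
    "experimental": {"staging", "archived"},
    "staging": {"production", "archived"},
    "production": {"archived"},
    "archived": {"staging"},  # assumption: an archived version can be re-staged
}


@dataclass
class ModelVersion:
    name: str
    version: int
    stage: str = "experimental"
    history: list = field(default_factory=list)  # audit trail of transitions

    def transition(self, new_stage: str, approved_by: str) -> None:
        """Move to new_stage if the move is allowed, and append an audit record."""
        if new_stage not in ALLOWED_TRANSITIONS[self.stage]:
            raise ValueError(f"cannot move from {self.stage} to {new_stage}")
        self.history.append({
            "from": self.stage,
            "to": new_stage,
            "approved_by": approved_by,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.stage = new_stage


mv = ModelVersion(name="churn-classifier", version=3)
mv.transition("staging", approved_by="alice")
mv.transition("production", approved_by="bob")
print(mv.stage, len(mv.history))
```

The point of the sketch is that a transition and its audit record are written together, so the history cannot drift from the current state.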
In an MLOps workflow (machine learning operations, the set of practices for managing ML systems in production), the registry sits at the center, connecting upstream training systems with downstream serving infrastructure. That position is what makes model versioning and artifact stores practical rather than theoretical: the registry is the organizational layer that records the metadata, tracks the lineage, and controls which version is active at any given moment. The article on versioning, metadata, and artifact stores explores these concepts in depth; the registry is where they converge in a working system.
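One way to picture "controls which version is active" is a named pointer that downstream systems resolve at load time. The sketch below is a toy, assuming a hypothetical `AliasRegistry` class and two made-up consumers; real registries expose similar lookups through their own APIs.

```python
class AliasRegistry:
    """Toy lookup table: (model name, alias) -> version number."""

    def __init__(self):
        self._aliases = {}

    def set_alias(self, model, alias, version):
        self._aliases[(model, alias)] = version

    def resolve(self, model, alias):
        return self._aliases[(model, alias)]


registry = AliasRegistry()
registry.set_alias("churn-classifier", "production", 7)


# Hypothetical consumers: each asks the registry instead of hard-coding a version.
def serving_version():
    return registry.resolve("churn-classifier", "production")


def monitoring_version():
    return registry.resolve("churn-classifier", "production")


before = (serving_version(), monitoring_version())
registry.set_alias("churn-classifier", "production", 8)  # a promotion or rollback
after = (serving_version(), monitoring_version())
print(before, after)
```

Because both consumers read the same pointer, moving it changes what every downstream system loads without touching their code.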
How It’s Used in Practice
The most common way a product manager or team lead encounters a model registry is when their ML team says “we just pushed v2.3 to staging” or “we’re rolling back to the previous version.” What they’re describing is a state change in the registry.
In practice, the workflow looks like this: a model finishes training, the training script logs the run to an experiment tracker (recording loss curves, hyperparameters, etc.), and the best-performing version gets registered. The registry assigns it a version number, stores the artifact path, and sets its stage to “staging.” After evaluation in a staging environment, a human reviewer or an automated gate promotes it to “production.” That promotion is recorded alongside who approved it and when. When something goes wrong, the rollback is a state change—point the registry back to the prior production version, and serving infrastructure follows.
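The register, promote, and rollback steps described above can be sketched as a small in-memory class. Everything here is illustrative: `InMemoryRegistry`, its method names, the example artifact paths, and the metric values are assumptions, and a real registry would persist this state and enforce permissions.

```python
from datetime import datetime, timezone


class InMemoryRegistry:
    """Toy registry: one model, numbered versions, one production pointer."""

    def __init__(self):
        self.versions = {}      # version number -> record
        self.production = None  # version currently served
        self.previous = None    # prior production version, kept for rollback
        self.audit_log = []     # who did what, and when

    def _log(self, action, version, actor):
        self.audit_log.append({
            "action": action, "version": version, "actor": actor,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def register(self, artifact_path, metrics, actor):
        """Assign the next version number and start the version in staging."""
        version = len(self.versions) + 1
        self.versions[version] = {
            "artifact_path": artifact_path, "metrics": metrics, "stage": "staging",
        }
        self._log("register", version, actor)
        return version

    def promote(self, version, actor):
        """Move a staging version to production, archiving the current one."""
        if self.versions[version]["stage"] != "staging":
            raise ValueError("only staging versions can be promoted")
        if self.production is not None:
            self.versions[self.production]["stage"] = "archived"
        self.previous, self.production = self.production, version
        self.versions[version]["stage"] = "production"
        self._log("promote", version, actor)

    def rollback(self, actor):
        """Point production back at the prior production version."""
        if self.previous is None:
            raise ValueError("no prior production version to roll back to")
        bad, restored = self.production, self.previous
        self.versions[bad]["stage"] = "archived"
        self.versions[restored]["stage"] = "production"
        self.production, self.previous = restored, None
        self._log("rollback", restored, actor)


registry = InMemoryRegistry()
v1 = registry.register("s3://example-bucket/models/v1", {"auc": 0.91}, actor="trainer-job")
registry.promote(v1, actor="alice")
v2 = registry.register("s3://example-bucket/models/v2", {"auc": 0.93}, actor="trainer-job")
registry.promote(v2, actor="bob")
registry.rollback(actor="carol")  # production points back at v1
print(registry.production, len(registry.audit_log))
```

Note that rollback moves no files; it only changes which version the production pointer names, which is why it can be fast.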
Pro Tip: Use the registry’s tagging system to attach the training dataset version alongside the model version. When a production incident surfaces six months later, knowing which data vintage the model saw is often the fastest path to root cause—especially when the issue turns out to be a data quality problem, not a model problem.
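As a rough illustration of the tip, the sketch below attaches a dataset version tag to each model version and then looks up every version trained on a suspect snapshot. The tag key `training_dataset_version`, the helper names, and the model and snapshot names are all hypothetical.

```python
# Hypothetical in-memory tag store: (model name, version) -> {tag key: value}
model_version_tags = {}


def set_tag(name, version, key, value):
    """Attach or overwrite one tag on a model version."""
    model_version_tags.setdefault((name, version), {})[key] = value


def versions_trained_on(dataset_version):
    """Return every (name, version) tagged with the given dataset version."""
    return sorted(
        model_key for model_key, tags in model_version_tags.items()
        if tags.get("training_dataset_version") == dataset_version
    )


set_tag("churn-classifier", 1, "training_dataset_version", "2024-01-snapshot")
set_tag("churn-classifier", 2, "training_dataset_version", "2024-06-snapshot")
set_tag("fraud-detector", 4, "training_dataset_version", "2024-06-snapshot")

affected = versions_trained_on("2024-06-snapshot")
print(affected)
```

With the tag in place, a data quality incident becomes a lookup rather than an investigation into who trained what.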
When to Use / When Not
| Scenario | Use | Avoid |
|---|---|---|
| Multiple model versions running or being evaluated simultaneously | ✅ | |
| Solo project with one model that never gets updated | | ❌ |
| Regulatory audit requiring model provenance and approval records | ✅ | |
| Quick proof-of-concept where the model lives only in a notebook | | ❌ |
| Team handing off a trained model to a separate serving or ops team | ✅ | |
| Model is embedded directly in training code and never reused elsewhere | | ❌ |
Common Misconception
Myth: A model registry is just file storage—you could replace it with an S3 bucket and a README.
Reality: A file store holds the artifact. A registry answers questions the file store cannot: which version is in production right now, who approved the transition, what changed between v1 and v2, and what training data produced each version. The registry provides structured metadata, lifecycle state management, access control, lineage tracking, and query interfaces. An S3 bucket with a README degrades into guesswork as soon as more than one person touches it.
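The "what changed between v1 and v2" question is a good example of something structured metadata answers directly. The sketch below assumes each version's metadata is a flat dictionary; the field names and values are invented for illustration.

```python
def diff_metadata(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = old.keys() | new.keys()
    return {
        k: (old.get(k), new.get(k))
        for k in sorted(keys)
        if old.get(k) != new.get(k)
    }


# Hypothetical metadata records for two versions of the same model.
v1 = {"training_data": "2024-01-snapshot", "auc": 0.91, "approved_by": "alice"}
v2 = {"training_data": "2024-06-snapshot", "auc": 0.93, "approved_by": "alice"}

changes = diff_metadata(v1, v2)
print(changes)
```

A README can hold the same facts, but only a structured record lets this comparison run automatically for every pair of versions.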
One Sentence to Remember
A model registry is the governance contract between the team that builds models and the systems that run them—without it, version accountability exists only in someone’s memory.
FAQ
Q: What’s the difference between a model registry and an experiment tracker?
A: An experiment tracker logs training runs as they happen: hyperparameters, metrics, and loss curves. A model registry stores the outputs you’ve decided to keep and tracks their lifecycle from staging through production and into retirement.
Q: Can a model registry handle large language models?
A: Yes. LLMs introduce size and serving complexity, but the core registry functions (versioning, metadata, promotion workflows) apply the same way. Some teams also track fine-tuned adapter weights separately from the base model in the same registry.
Q: When should a model version be archived rather than deleted?
A: Archive any version that has ever been in production. Regulatory requirements and audit trails often require the ability to reproduce a prediction from a specific date. Deletion removes that ability permanently; archiving preserves it at low cost.
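The archive-versus-delete rule can be expressed as a one-line policy check. This is only a sketch: the function name, the return labels, and the idea of passing a list of past stages are assumptions, and real retention rules may add further conditions.

```python
def retirement_action(stage_history):
    """Return 'archive' if the version was ever in production, else 'delete_allowed'."""
    return "archive" if "production" in stage_history else "delete_allowed"


# Hypothetical stage histories for two retired versions.
served_version = ["experimental", "staging", "production"]
abandoned_version = ["experimental", "staging"]

print(retirement_action(served_version), retirement_action(abandoned_version))
```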
Expert Takes
A model registry is the operational equivalent of a type system for machine learning artifacts. Without it, the question “what model is this?” has no principled answer at runtime—versions drift, metadata scatters, and the link between a production prediction and the training conditions that produced it breaks. Reproducibility in ML depends on treating model artifacts as first-class versioned objects, not just files on a storage layer.
In practice, the registry is where the spec lives for anything downstream that needs to load a model. When the serving layer, the A/B testing framework, and the monitoring system all query a single source of truth for “what is the current production model,” you eliminate the class of deployment bugs that come from version drift—where two services silently disagree about which model they’re using and produce different outputs for the same request.
Teams that skip the model registry always rediscover why it exists the hard way: a production model nobody can identify, a rollback that takes three days instead of three minutes, and an audit question that nobody can answer. The registry is the difference between an ML operation and an ML experiment that somehow made it to production and stayed there by luck.
The model registry shapes what accountability looks like in automated systems. When a model moves from staging to production, who approved it and on what evidence? When a model causes harm, can the decision chain be reconstructed? A registry makes these questions answerable—but only if teams treat the promotion step as a genuine gate, not a rubber stamp attached to an automated pipeline.