MONA explainer · 10 min read · May 25, 2026

What Is AI Code Migration and How LLM Agents Translate Languages and Modernize Legacy Codebases

Diagram of an AI code migration pipeline translating legacy COBOL into Java through deterministic and LLM-agent stages


ELI5

AI code migration uses LLM agents and rule-based tools to translate old code into modern languages or frameworks—turning COBOL into Java, or upgrading a decade-old app—while trying to keep its behavior identical.

A COBOL system somewhere is running payroll for millions of people right now. It compiles. It runs. And no one currently employed can safely change a line of it, because the engineers who understood it retired years ago. That is the problem AI code migration exists to solve—and the interesting part is not that a machine can rewrite the code, but how, because two very different kinds of machine both claim they can.

The tempting mental model is simple: paste the old code into a chatbot, ask for the new language, ship the result. The output even looks right—fluent, idiomatic, properly indented. That fluency is exactly the trap. A language model optimizes for plausible tokens, not preserved behavior. The code it produces is grammatically perfect and occasionally, silently, wrong.

Not translation. Reconstruction under constraint.

Two Machines, Two Definitions of “Correct”

Before any tool rewrites a line, it has to decide what “the same program” even means. There are two answers to that question, and they split the entire field in half.

What is AI code migration?

AI code migration is the use of LLM agents and automated transformation tools to translate code between languages and frameworks (COBOL, VB6, or PL/SQL into Java, C#, or Python) and to modernize legacy codebases, including framework version upgrades (Addepto). But that single label hides two fundamentally different mechanisms, and confusing them is a common mistake.

The first mechanism is deterministic. Codemod tools parse your source into an Abstract Syntax Tree (AST), a structured representation of the code’s grammar. They modify specific nodes according to fixed rules, then regenerate the source. Meta’s jscodeshift does exactly this, wrapping a library called recast so the regenerated code keeps its original formatting (jscodeshift GitHub). Given the same input, it produces the same output every time, and because every rule is written down, you can inspect exactly what it will change.
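To make the deterministic mechanism concrete, here is a minimal sketch that uses Python’s standard-library ast module as a stand-in for jscodeshift (which targets JavaScript). It parses source into a tree, renames calls to one function by a fixed rule, and regenerates code. The function names fetch_rows and query_rows are hypothetical. One deliberate difference from recast: ast.unparse (Python 3.9+) does not preserve comments or the original layout.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite every call to `old` into a call to `new`, and nothing else."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            node.func.id = self.new  # fixed rule: exact name match only
        return node


def codemod(source: str, old: str, new: str) -> str:
    """Parse to an AST, apply the rule, regenerate source (Python 3.9+)."""
    tree = RenameCall(old, new).visit(ast.parse(source))
    return ast.unparse(tree)  # comments and original layout are not preserved


if __name__ == "__main__":
    legacy = "total = fetch_rows(db)  # legacy call\nprint(fetch_rows(cache))\n"
    print(codemod(legacy, "fetch_rows", "query_rows"))
```

Running codemod twice on the same input yields identical output, which is the reproducibility property described above; the rule never touches a call whose name does not match exactly.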

The second mechanism is probabilistic. An LLM agent reads the old code, predicts the most likely modern equivalent token by token, and generates something that resembles a correct translation. It can handle ambiguity and undocumented intent that a rigid rule could never anticipate. But it offers no guarantee—only a high-probability guess wearing the syntax of certainty.

The whole discipline lives in the tension between those two. Deterministic tools are trustworthy but brittle; they only know the patterns someone wrote a rule for. Probabilistic agents are flexible but unverifiable on their own. The serious systems combine them.

Inside the Translation Loop

Watch a modern migration agent work and you will notice it does not behave like a one-shot translator. It behaves like an engineer debugging in a tight feedback cycle—and that loop is where the reliability comes from.

How does AI-assisted code migration and framework upgrading actually work?

Start with the deterministic end, because it sets the standard the agents are trying to reach. OpenRewrite, maintained by Moderne, does not operate on a plain syntax tree at all. It builds a Lossless Semantic Tree—a representation that is both type-attributed (it knows that customer is a Customer, resolved through the type system) and format-preserving (it remembers your exact whitespace and comments). Transformations are applied as composable “recipes” (OpenRewrite Docs). Because the tree carries type information, a recipe can reason about what the code means, not just what it says—and because the migration is rule-driven, the result is reproducible.
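The key idea in the paragraph above is a transformation that can see types. The sketch below imitates that in Python with the standard ast module. It is not OpenRewrite’s API (OpenRewrite recipes are written in Java), and its “type attribution” only reads variable annotations rather than resolving types through a compiler. The Customer and Order types and the get_id, customer_id, and order_id names are hypothetical. Recipes here are plain functions, so composing them means applying them in sequence to one tree.

```python
import ast
from typing import Callable, Dict, List

Recipe = Callable[[ast.Module], ast.Module]


def declared_types(tree: ast.Module) -> Dict[str, str]:
    """Toy type attribution: read `name: Type = ...` annotations only.

    A real semantic tree resolves types through the compiler; this shortcut
    is an assumption of the sketch.
    """
    types: Dict[str, str] = {}
    for node in ast.walk(tree):
        if (isinstance(node, ast.AnnAssign)
                and isinstance(node.target, ast.Name)
                and isinstance(node.annotation, ast.Name)):
            types[node.target.id] = node.annotation.id
    return types


def rename_method(owner: str, old: str, new: str) -> Recipe:
    """Recipe factory: rename obj.old(...) only when obj's declared type is owner."""
    def recipe(tree: ast.Module) -> ast.Module:
        types = declared_types(tree)
        for node in ast.walk(tree):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr == old
                    and isinstance(node.func.value, ast.Name)
                    and types.get(node.func.value.id) == owner):
                node.func.attr = new
        return tree
    return recipe


def run_recipes(source: str, recipes: List[Recipe]) -> str:
    """Compose recipes by applying them in order to one shared tree."""
    tree = ast.parse(source)
    for recipe in recipes:
        tree = recipe(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    src = ("c: Customer = load()\n"
           "o: Order = load()\n"
           "a = c.get_id()\n"
           "b = o.get_id()\n")
    print(run_recipes(src, [rename_method("Customer", "get_id", "customer_id")]))
```

Only c.get_id() changes. The call o.get_id() has identical text but a different declared type, so a purely syntactic find-and-replace would have rewritten both.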

That distinction between a plain syntax tree and a semantic tree is not pedantic. It is the difference between a translator who knows only grammar and one who also knows the dictionary.

The agentic end works differently. Amazon Q Code Transformation (now surfaced under the AWS Transform brand) upgrades Java 8 and 11 Maven projects to Java 17 by auto-generating a transformation plan, updating dependencies, and refactoring deprecated code (Amazon Q Developer Docs). For the .NET world, the same product family migrates Windows-bound .NET Framework applications to cross-platform .NET and produces a Linux compatibility readiness report. That capability is described as agentic and as improving with each execution, though its exact general-availability scope is best treated as a moving target.

The frontier is research like LegacyTranslate, a multi-agent method that splits the work across three specialized agents (an Initial Translation agent, an API Grounding agent, and a Refinement agent) and was applied to translating roughly 2.5 million lines of PL/SQL into Java (arXiv). The structure matters more than the numbers. One agent drafts, one checks the draft against real API contracts, one repairs the mismatches. It is the same plan-and-check loop a human engineer follows: write, test, fix.
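That division of labor can be sketched as three functions. This is not the LegacyTranslate implementation. The two steps that would call a model are stubs, the API contract is a hypothetical table of function names and argument counts, and the repair heuristic (fuzzy name matching with difflib) is a swapped-in stand-in for an LLM refinement call.

```python
import ast
import difflib
from typing import List

# Hypothetical target-API contract: function name -> number of positional arguments.
API_CONTRACT = {"sum_amounts": 1, "to_currency": 2}


def draft_agent(legacy: str) -> str:
    """Stand-in for the LLM drafting step. Ignores its input and returns a fixed,
    plausible translation that calls a function (sum_amount) the API lacks."""
    return "total = to_currency(sum_amount(rows), 'USD')"


def grounding_agent(code: str) -> List[str]:
    """Check every plain function call in the draft against the API contract."""
    problems = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            name = node.func.id
            if name not in API_CONTRACT:
                problems.append(f"unknown API: {name}")
            elif len(node.args) != API_CONTRACT[name]:
                problems.append(f"wrong arity: {name}")
    return problems


def refinement_agent(code: str, problems: List[str]) -> str:
    """Stand-in repair step: swap unknown names for the closest contract entry."""
    unknown = {p.split(": ", 1)[1] for p in problems if p.startswith("unknown API")}
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in unknown):
            match = difflib.get_close_matches(node.func.id, list(API_CONTRACT), n=1)
            if match:
                node.func.id = match[0]
    return ast.unparse(tree)


def translate(legacy: str, max_rounds: int = 3) -> str:
    """Draft once, then alternate grounding and refinement until the draft is clean."""
    code = draft_agent(legacy)
    for _ in range(max_rounds):
        problems = grounding_agent(code)
        if not problems:
            return code
        code = refinement_agent(code, problems)
    raise RuntimeError(f"could not ground translation: {grounding_agent(code)}")


if __name__ == "__main__":
    print(translate("SELECT SUM(amount) INTO total FROM rows;"))
```

The draft fails grounding once (sum_amount is not in the contract), the repair step maps it to sum_amounts, and the second grounding pass comes back clean.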

What makes the loop close at all is the environment. An agent that can only read code is guessing. An agent that can run the build, read the compiler errors, and execute the test suite is iterating—each failed test becomes a signal pointing toward the fix. Connecting the model to those tools is increasingly handled by the Model Context Protocol, an open standard for wiring LLM applications to external tools and data; its current specification is dated November 25, 2025, and Anthropic donated it to the Linux Foundation’s Agentic AI Foundation in December 2025 (Model Context Protocol spec).
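For a sense of what wiring a model to tools looks like on the wire, here is a minimal sketch of the JSON-RPC 2.0 messages MCP uses to invoke a tool (method tools/call) and to read its result (a content list plus an isError flag). It deliberately omits the transport, the initialization handshake, and tool discovery. The run_tests tool, its path argument, and the simulated server response are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def make_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Build a JSON-RPC 2.0 request in the shape MCP uses for tools/call."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


def feedback_from_result(raw: str) -> Optional[str]:
    """Return the tool's text output if it reported an error, else None.

    The error text is what an agent would place in its next prompt.
    """
    result = json.loads(raw)["result"]
    text = "\n".join(block["text"] for block in result.get("content", [])
                     if block.get("type") == "text")
    return text if result.get("isError") else None


if __name__ == "__main__":
    # "run_tests" and its arguments are hypothetical; a real server defines its own tools.
    print(make_tool_call(1, "run_tests", {"path": "src/test"}))
    simulated = json.dumps({"jsonrpc": "2.0", "id": 1, "result": {
        "content": [{"type": "text", "text": "FAILED testRounding: expected 0.53, got 0.52"}],
        "isError": True}})
    print(feedback_from_result(simulated))
```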

The Pipeline, Part by Part

A production migration is never a single model call. It is an assembly line, and each station does one job the next one depends on.

What are the parts of an AI code migration pipeline?

Most pipelines, deterministic or agentic, share five stages:

Stage	What it does	Deterministic version	Agentic version
Ingestion & parsing	Turn source text into a structured tree	AST or Lossless Semantic Tree	Same tree, plus natural-language context
Transformation	Rewrite the tree into the target form	Fixed recipes / codemods	LLM proposes the rewrite
Grounding	Anchor the rewrite to real constraints	Type system, recipe preconditions	API contracts, tests, retrieval over the codebase
Validation	Prove the result behaves	Compile + test suite	Compile + test suite, fed back to the agent
Review	Human approves the change	Pull request	Pull request
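The table reads naturally as an assembly line in code. In this minimal sketch each stage is a function that either passes the change along or stops the line, so review is reached only after validation passes. Every stage body is a hypothetical stand-in: the transformation is a trivial fixed rule (the deterministic column; in the agentic column this step would call a model), grounding only checks that the output compiles, and “review” merely marks the change as ready for a human.

```python
import ast
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Change:
    source: str                        # legacy code entering the line
    tree: Optional[ast.Module] = None  # ingestion output
    target: Optional[str] = None       # transformation output
    log: List[str] = field(default_factory=list)
    ready_for_review: bool = False     # stand-in for "open a pull request"


def ingest(c: Change) -> bool:
    try:
        c.tree = ast.parse(c.source)
        return True
    except SyntaxError:
        return False


def transform(c: Change) -> bool:
    # Hypothetical fixed rule: functions named old_* lose the prefix.
    for node in ast.walk(c.tree):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("old_"):
            node.name = node.name[len("old_"):]
    c.target = ast.unparse(c.tree)
    return True


def ground(c: Change) -> bool:
    # Weakest useful grounding: the rewrite must at least compile.
    try:
        compile(c.target, "<migrated>", "exec")
        return True
    except SyntaxError:
        return False


def make_validate(cases: List[Tuple[str, tuple, object]]) -> Callable[[Change], bool]:
    def validate(c: Change) -> bool:
        namespace: dict = {}
        try:
            exec(c.target, namespace)  # fine for a sketch; never exec untrusted code
            return all(namespace[fn](*args) == want for fn, args, want in cases)
        except Exception:
            return False
    return validate


def review(c: Change) -> bool:
    c.ready_for_review = True  # a human still has to approve
    return True


def run_pipeline(c: Change, stages: List[Callable[[Change], bool]]) -> Change:
    for stage in stages:
        ok = stage(c)
        c.log.append(f"{stage.__name__}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later stations never see a broken change
    return c


if __name__ == "__main__":
    legacy = "def old_total(xs):\n    return sum(xs)\n"
    stages = [ingest, transform, ground,
              make_validate([("total", ([1, 2, 3],), 6)]), review]
    done = run_pipeline(Change(legacy), stages)
    print(done.log, done.ready_for_review)
```

Changing the expected value in the validation case from 6 to 7 stops the line at validate, and ready_for_review stays False.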

The grounding stage is where probabilistic and deterministic approaches quietly converge. A deterministic recipe grounds itself in the type system; it will not apply a transformation whose preconditions are unmet. An agent grounds itself by retrieving the actual function signatures it must call and by running the tests. The verification a rule gets for free, the agent has to earn at runtime.
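Agent-side grounding can be as simple as checking a proposed call against the real signature before running anything. This sketch uses Python’s inspect.signature(...).bind, which raises TypeError when arguments do not fit. The post_payment function is a hypothetical target API, and the three proposals stand in for model output. A non-empty reason string is the kind of message that would go back into the agent’s context.

```python
import inspect
from typing import Any, Callable, Dict, List


def post_payment(account_id: str, amount_cents: int, *, currency: str = "USD") -> None:
    """Hypothetical target API the agent must call correctly."""


def check_call(func: Callable[..., Any], args: List[Any], kwargs: Dict[str, Any]) -> str:
    """Return '' if a proposed call fits func's real signature, else the reason.

    Signature.bind checks argument count and names, not argument types.
    """
    try:
        inspect.signature(func).bind(*args, **kwargs)
    except TypeError as err:
        return str(err)
    return ""


if __name__ == "__main__":
    proposals = [
        (["A-1", 1250], {"currency": "EUR"}),            # fits the signature
        (["A-1", 1250, "EUR"], {}),                      # currency is keyword-only
        ([], {"account": "A-1", "amount_cents": 1250}),  # invented parameter name
    ]
    for args, kwargs in proposals:
        print(check_call(post_payment, args, kwargs) or "ok")
```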

Notice what the agent is really doing here. It is not recalling a memorized translation; it is conditioning its next-token predictions on the type signatures, the error messages, and the test results placed in its context. Change what you put in front of it, and you change the distribution of what it generates next.
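What “placed in its context” means in practice can be sketched as a function that assembles the next prompt. The section titles, their order, and the choice to drop empty sections are assumptions, not any particular tool’s format; the COBOL line and Java signature in the demo are hypothetical. The point is that changing these inputs changes what the model conditions on.

```python
from typing import List


def build_context(legacy_code: str, signatures: List[str],
                  compiler_errors: List[str], failing_tests: List[str]) -> str:
    """Assemble the text a migration model conditions on for its next attempt."""
    sections = [
        ("Legacy code", [legacy_code]),
        ("Target API signatures", signatures),
        ("Compiler errors from the last attempt", compiler_errors),
        ("Failing tests", failing_tests),
    ]
    parts = [f"## {title}\n" + "\n".join(items)
             for title, items in sections if items]  # skip empty sections
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_context(
        "COMPUTE WS-TOTAL ROUNDED = WS-BALANCE * WS-RATE.",
        ["BigDecimal interest(BigDecimal balance, BigDecimal rate)"],
        [],
        ["interestRoundsHalfUp: expected 0.53 but was 0.52"],
    ))
```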

The validation stage is the one nobody can skip. Without a test suite, an agentic migration is a confident assertion with no proof. With one, every red test is a specific failure the agent can act on.
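The write, test, fix cycle fits in a few lines. In this sketch the legacy behavior is pinned down by test cases: integer division that truncates toward zero, as Java’s int division does, which is an assumed legacy rule for illustration. The stub_agent function is a hypothetical stand-in for a model that first drafts the idiomatic but wrong Python floor division, then switches to a truncating version once tests fail. A real agent would read the failure text; the stub only checks whether any exists.

```python
from typing import Callable, List, Tuple

Impl = Callable[[int, int], int]

# Hypothetical characterization of the legacy behavior: integer division that
# truncates toward zero, including negative operands.
CASES: List[Tuple[int, int, int]] = [(7, 2, 3), (-7, 2, -3), (6, 3, 2)]


def run_tests(impl: Impl, cases: List[Tuple[int, int, int]]) -> List[str]:
    """Return one failure message per red test; an empty list means green."""
    return [f"div({a}, {b}): expected {want}, got {impl(a, b)}"
            for a, b, want in cases if impl(a, b) != want]


def stub_agent(feedback: List[str]) -> Impl:
    """Stand-in for an LLM: a fluent first draft, then a fix once tests fail."""
    if not feedback:
        return lambda a, b: a // b  # Python floors, so -7 // 2 == -4
    return lambda a, b: -(-a // b) if (a < 0) != (b < 0) else a // b


def migrate(propose: Callable[[List[str]], Impl],
            cases: List[Tuple[int, int, int]],
            max_iterations: int = 5) -> Tuple[Impl, int]:
    """Propose, test, feed failures back; stop at green or after max_iterations."""
    feedback: List[str] = []
    for attempt in range(1, max_iterations + 1):
        impl = propose(feedback)
        feedback = run_tests(impl, cases)
        if not feedback:
            return impl, attempt
    raise RuntimeError("no passing candidate: " + "; ".join(feedback))


if __name__ == "__main__":
    impl, attempts = migrate(stub_agent, CASES)
    print(f"green after {attempts} attempts; div(-7, 2) = {impl(-7, 2)}")
```

The iteration cap matters: without a suite that can go green, the loop has no stopping condition other than giving up.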

The five stages most AI code migration pipelines share: ingestion and parsing, transformation, grounding, validation, and human review.

What the Split Predicts

Once you see migration as deterministic-versus-probabilistic with a verification loop between them, you can predict where each approach wins and where it fails.

If the transformation is mechanical and well-specified (Java 8-to-17 deprecation fixes, a known framework upgrade), expect deterministic recipes to beat LLMs on cost, speed, and trust. Tellingly, a headline agentic result comes from exactly this kind of task: an internal Amazon team reported upgrading 1,000 production applications from Java 8 to 17 in two days, about ten minutes per app, using the agentic upgrader (AWS Blog). A thousand ten-minute upgrades come to roughly 167 hours, so the two-day figure implies many upgrades ran concurrently. It is a vendor benchmark on a highly structured task, not a guarantee for arbitrary code.
If the source carries undocumented business logic with no clean rule, expect an LLM agent to handle the ambiguity that a recipe cannot—provided you can verify the output.
If your legacy code has strong test coverage, expect agentic migration to converge, because each failing test gives the agent something concrete to fix.

Rule of thumb: if a behavior is checkable by a test, an agent can iterate toward it; if it is not, you are trusting probability and calling it migration.
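The three predictions and the rule of thumb amount to a routing decision, sketched below as a small function. The fields, the module names, and the 0.8 coverage threshold are assumptions for illustration, and test coverage is only a rough proxy for “behavior checkable by a test.” The fallback branch follows the rule of thumb: if behavior is not checkable yet, make it checkable before handing it to an agent.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    has_recipe: bool       # a deterministic recipe already covers this change
    test_coverage: float   # rough proxy, 0.0 to 1.0, for behavior checkable by tests


def choose_strategy(c: Candidate, min_coverage: float = 0.8) -> str:
    """Route one module of a migration; the 0.8 threshold is an assumption."""
    if c.has_recipe:
        return "deterministic recipe"
    if c.test_coverage >= min_coverage:
        return "agent with test feedback loop"
    return "write characterization tests first"


if __name__ == "__main__":
    for module in [Candidate("java17-upgrade", True, 0.1),
                   Candidate("rates", False, 0.9),
                   Candidate("payroll", False, 0.3)]:
        print(module.name, "->", choose_strategy(module))
```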

When it breaks: the dangerous failure mode is the migration that compiles and passes a weak test suite while silently changing behavior at the edges, a hallucination expressed as valid code. Deterministic tools fail loudly when they meet a pattern with no recipe; agents fail quietly when they meet a case no test covers. The quiet failure is the expensive one.
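Here is that quiet failure in miniature. The legacy rule below is hypothetical (round interest half up to whole cents). The “migrated” version uses Python’s round, which on a Decimal follows the context’s rounding mode, ROUND_HALF_EVEN by default. A weak test at a round number passes, while sweeping every balance in one-cent steps exposes the half-cent cases where the two disagree.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List

CENT = Decimal("0.01")


def legacy_interest(balance: Decimal, rate: Decimal) -> Decimal:
    """Hypothetical legacy rule: round half up to whole cents."""
    return (balance * rate).quantize(CENT, rounding=ROUND_HALF_UP)


def migrated_interest(balance: Decimal, rate: Decimal) -> Decimal:
    """Fluent translation: round() uses the context rounding (half-even by default)."""
    return round(balance * rate, 2)


def weak_test() -> bool:
    # One happy-path case at a round number: both versions give 5.00.
    return (legacy_interest(Decimal("100.00"), Decimal("0.05"))
            == migrated_interest(Decimal("100.00"), Decimal("0.05")))


def diverging_balances(rate: Decimal, max_cents: int) -> List[Decimal]:
    """Compare both versions on every balance from 0.00 to max_cents / 100."""
    out = []
    for cents in range(max_cents + 1):
        balance = Decimal(cents) / 100
        if legacy_interest(balance, rate) != migrated_interest(balance, rate):
            out.append(balance)
    return out


if __name__ == "__main__":
    print("weak test passes:", weak_test())
    print("diverges at:", diverging_balances(Decimal("0.05"), 100))
```

For balances up to one dollar at a 5% rate, the sweep reports 0.1, 0.5, and 0.9: exactly the half-cent ties that a round-number test never hits.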

The Deterministic Comeback Nobody Predicted

There is a counterintuitive lesson buried in this. The most reliable parts of AI code migration are often the least AI-driven. OpenRewrite’s semantic tree and a well-written test suite do more for correctness than a larger model does, because they convert “probably right” into “checked right” for every behavior they cover. The frontier work, from LegacyTranslate to environment-in-the-loop research, is not trying to replace deterministic checks with smarter models. It is wrapping the probabilistic translator inside a deterministic cage, so that the model’s flexibility is held to the rule-based system’s checks. The future of migration is not the agent or the compiler. It is the loop between them.

The Data Says

AI code migration is two mechanisms sharing one name: deterministic tools that transform a typed tree by fixed rules, and probabilistic agents that predict modern code and verify it by running it. The reliable systems do not pick one—they ground the agent’s guesses in tests, types, and tooling until the guess becomes checkable. Migration quality tracks verification coverage far more than model size.

Aha Moments

MAX

Mona’s verification point is the whole architecture, and teams keep getting it backwards. They invest in a bigger model and underinvest in the test suite, which is exactly the wrong ratio. An agent without a verification harness is a generator with no acceptance criteria—it cannot tell you when it is done, because nothing defines done. The deterministic recipe has its preconditions written down; the agent has to be handed them. So before any migration, the real first task is specifying what “behaves the same” means in executable terms. Characterization tests on the legacy system come first. The migration agent is the second half of a contract whose first half is the spec. Skip the spec and you have automated guessing.
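MAX’s “characterization tests first” can be sketched as a snapshot test, sometimes called a golden master: record what the legacy code actually outputs, quirks included, then define “done” as the migrated code reproducing every recorded output. The fixed-width padding rule, the sample inputs, and the JSON storage are hypothetical choices for illustration.

```python
import json
from typing import Callable, Dict, List, Tuple


def legacy_pad(name: str) -> str:
    """Hypothetical legacy behavior: a 10-character field, truncated or space-padded."""
    return name[:10].ljust(10)


def migrated_pad(name: str) -> str:
    """Plausible translation: a width in a format spec pads but never truncates."""
    return f"{name:<10}"


def record_snapshot(fn: Callable[[str], str], inputs: List[str]) -> str:
    """Characterize the legacy code: store what it does, not what it should do."""
    return json.dumps({x: fn(x) for x in inputs}, sort_keys=True)


def mismatches(fn: Callable[[str], str], snapshot: str) -> Dict[str, Tuple[str, str]]:
    """'Done' means this returns an empty dict: every recorded output reproduced."""
    expected = json.loads(snapshot)
    return {x: (want, fn(x)) for x, want in expected.items() if fn(x) != want}


if __name__ == "__main__":
    snapshot = record_snapshot(legacy_pad, ["", "Ada", "Grace Hopper", "0123456789"])
    print(mismatches(legacy_pad, snapshot))    # the legacy code matches itself
    print(mismatches(migrated_pad, snapshot))  # flags "Grace Hopper"
```

The migrated version pads correctly but never truncates, so only the 12-character name is flagged. The snapshot protects only the inputs someone chose to record.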

DAN

What MAX calls a contract is also a market signal. The deterministic tooling—OpenRewrite, the codemod ecosystem—has been quietly compounding for years, and now the agentic layer is landing on top of it rather than replacing it. That is the tell. Vendors who own both the rules and the agent loop are positioning to capture the entire legacy-modernization budget, which is enormous and mostly untouched. The companies sitting on aging codebases have a choice approaching fast: modernize on this tooling now while it is maturing, or keep paying the maintenance tax on systems nobody can safely touch. The teams that build the verification muscle early will move first when the agents get good enough to trust at scale.

ALAN

Both of you are describing a quiet transfer of authorship. When an agent rewrites a sprawling codebase that no living engineer fully understood, the new system works—until it doesn’t—and the people who could have explained the original intent are gone. The test suite encodes what we remembered to check, not everything the old code actually did. We are not just translating languages; we are deciding which behaviors survive and which silently disappear, and we are letting a probability distribution make some of those calls. So when a migrated payroll system pays someone wrong years from now, who is accountable for the edge case nobody wrote a test for—the vendor, the team that approved the pull request, or the retired engineer who never documented why the code did what it did?

AI-assisted content, human-reviewed. Images AI-generated.