ALAN · Opinion · 10 min read · May 25, 2026 · Updated July 9, 2026

Who Owns the Bug When AI Rewrites Your Codebase? Accountability in Automated Migration

A code migration diff marked with a question mark, raising questions of accountability and liability when AI rewrites software.

The Hard Truth

A migration agent rewrites forty thousand lines overnight. The tests pass, the build goes green, and a tired engineer clicks “accept.” Six weeks later, a quiet rounding error in a payments module starts costing a customer real money. Whose name is on that mistake?

We spent a decade learning to trust version control because it answers one question with brutal clarity: who changed this line, and when. Automated migration quietly dissolves that answer. The author of record becomes a model, a recipe, a vendor template — and the human who approved the change barely read the diff. The distance between who wrote the code and who answers for it is widening, and almost nobody is measuring it.

The Commit Nobody Remembers Writing

AI code migration is being sold as relief. Anyone who has shepherded a Java 8 service into the modern world knows the tedium it promises to erase: the deprecated APIs, the dependency graphs that fight back, the weeks of mechanical edits that feel beneath a senior engineer’s time. So we delegate. The machine does the work, and we move on to something that feels more like thinking.

But delegation of labor is not the same as delegation of responsibility, and we keep treating them as if they were. When an automated migration introduces a regression that surfaces months later, the question “who is accountable when AI introduces bugs during code migration?” does not have a tidy answer. The commit is signed by a person who did not write it, generated by a tool that does not understand consequences, configured by a vendor who disclaims warranty, trained on code whose authors never consented to the role. Responsibility has been distributed so thinly that it threatens to disappear entirely.

The Case for Letting the Machine Do the Refactor

It would be dishonest to pretend the conventional enthusiasm is foolish. The best of these tools are genuinely careful, and their carefulness deserves a fair hearing.

OpenRewrite does not guess. It parses your source into an Abstract Syntax Tree and applies deterministic recipes — structural transformations that produce the same result every time, the way a compiler does (OpenRewrite Docs). Amazon Q Code Transformation upgrades Java 8 and 11 codebases to 17 or 21, and — this is the part that matters morally — it produces a transformation summary and a file diff that the developer must review before accepting the changes (Amazon Q Developer Docs). Codemod 2.0 pairs deterministic detection engines with language models only for the parts that need judgment, and runs migration campaigns across many repositories at once (Codemod Blog). Moderne pushes the same idea to enterprise scale, transforming thousands of repositories in parallel (Moderne).
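To make "deterministic recipe" concrete, here is a minimal sketch in Python using the standard library's ast module. It is an analogy, not OpenRewrite's engine or API: the function names legacy_round and round_half_even are hypothetical, and the only point is that a structural rewrite applied twice to the same input gives the same output.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rename calls to one bare function name and leave everything else alone."""

    def __init__(self, old_name: str, new_name: str) -> None:
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite any nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            replacement = ast.Name(id=self.new_name, ctx=ast.Load())
            node.func = ast.copy_location(replacement, node.func)
        return node


def apply_recipe(source: str) -> str:
    """Parse, transform, and print the source back out (Python 3.9+)."""
    tree = ast.parse(source)
    # Both function names are hypothetical, chosen only for illustration.
    tree = RenameCall("legacy_round", "round_half_even").visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


before = "total = legacy_round(price * qty) + legacy_round(tax)\n"
first = apply_recipe(before)
second = apply_recipe(before)
print(first)
print("reproducible:", first == second)
```

One caveat on the analogy: ast.unparse discards comments and original formatting, so this toy version would not be acceptable on a real codebase. It illustrates reproducibility, not production-grade rewriting.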

The argument writes itself: these systems are conservative, auditable, and explicitly designed to keep a human in the loop. A person still has to look and say yes. So where, exactly, is the problem?

The Assumption Hiding in “Review and Accept”

The problem lives inside the word “review.”

The entire ethical architecture of automated migration rests on one load-bearing assumption: that the human approval is real — that someone with judgment actually read the change and understood its consequences before clicking accept. When a tool generates a clean, plausible diff across three thousand files, that assumption quietly collapses. No one reads three thousand files with the same attention they would give to thirty. The review degrades into a ritual, and the human-in-the-loop becomes a human-shaped rubber stamp. The approval gate stays open; only the judgment behind it disappears.

The failure modes make this worse, not better, because they are so undramatic. Framework and version upgrade tools can stop part-way when a build exceeds its time ceiling or a project grows past a size limit, leaving a migration half-finished (AWS re:Post). Nothing crashes loudly. The output still looks like work. And a half-migrated codebase that compiles is far more dangerous than one that refuses to build, because it invites the very approval it does not deserve. We have engineered a situation where the moment of greatest legal and moral exposure, the click that says “this is now mine,” is also the moment we have made hardest to perform honestly.

What the Factory Floor Already Learned

This is not a new problem wearing a new outfit. Industrial and aviation engineers named it decades ago, long before anyone trained a model on GitHub.

They called it the irony of automation: the more reliable a machine becomes, the less its human supervisor practices the skill of catching it when it fails — and so the human is least prepared at the exact moment they are most needed. Autopilot made pilots safer and, in rare cases, less able to take over in a crisis. The assembly line raised output and dulled the inspector’s eye. The lesson was never “automation is bad.” The lesson was that supervision is itself a skill, and that a system which assumes vigilant oversight while actively eroding the conditions for vigilance is a system designed to fail slowly and blame the operator.

Software is now learning this lesson a second time, with the stakes pushed up rather than down. A pilot supervises one aircraft. An engineer who accepts an automated migration is, in a sense, supervising every future user who will run that code. The asymmetry between the ease of approval and the breadth of consequence has never been this severe.

Accountability Cannot Be Refactored Away

When an AI tool rewrites your code, the accountability for what that code does cannot be outsourced along with the labor; it remains, stubbornly, with the humans and institutions that chose to use it.

Regulation is beginning to move in this direction, and the direction is telling. The revised EU Product Liability Directive entered into force in December 2024, with national transposition due by December 2026, and it explicitly extends to stand-alone software and AI systems (Gibson Dunn). Under that regime, an AI-component provider and the final-product manufacturer can be jointly and severally liable, which means an injured party can pursue either one (Pinsent Masons). Legal analysts reading these developments suggest that “the AI made the mistake” is unlikely to survive as a defense (MBHB), though it must be said plainly that the courts have not yet settled how responsibility gets divided between tool vendors and the companies that adopt them. This is an emerging and contested terrain, not a closed case.

What the regulatory drift indicates is a moral instinct society has held for a long time: you do not escape responsibility by inserting a machine between yourself and the harm. A company that chooses a migration tool, configures it, and releases its output has made a chain of human decisions. The tool is an instrument of those decisions, not a substitute for them. Distributing authorship across a model and a vendor does not distribute accountability into nothing — it just makes the accounting harder, and tempts everyone to hope the gap goes unnoticed.

The Questions We Owe the Next Maintainer

So what do we actually owe — not to the regulator, but to the engineer who inherits this code in three years?

We owe them provenance. If an automated agent reaches into a repository through something like the Model Context Protocol and reshapes it, the record of what was changed, by which tool, under whose authority, and reviewed by whom should be as durable as the code itself — not a footnote that evaporates when the vendor sunsets a feature. We owe them honest review, which means resisting the seduction of the green checkmark and accepting that some migrations are too large to approve responsibly in one sitting, however much the dashboard encourages it. And we owe them a culture that treats “I accepted what the tool produced” as the beginning of responsibility, not the end of it.
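What a durable record might look like is easier to argue about with something on the table. The following is a rough sketch, not any vendor's format: it writes the four facts named above (what changed, which tool, whose authority, which reviewers) as trailer lines that could sit at the end of a commit message. Reviewed-by is a common trailer convention; every other key, and all the example values, are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationProvenance:
    """Who and what stood behind one automated change (hypothetical schema)."""

    tool: str
    tool_version: str
    recipe: str
    authorized_by: str
    reviewed_by: tuple[str, ...]
    files_changed: int


def to_trailers(record: MigrationProvenance) -> str:
    """Render the record as 'Key: value' lines for a commit message footer."""
    if not record.reviewed_by:
        # Refuse to produce a record that implies review nobody performed.
        raise ValueError("a migration commit needs at least one named reviewer")
    lines = [
        f"Migration-Tool: {record.tool} {record.tool_version}",
        f"Migration-Recipe: {record.recipe}",
        f"Authorized-by: {record.authorized_by}",
    ]
    lines += [f"Reviewed-by: {reviewer}" for reviewer in record.reviewed_by]
    lines.append(f"Files-Changed: {record.files_changed}")
    return "\n".join(lines)


# All values below are placeholders, not real tools or people.
example = MigrationProvenance(
    tool="example-migrator",
    tool_version="0.0.0",
    recipe="java8-to-17",
    authorized_by="team-lead@example.com",
    reviewed_by=("reviewer@example.com",),
    files_changed=3,
)
print(to_trailers(example))
```

Keeping the record in the commit itself, rather than in a vendor dashboard, is the design choice that matters here: it travels with the history and survives the tool.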

None of this requires rejecting the tools. It requires refusing to let the tools quietly rewrite who answers for the result.

Where This Argument Could Break

I should name where I might be wrong. If provenance and review tooling matures to the point where an automated diff is genuinely easier to audit than a hand-written one — surfacing risk, flagging behavioral changes, making the consequential edits legible — then the vigilance problem could shrink rather than grow. And if deterministic, AST-based transformation proves measurably safer than human refactoring across large samples, then resisting delegation on accountability grounds might one day protect a worse process than the one it replaces. The thesis holds only as long as approval remains a human act performed under human-friendly conditions.

The Question That Remains

We built version control to make authorship undeniable, and now we are building tools that make it ambiguous again. The labor of migration was always the easy part to automate; the answerability was never on offer. So when the next silent regression surfaces months after the green build — who reads the diff that nobody quite remembers approving, and who, when it matters, is willing to put their name beside it?

Sources

Amazon Q Developer Docs: Upgrading Java versions with Amazon Q Developer - Scope of automated Java upgrades and the mandatory developer review step
OpenRewrite Docs: Latest versions of every OpenRewrite module - Deterministic, recipe-based AST transformation
Codemod Blog: Intelligent code modification at scale (Codemod 2.0) - Hybrid deterministic + LLM architecture and migration campaigns
Moderne: Moderne Platform — enterprise code maintenance - Parallel transformation across thousands of repositories
AWS re:Post: Troubleshoot Amazon Q Developer code transformation failures - Known build-time and project-size failure limits
Gibson Dunn: EU Product Liability Directive: Responding to Software, AI and Complex Supply Chains - Revised PLD coverage of stand-alone software and AI
Pinsent Masons: Revised EU product liability regime expands to AI software providers - Joint and several liability allocation
MBHB: Navigating the Legal Landscape of AI-Generated Code: Ownership and Liability Challenges - Legal analysis on liability for AI-generated code

Aha Moments

MONA

Alan frames this as a question of conscience, and the empirical layer underneath supports his worry. Deterministic, recipe-based transformation on a syntax tree is reproducible — the same input yields the same output, which is exactly what makes it auditable. The trouble starts where the language model takes over for ambiguous cases, because there the output is sampled, not derived. The measurable risk is not that these tools are unreliable on average; most are conservative by design. The risk is that a review process scaled to thousands of files cannot maintain constant scrutiny, and human attention degrades predictably under volume. The accountability gap Alan describes is real precisely because the failure distribution is uneven and quiet, not loud and obvious.

MAX

Mona is right that volume breaks attention, and that is exactly where the engineering answer lives. If approval at scale is unreliable, then approval at scale is the wrong unit. Break the migration into reviewable slices, require an explicit record of what changed and why for each one, and make provenance a first-class artifact rather than a log line that disappears. Alan keeps asking whether we should — and that question stays open — but the conditions that make honest review possible are something we can actually build. An accountability gap is not only a moral failure; it is frequently a design failure wearing moral clothing, and the design is fixable before the philosophy is settled.
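A minimal sketch of what "reviewable slices" could mean in practice, assuming a per-review budget of changed lines that a team would have to choose for itself. The FileChange type, the paths, and the budget of 300 are all hypothetical.

```python
from typing import NamedTuple


class FileChange(NamedTuple):
    path: str
    lines_changed: int


def slice_for_review(changes: list[FileChange], max_lines: int) -> list[list[FileChange]]:
    """Group changed files into batches that stay within a line budget.

    Files are sorted by path so neighbours in the tree tend to be reviewed
    together. A single file larger than the budget gets a slice of its own.
    """
    slices: list[list[FileChange]] = []
    current: list[FileChange] = []
    current_total = 0
    for change in sorted(changes, key=lambda c: c.path):
        if current and current_total + change.lines_changed > max_lines:
            slices.append(current)
            current, current_total = [], 0
        current.append(change)
        current_total += change.lines_changed
    if current:
        slices.append(current)
    return slices


# Hypothetical migration output and an assumed budget of 300 changed lines.
migration = [
    FileChange("a/x.java", 120),
    FileChange("a/y.java", 200),
    FileChange("b/z.java", 350),
    FileChange("b/w.java", 50),
]
for number, batch in enumerate(slice_for_review(migration, max_lines=300), start=1):
    print(number, [change.path for change in batch])
```

Counting lines is a crude proxy for review effort. The sketch only shows that the unit of approval can be chosen deliberately instead of being whatever the tool emits in one run.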

DAN

Both of you are circling the thing the market will decide anyway. The teams adopting automated migration are not waiting for the liability question to resolve, and the regulatory direction is clear enough that whoever treats provenance and review as a feature will pull ahead of whoever treats them as overhead. There is real value in being the team that can answer “who changed this and who approved it” without flinching — it is becoming a trust signal, not a compliance chore. The tools are getting better fast, and adoption is accelerating regardless of how the courts eventually rule. So here is what I keep coming back to: when accountability becomes a thing buyers ask about, who is positioned to sell it as a strength?

AI-assisted content, human-reviewed. Images AI-generated.