When AI Refactors Code Nobody Reviews: Accountability, Hidden Defects, and Developer Deskilling

The Hard Truth
A model rewrites a class. A teammate approves the diff with a glance. The build is green. Three months later something breaks at three in the morning — and nobody in the room can explain why the code looked the way it did. Whose fault is that?
The interesting thing about AI-assisted refactoring is not what it does to the codebase. It is what it does to the people standing around the codebase. The review chair is still there. The reviewer is increasingly not. And once the human disappears from that loop, a quiet renegotiation begins over who is responsible when the green build turns out to have been wrong all along.
The Review That Never Happened
For most of software engineering’s short history, refactoring was the moment a senior engineer leaned over a junior’s shoulder and asked the only question that ever really mattered: why this shape and not another? That conversation was where craft was transmitted and where defects died early. It was slow and it was uncomfortable and it was the entire point. What happens when both the refactor and its approval are produced, in seconds, by a system that has no stake in the outcome and no memory of last week’s incident?
The honest answer is that we do not know yet. But the early signals are not flattering. The share of “moved” lines — the canonical signature of a real refactor — fell from 24.1% to 9.5% between 2020 and 2024, while copy/pasted lines rose from 8.3% to 12.3% and duplicated code blocks grew roughly eightfold year-over-year, per GitClear’s 2025 report drawing on 211 million lines of code. The activity we used to call refactoring is collapsing into something else, and the something else looks a lot like accelerated cloning.
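To make "duplicated code blocks" less abstract, here is a minimal sketch of what a clone count can look like. It is not GitClear's methodology, which the report does not reduce to a snippet. The function name `duplicated_blocks` and the five-line `window` default are assumptions chosen for illustration, and the matching is deliberately naive: it only catches near-verbatim copies.

```python
from collections import defaultdict


def duplicated_blocks(source: str, window: int = 5) -> dict[str, list[int]]:
    """Map each repeated run of `window` non-blank lines to its start lines.

    Hypothetical helper for illustration only. Lines are stripped of
    surrounding whitespace and blank lines are skipped, so renamed
    variables or reordered statements will NOT be detected as clones.
    """
    # Keep (original 1-indexed line number, normalized text) for non-blank lines.
    lines = [
        (number, text.strip())
        for number, text in enumerate(source.splitlines(), start=1)
        if text.strip()
    ]

    seen: dict[str, list[int]] = defaultdict(list)
    for start in range(len(lines) - window + 1):
        chunk = lines[start:start + window]
        key = "\n".join(text for _, text in chunk)
        seen[key].append(chunk[0][0])  # record where this run begins

    # A block counts as duplicated only if it appears more than once.
    return {block: starts for block, starts in seen.items() if len(starts) > 1}
```

A real refactor would move such a block into one shared place, and the count would fall. Pasting it again makes the count rise, which is the direction the numbers above describe.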
What the Productivity Numbers Are Actually Saying
In fairness to the assistants, the case for them is strong, and a serious essay should make it before tearing it apart. Modern coding agents reduce the cost of writing the first draft, free engineers from boilerplate, and democratize techniques that used to live only in the heads of the experienced. AI code completion keeps a developer in flow. AI test generation makes safety nets cheaper. AI-assisted debugging compresses the loop between symptom and hypothesis. These are real goods, defended by reasonable people for reasonable reasons.
The strongest version of the productivity argument goes further: when the assistant handles the mechanical work, the engineer is freed to do the judgment work — the architecture, the trade-offs, the ethical line items that machines cannot weigh. That is a beautiful story. It is the story the industry keeps telling itself. It deserves to be examined on its own terms before being doubted.
The Quiet Theft of the Refactor
Here is where the story starts to fray. The judgment work the productivity argument promises to liberate is precisely the work that judgment is built from. Reading other people’s code, restructuring it, defending the restructure in a review: that is the apprenticeship. Remove it and you keep the title of senior engineer but lose the curriculum that produced one. Developers who relied on AI for code generation scored 17% lower on comprehension tests than peers who used the tools conceptually, per Anthropic’s study (via InfoQ). Employment for developers aged 22 to 25 has fallen by roughly 20% since late 2022, a number entangled with broader tech contraction, certainly, but not unrelated to a market that no longer needs the cheap labor that used to do the reading.
Then there is the small matter of whether the productivity is even real. AI tools made 16 experienced open-source developers 19% slower across 246 tasks even as they felt 20% faster — a finding bounded to senior contributors in mature codebases (METR’s 2025 study). The gap between perceived and actual performance is the most dangerous thing in this entire conversation, because perception is what governs whether anyone bothers to look at the diff.
When the Apprentice Was the Whole Curriculum
There is a useful parallel from another craft. The medieval guilds did not exist to certify skill. They existed to transmit it — through a long, slow, supervised relationship in which the apprentice did the unglamorous work and the master watched. When industrial machinery arrived, the apprentice’s job did not vanish so much as it became invisible: the machine did the part the apprentice used to do, and the master no longer had a reason to watch. A generation later, the masters retired, and there was nobody who had served the apprenticeship.
The analogy is imperfect, as analogies are. But the structure is the same. AI code review that runs without a human reviewer is not a substitute for the master watching. It is the machine doing the apprentice’s work while pretending the watching still happens. In the Stanford study by Perry and colleagues, which covered 47 participants and five security tasks, those who used an AI assistant wrote less secure code and were more confident that it was secure. The confidence is the tell. It is what happens when the review chair is empty and the diff is green.
Accountability Cannot Be Outsourced
Thesis: Unreviewed AI refactoring shifts accountability from a named human reviewer to nobody in particular — and the legal, ethical, and engineering systems we have built all depend on that name existing.
When a vulnerability lands in code that no engineer read closely, who answers for it? The engineer who accepted the suggestion, in a setting where Copilot suggestions are accepted about 33% of the time and erroneous automated advice is followed at a 26% higher rate when reviewers are inexperienced (the Predicting Acceptance paper)? The team lead who approved the merge? The vendor whose model produced the diff? Modern product liability and professional duty doctrines assume a chain of human judgment behind the artifact. AI-assisted refactoring does not break that chain so much as it lengthens it until every link is light enough to be plausibly denied. The EU AI Act activates its high-risk framework on August 2, 2026, with penalties up to €15 million or 3% of global turnover, though most AI-assisted refactoring will not be classified high-risk under Annex III, per the European Commission. The accountability argument cannot lean on regulation. It has to rest on professional duty: on what an engineer owes to the people who will use what they release.
What We Owe the Code We No Longer Read
The reflective move here is not to ban the tools. That is neither possible nor desirable. It is to ask what practices preserve the human in a loop where the machine is faster, cheaper, and apparently confident. NIST’s AI Risk Management Framework already supplies a vocabulary for this (Govern, Map, Measure, Manage) without prescribing the answer. The interesting question is local. What does your team do, this quarter, to make sure that someone in the room can still defend the shape of the code? Is there a review that actually reviews? Is there a junior who is allowed to read and ask why? Is there a record of which decisions were made by whom, when the diff lands at three in the morning and the post-mortem starts? Or has the human review been quietly retired, with nobody quite remembering when?
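For a team that wants "a record of which decisions were made by whom" to be more than a sentiment, one possible shape is sketched here. Everything in it is hypothetical: `ChangeRecord`, `accountable_reviewer`, and `may_merge` are illustrative names that are not tied to any real review platform, and the policy it encodes (an AI-assisted change needs one named human, other than the author, who wrote down a reason) is an assumption, not a standard.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical record of one merged change and who stood behind it."""
    change_id: str
    author: str
    ai_assisted: bool
    # Reviewer name -> the rationale they wrote when approving.
    reviews: dict[str, str] = field(default_factory=dict)


def accountable_reviewer(record: ChangeRecord) -> Optional[str]:
    """Return the first reviewer who is not the author and left a rationale."""
    for reviewer, rationale in record.reviews.items():
        if reviewer != record.author and rationale.strip():
            return reviewer
    return None


def may_merge(record: ChangeRecord) -> bool:
    """Assumed policy: AI-assisted changes require a named, reasoned approval."""
    if not record.ai_assisted:
        return True
    return accountable_reviewer(record) is not None
```

The design choice worth arguing about is the rationale field. An approval with no stated reason passes a glance but fails this check, so the record captures the answer to "why this shape and not another?" and not only a name.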
Where This Argument Is Weakest
The hard counter to all of this is that the productivity gains may eventually outweigh the comprehension losses, and the next generation of engineers will simply learn differently — at a higher level of abstraction, the way that today’s web developers do not write assembly. If that turns out to be true, the deskilling story is a transitional anxiety, not a permanent harm. The METR finding is also bounded: it studied senior contributors in mature open-source codebases, not greenfield work where the productivity case is strongest. The argument here would be wrong if AI-assisted refactoring matures into a tool that surfaces its own uncertainty, invites human review on the diffs that matter, and trains rather than replaces the people who use it. That is a possible future. It is not the one currently being built.
The Question That Remains
When the review chair is empty and the build is green and the diff is forgotten by Friday, the code still exists. It runs in hospitals and banks and the small civic systems that quietly hold a society together. Who, in that arrangement, is the one accountable for what it does — and would that person, if named, recognize themselves in the description?
Disclaimer
This article is for educational purposes only and does not constitute professional advice. Consult qualified professionals for decisions in your specific situation.
AI-assisted content, human-reviewed.