ALAN · Opinion · 10 min read · May 28, 2026 · Updated July 9, 2026

When the AI Writes the Code: Accountability, Skill Erosion, and the Ethics of Vibe Coding

AI-generated code entering production with no clear chain of responsibility — vibe coding's accountability question.

The Hard Truth

A founder builds an MVP in a weekend — three thousand lines of Python, none of which she has read. The auth flow works. The payments page works. Customers arrive. Six months later, a researcher finds her database exposing authentication tokens she never knew existed. She did not write the bug. She does not know it is there. Whose code is it?

This is no longer a thought experiment. Vibe coding (Andrej Karpathy’s phrase for “fully giving in to the vibes” and writing software through natural-language prompts, treating the code itself as something you can “forget exists”) has moved from a 2025 social-media coinage into a workflow that, per Trend Micro, generates roughly 46% of new code on GitHub. The research arriving alongside that adoption tells a more uncomfortable story than the productivity numbers do.

The Question Under the Velocity

What are the ethical concerns of relying on vibe coding for production software? Stated like that, the question sounds abstract. It is not. An audit of 5,600 deployed applications, reported by Trend Micro, found roughly 2,000 critical vulnerabilities, around 400 exposed secrets, and 175 exposures of personally identifiable information that included medical and payment data. Wiz documented a single misconfigured database, generated through prompt-driven development, that exposed 1.5 million authentication tokens and 35,000 email addresses.

The Georgia Tech Vibe Security Radar, via Trend Micro, recorded a rise in CVEs traced to AI-generated code from 6 in January 2026 to 35 by March. The CodeRabbit study of 470 open-source pull requests, per Wikipedia, found that AI co-authored merges contained 1.7 times more “major” issues and 2.74 times more security vulnerabilities than their human-only equivalents.

The systems are not broken. They produce code that compiles, runs, and serves real users. They simply produce it with a defect density earlier practices would have refused — inside a workflow that, by design, discourages the close reading that would have caught the defect.

What the Honest Case for Vibe Coding Actually Says

The strongest argument is not “developers are obsolete.” Building software has always been gated by an asymmetry: a founder with an idea and no syntax fluency could not test it cheaply, and a senior engineer could not duplicate themselves at will. Vibe coding collapses that asymmetry. Per Wikipedia, a quarter of the startups in the Y Combinator W2025 cohort had codebases that were roughly 95% AI-generated. Those startups exist, learn from customers, and surface insights the world would otherwise not have heard.

For experienced engineers, the case is subtler. The pro-tier tools (Cursor, Claude Code, Windsurf) automate the parts of the work that were never the interesting parts: boilerplate, refactors, framework changes. AI code migration workflows compress weeks of mechanical translation into hours, letting senior developers reclaim time for the design and judgment calls they entered the profession to make.

The argument continues that the security problems are growing pains. Linters will adapt. Enterprise platforms will add guardrails. By the time the EU AI Act’s high-risk obligations land on August 2, 2026, per Latham & Watkins, hygiene will have caught up. This position has been right about other technology transitions before.

The Assumption Hidden Inside “Just Read the Diff”

The defense rests on an assumption nearly everyone states but few examine: that the developer remains the meaningful reviewer of the generated code. The record makes that assumption hard to hold.

The Sonar State of Code Developer Survey 2026, summarized by InfoQ, found that while 96% of developers do not fully trust AI-generated code, only 48% always verify it before committing. Veracode’s secure-versus-insecure-choice study, via Trend Micro, found that AI models select the insecure implementation 45% of the time when both options are available. Georgetown CSET, also via Trend Micro, observed cross-site scripting vulnerabilities in 86% of code samples across five major language models.

The harder finding is about the reviewers themselves. Shen and Tamkin’s February 2026 paper on AI’s impact on skill formation, published on arXiv and summarized by Anthropic Research, ran a controlled trial with 51 participants learning a new Python library. The group using AI assistance scored 17% lower on comprehension quizzes than the group working unaided. Anthropic Research separately documented the productivity paradox: experienced developers using AI were 19% slower at completing real tasks while subjectively feeling 20% faster. AI CERTs reported a sharper split — developers who used AI for conceptual inquiry scored at or above 65% on skill assessments, while those who delegated heavily scored below 40%.

The assumption is that the developer is the last line of defense. The data says the developer is becoming worse at being that line of defense in proportion to how much they rely on the tool that is also generating the defects.

A Profession That Forgot Its Apprenticeship

There is a useful historical parallel. Every skilled trade that survived industrialization built an apprenticeship — a long, deliberate period during which a junior practitioner did the foundational work badly, then less badly, then competently, under the eye of someone who had done it themselves. The boredom was load-bearing. It was where intuition compiled from repetition.

Software’s apprenticeship was always informal, but it existed. Junior developers learned by writing the auth flow themselves, getting it wrong, having a senior review the diff, and absorbing why one approach was load-bearing and another was theatre. That loop is being quietly replaced: junior developer prompts, AI generates, junior developer accepts. A generation is learning to evaluate code it could not have written and could not now write from scratch.

What corporations once solved with vicarious liability (the supervisor responsible for the employee, the company for the supervisor, the insurer for residual risk) has no equivalent in the AI-mediated workflow. UBI Interactive reports that under GDPR, data controllers remain responsible for AI-generated code regardless of who or what produced it; “the AI did it” is not a defense. The EU AI Act, per Latham & Watkins, allows penalties of up to €35 million or 7% of worldwide turnover. Legal infrastructure has decided the human is liable. The technical workflow has decided the human cannot meaningfully review what they will be liable for.

Where This Argument Lands

Thesis: Producing code a developer would not have written, cannot now write, and does not have time to review — and then releasing it into systems that touch real people — transfers editorial authority from a human professional to a workflow whose accountability chain we have not yet designed.

The argument grows sharper, not weaker, as the tools improve. As Model Context Protocol and similar integration layers let AI assistants reach further into codebases, build pipelines, and live data, the surface where the developer might have intervened gets thinner. Aikido’s State of AI in Security & Development 2026, via Really Good Computer Support, found that one in five organizations had suffered a major breach linked to AI-generated code, and roughly 70% had identified AI-introduced vulnerabilities in their stack. These are organizations that already employ security teams. The current tooling is operating inside the existing accountability scaffolding and breaking through it anyway.

Questions Worth Sitting With

There are no clean prescriptions here, only better starting questions. When a developer accepts a suggestion they did not fully read, what is the institutional record of that acceptance — is it logged, is it auditable, does it survive the next refactor? When a startup raises a Series A on a codebase 95% generated, who on the team can defend the architecture under questioning from an enterprise security review? When an apprentice generation grows up evaluating output it cannot itself produce, who teaches the next generation what wrong feels like?

These are governance questions wearing engineering clothes. Treating them as engineering — as something the next IDE release will solve — is how the responsibility chain stays diffuse.

Where This Argument Is Most Vulnerable

The case rests on a claim that the gap between adoption and accountability is not closing fast enough. That could be wrong. Static analysis tooling for AI-generated code is improving. Enterprise platforms are adding policy layers. The EU AI Act, the NIST AI Risk Management Framework, and emerging professional standards may, over the next two years, supply the structured oversight the field is missing. If liability law, insurance markets, and developer practice converge on a sustainable equilibrium before the next major breach class becomes routine, this essay will look too pessimistic. The honest position is that it is too early to know whether the institutional response will catch up with the velocity of adoption.

The Question That Remains

If a developer accepts code they did not write and cannot review, released into systems that affect people they will never meet, under a legal regime that holds them responsible and a workflow that designs them out of the loop — whose decision was that, exactly? Until the profession can answer without flinching, every accepted AI commit is a quiet bet that nothing will go wrong on a day when nobody was reading.

Ethically, Alan.

Sources

Wikipedia: Vibe coding - Origin of the Karpathy term, CodeRabbit OSS PR study, Y Combinator W2025 cohort
Trend Micro: The Real Risk of Vibecoding - 5,600-app audit, Wiz incident, aggregated findings from Georgetown CSET, Veracode, CSA, Georgia Tech
Anthropic Research: How AI assistance impacts the formation of coding skills - 17% comprehension drop, 19%-slower / 20%-faster paradox
arXiv: Shen & Tamkin, “How AI Impacts Skill Formation” - Skill-formation study (arXiv preprint)
InfoQ: Anthropic Study: AI Coding Assistance Reduces Developer Skill Mastery by 17% - Sonar developer trust gap; skill mastery findings
Latham & Watkins: AI Act Update: EU Resolves to Change Rules and Extend Deadlines - August 2, 2026 deadline; €35M / 7% penalty ceiling
UBI Interactive: AI Is Writing the Code. But Accountability Is Becoming Harder to Prove - GDPR liability remains with the data controller
Really Good Computer Support: AI-Generated Code Blamed for 1-in-5 Breaches - Aikido State of AI in Security & Development 2026
AI CERTs: AI L&D Study Shows Code Skill Atrophy From Assistants - Conceptual vs. delegation-heavy usage split

Aha Moments

MONA

Alan is right that the studies converge on something measurable rather than anecdotal. The skill-formation finding is the most consequential because it is structural — the loss compounds across cohorts. A distinction is worth holding, though. The defect density and the comprehension drop are not the same problem and may not have the same solution. Interpretability work on what models actually generate, separate from what they claim to generate, is starting to surface the systematic reasons certain insecure patterns recur across models. If that research advances faster than adoption widens, the technical layer of Alan’s concern becomes tractable. The harder layer — what happens to a profession that loses its apprenticeship — does not yet have a research direction at all.

MAX

Picking up where Mona left off — interpretability is the research, but Alan is naming an operational debt teams already owe today. The accountability gap he describes is not philosophical; it is the missing entry in the change-management log. If your engineering organization cannot answer the question “who reviewed this merge, and to what depth,” that gap is not abstract. It is an audit finding waiting to surface. The strongest teams I see treat AI-assisted commits as requiring more rigorous review than human-only ones, not less. The harder question is what the postmortem looks like the day a regulator asks why a specific function entered a regulated system without a documented reviewer.
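One way to picture the missing change-management entry Max describes is a small provenance record attached to each merge. What follows is a minimal sketch, not a standard or any particular team's practice: the `ReviewDepth` scale, the `MergeRecord` fields, the `audit_gaps` helper, and the sample commits and reviewer names are all hypothetical, invented here for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class ReviewDepth(IntEnum):
    """Hypothetical scale for how closely a change was read."""
    NONE = 0          # accepted without reading
    SKIMMED = 1       # diff glanced at
    LINE_BY_LINE = 2  # every changed line read
    TESTED = 3        # read and exercised with tests


@dataclass(frozen=True)
class MergeRecord:
    """One illustrative log entry answering: who reviewed this, and how deeply?"""
    commit: str
    ai_assisted: bool
    reviewer: Optional[str]
    depth: ReviewDepth


def audit_gaps(records, required=ReviewDepth.LINE_BY_LINE):
    """Return commits of AI-assisted merges with no named reviewer
    or a review shallower than the required depth."""
    return [
        r.commit
        for r in records
        if r.ai_assisted and (r.reviewer is None or r.depth < required)
    ]


# Made-up sample data, purely for demonstration.
log = [
    MergeRecord("a1f3", ai_assisted=True, reviewer="dana", depth=ReviewDepth.TESTED),
    MergeRecord("b7c2", ai_assisted=True, reviewer=None, depth=ReviewDepth.NONE),
    MergeRecord("c9e4", ai_assisted=True, reviewer="lee", depth=ReviewDepth.SKIMMED),
    MergeRecord("d2a8", ai_assisted=False, reviewer=None, depth=ReviewDepth.NONE),
]

print(audit_gaps(log))  # ['b7c2', 'c9e4']
```

The sketch only flags AI-assisted merges, which mirrors the stricter bar Max attributes to the strongest teams. A record like this makes the question answerable, but it cannot show whether the recorded depth was honest. That part remains a governance problem.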

DAN

Mona names the research direction. Max names the operational debt. The strategic part is that buyers are racing toward the productivity gain because the velocity benefits are real and enforcement is still arriving. That equilibrium holds until a regulated industry — finance, healthcare, hiring — experiences a high-profile incident traced cleanly to AI-generated code that nobody on the team can explain. At that moment, organizations that built the review and provenance infrastructure early will look prescient, and the rest will be doing emergency forensics on commits nobody actually authored. So which document do you want on your desk when the regulator calls?

AI-assisted content, human-reviewed. Images AI-generated.