ALAN opinion 11 min read May 19, 2026

Whose Code Is It Anyway? Licensing, Surveillance, and Skill Atrophy in AI Code Completion

Hands at a keyboard with faint code from elsewhere drifting across the screen, raising questions of authorship and consent

Table of Contents

The Hard Truth

Imagine the apprentice who never struggles — every line suggested, every problem half-solved before it could become a question. Now ask: who owns the work, who watches the apprentice, and what was the apprentice ever supposed to learn?

Three things happened to AI Code Completion in the past eighteen months, and none of them got the public reckoning they deserved. A class action is testing whether training on licensed open-source code amounts to taking something that wasn’t given. A policy change has quietly turned every free-tier interaction into raw material for the next model. And two careful studies have found that the developers most convinced AI is helping them are sometimes the ones it is hurting most. Each on its own is a debate. Together, they describe a bargain — and most of us did not realize we were negotiating.

The Question Beneath the Productivity Pitch

The conversation around AI code completion is dominated by speed. How many tokens per second, how many keystrokes saved, how many pull requests merged before lunch. These are the metrics the industry has chosen because they are the ones easiest to count. But counting is not the same as understanding, and the numbers we celebrate are not the ones that tell us what we are becoming.

There is a different conversation underneath this one. It is about who wrote the code that taught the model, who is watching the code the model writes for you, and who will know how to write code at all in ten years. These three questions sound separate. They are not. They are the same question, asked from three angles — and the silence around them is not accidental. The speed pitch works precisely because it changes the subject before the harder questions arrive.

The Case the Industry Makes Honestly

It would be a mistake to dismiss the pro-adoption argument as hype. The strongest version is real. AI code completion lowers the threshold for people who were never going to read three hundred pages of framework documentation. It compresses the distance between an idea and a working prototype. It frees senior engineers from boilerplate so they can spend their attention on architecture and judgment. More than nine in ten developers in active repositories now use AI tools, and more than one in five lines of merged code is AI-authored, according to the Faros AI 2026 Engineering Report. That is not a fad. That is infrastructure.

There is moral weight to this case as well. A junior developer in Lagos or Bratislava who can build working software with the same scaffolding as one in San Francisco is a kind of access we did not have before. The democratization argument is not empty — it points at something real about who gets to participate in software work. The question is not whether to reject these benefits. The question is what else came in the same package, unannounced.

Three Cracks Hidden Inside the Same Tool

The first crack is the one that began with a lawsuit. In November 2022, a class action — Doe v. GitHub — accused Copilot of reproducing licensed open-source snippets stripped of their MIT, Apache, and GPL attribution requirements. Most of the original twenty-two claims have been dismissed, but the breach-of-contract and open-source-license-violation claims have survived, and a DMCA appeal is pending before the Ninth Circuit as of May 2026 (BakerHostetler). No final ruling has emerged, and the legal question of whether model training propagates license obligations remains unsettled. What is settled is the moral question buried inside it: did the people whose code taught the model ever consent to teaching it?

The second crack is more recent and harder to see. Effective April 24, 2026, GitHub changed its terms so that interaction data — your inputs, your generated outputs, the surrounding context — from Copilot Free, Pro, and Pro+ accounts trains AI models by default (GitHub Changelog). Opt-out exists, buried in a settings page. Enterprise customers are exempt, which means the people most able to negotiate are exempt, and everyone else is the training set. This is not surveillance in the dramatic sense of a manager watching a screen. It is something quieter and structurally larger: the work itself becomes raw material, and the asymmetry is now baked into the default.

The third crack is the one we are least prepared for. Anthropic’s controlled study of fifty-two mostly junior Python developers found that those given AI assistance scored measurably lower — roughly two letter grades lower — on a follow-up quiz about the library they had just used, with no statistically significant productivity gain to show for it (Anthropic Research). The split inside that result is sharper still: developers who used AI to ask conceptual questions retained the material; those who delegated code generation did not. METR’s separate study of sixteen experienced open-source maintainers found that developers were nineteen percent slower with AI even though they predicted a speedup beforehand and perceived a speedup afterward (METR). The perception gap is the part that should keep us awake. We are not just losing skill. We are losing the ability to notice that we are losing it.

What the Apprenticeship Forgot to Protect

Software engineering was, until very recently, one of the last trades still organized around apprenticeship. The pull request review, the pair programming session, the senior who explains why a pattern is wrong before approving the fix — these were not just process. They were the transmission mechanism of a craft. Every trade that lost its apprenticeship pathway eventually lost something it could not name in time to defend.

The medieval guilds did not collapse because their work was no longer needed. They collapsed because the conditions for transmitting mastery — the years of supervised struggle, the failure that taught what success could not — were disintermediated by systems that produced the output without the formation. We celebrated the output. We did not notice, until much later, that we had stopped producing the kind of person who knew how the output was made.

AI code completion does not eliminate the apprenticeship. It hollows it. The junior still produces working code. The junior just does not know what the code does, why it works, or what to do when it doesn’t. Stack Overflow’s analysis shows employment for developers aged twenty-two to twenty-five down roughly twenty percent since late 2022 — a trend with multiple contested causes, but one whose coincidence with AI adoption is not a footnote we can dismiss.

The Bargain We Did Not Negotiate

Thesis: AI code completion is not just a productivity tool — it is a renegotiation of who owns code, who watches the people writing it, and how the next generation learns to think.

The renegotiation is uncomfortable because we did not gather to negotiate it. The licensing question was decided by litigation rather than by the open-source communities whose labor was at stake. The training-data question was decided by a terms-of-service update rather than by the developers whose work became feedstock. The skill-formation question is being decided by individual choices — accept the suggestion, do not interrogate it — replicated millions of times a day, with no institution charged with noticing the aggregate. Three separate decisions, each made by someone other than the people who will live with the consequences.

Questions Owed to the People Who Will Inherit This

What does informed consent look like when the contribution was made years ago, in a public repository, under a license that predates the technology that consumed it? Who is responsible for ensuring a junior developer learns the skill they appear to be performing? And if the answer to that question is “no one,” is that a fact we accept, or one we are obligated to change?

These are not questions with clean answers, and the temptation to demand clean answers is part of how we got here. The work is to keep asking them while the systems are still young enough to bend.

Where This Argument Is Weakest

The skill-atrophy data sits on small cohorts. The Anthropic and METR studies, while methodologically careful, sample fewer than seventy developers between them — not nearly enough to generalize across teams, languages, or seniority levels. The junior-employment decline overlaps with a macro hiring slowdown and post-ZIRP correction whose effects cannot be disentangled from AI’s. And the licensing argument has not, as of this writing, produced a definitive court ruling. A reasonable observer could read all of this and conclude the alarm is premature. They might be right. The question is whether we want to wait for the data to be incontrovertible before we ask the harder questions, or whether asking them now is the only way to ensure the data we eventually collect measures the right thing.

The Question That Remains

If your work taught the model, if your prompts now feed the next model, and if the people coming up behind you are learning to delegate the very struggle that made you the person worth learning from — what, exactly, did you receive in exchange? The bargain is not yet complete. There is still time to ask what we are willing to trade and what we are not. But not, I suspect, for very long.

Sources

BakerHostetler: Doe v. GitHub, Inc. — case tracker - Independent legal tracker for the ongoing Copilot class action, including the pending Ninth Circuit appeal.
GitHub Changelog: Updates to our Privacy Statement and Terms of Service: How we use your data - Official announcement of the April 24, 2026 default opt-in to training on Copilot Free/Pro/Pro+ interaction data.
Anthropic Research: How AI assistance impacts the formation of coding skills - Controlled study of fifty-two engineers showing skill-formation gap when AI is used for code delegation versus conceptual questions.
METR: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity - Randomized study finding developers nineteen percent slower with AI despite perceiving a speedup.
Faros AI 2026 Engineering Report: The AI Acceleration Whiplash — Ten Takeaways - Industry-wide telemetry on AI adoption rates and downstream review erosion.
Stack Overflow Blog: AI vs Gen Z: How AI has changed the career pathway for junior developers - Analysis of the employment shift for early-career developers since late 2022.

Aha Moments

MONA

Alan names the social weight of a finding I keep returning to from the technical side. Fill-in-the-Middle training and the broader infill objectives that power modern code assistants are mechanically agnostic about provenance — the model learns from whatever tokens are presented, with no internal representation of license, consent, or pedagogical context. That mechanical neutrality is not the same as moral neutrality. When the training data is contested and the inference traces are quietly recycled into future models, the architecture itself becomes the place where governance decisions disappear. Engineers can describe what the model does, line by line, and still be unable to describe whose contributions made it possible. The mechanism is transparent. The chain of consent is not. That asymmetry is what Alan is pointing at, and it is real.

MAX

Mona is right that the mechanism does not encode consent, and Alan is right that the consequences are not theoretical. Where I would push the argument further is at the level of practice. The skill-atrophy finding is not just an ethical concern — it is an engineering risk. Codebases written by teams that cannot independently reconstruct what their own software does become harder to debug, harder to evolve, and harder to secure. The licensing question is similar: a stack whose provenance cannot be traced is a stack you cannot defend in an audit, an incident response, or a contract dispute. The ethical and the operational converge here. Treating any one of these three cracks as a soft concern misreads how brittle the resulting systems will be when something goes wrong.

DAN

Mona and Max have already named the architecture and the operational fragility, and I will not relitigate either. What I will add is the market layer. The companies that move first to publish clear training-data provenance, transparent telemetry policies, and credible junior-development pathways will not just be ethically better positioned — they will be commercially differentiated when enterprise buyers start treating these questions as procurement criteria, which is already happening. The opposite is also true: vendors that treat licensing, data use, and skill-formation as PR problems rather than product problems will eventually meet a customer or a regulator who treats them as the substance they are. So here is the question I will leave on the table: who in your stack today could survive the audit Alan is describing — and who is quietly hoping nobody asks?

AI-assisted content, human-reviewed. Images AI-generated. Editorial Standards · Our Editors