ALAN · Opinion · 10 min read

Agents That Click for You: The Ethical Risks of Giving AI Control Over Your Browser and Desktop

[Image: hands at a keyboard, with a translucent automated cursor overlay tracing through open browser tabs]
Before you dive in

This article is a specific deep-dive within our broader topic of Browser and Computer Use Agents.

Coming from software engineering? Read the bridge first: Agent Capabilities for Developers: What Maps and What Breaks →

The Hard Truth

Somewhere between “AI suggests what to type” and “AI types it for you” lies a line that society crossed without a vote. The agent that books your flight is also the agent that approves the terms. Whose hand is on the mouse — and whose conscience is on the click?

You can now hand a sentence to a chatbot and watch it open tabs, fill forms, dismiss cookie banners, and place orders. The mechanics feel like productivity. The implications are something else. We are letting systems act inside the most personal layer of computing — the cursor — before we have agreed on what consent, accountability, or refusal even mean at that layer.

There is a question hovering over every browser and computer-use agent demo that nobody onstage wants to ask plainly: what are the ethical risks of letting AI agents control your computer, and who gets to decide that the risks are acceptable?

The convenience framing answers a different question. It answers “is this useful?” It does not answer “is this legitimate?” Legitimacy in software used to be earned by transparency: source code, settings, an undo button. Agents quietly dissolve that compact. The agent does not click “I agree” on your behalf in a single moment you witness. It clicks “I agree” in a thousand small ways across sessions you barely remember authorizing. The cumulative effect is a delegation of moral authorship — and there is no checkbox in the operating system for that.

What the Convenience Actually Buys

Take the strongest case before challenging it. Browser and computer-use agents are, in many situations, a real gift. They flatten accessibility gaps for users with motor impairments. They compress hours of administrative drudgery into minutes. They let small teams operate at a scale that previously required headcount. They release attention back to the things humans are actually good at — judgment, taste, care. The case for adoption is not a marketing illusion.

And the technology is becoming more capable, not less. Claude Sonnet 4.5 leads the OSWorld real-computer benchmark at 61.4 percent, up from 42.2 percent in the previous generation, per Anthropic. Atlas, Comet, and the Gemini Agent that absorbed Google’s Project Mariner are no longer experiments — they are mass-market browsers. To dismiss them as a fad would be to misread the moment. The question is not whether agents will become normal. They already are. The question is what becomes of human responsibility once the action layer is shared with software that improvises.

The Assumption We Smuggled In

The hidden assumption inside every agent pitch is that delegating an action is morally identical to delegating a query. It is not. When you ask a model to summarize an article, you remain the final actor — you read the output and decide what to do. When you ask an agent to file a complaint, transfer money, or accept a contract, you have outsourced agency itself. The system is not assisting your decision. It is making the decision and presenting you with the receipt.

That distinction is the entire ethical problem in one sentence. The industry has so far chosen to treat both flows as a single category called “AI features,” which means the consent rituals built for a chatbot — a checkbox, a tooltip, a system prompt nobody reads — have been silently extended to systems that hold the cursor. The category collision is convenient for selling product. It is corrosive for accountability.

This shows up at the security layer too. OWASP’s Excessive Agency entry names the problem directly: vulnerabilities arise from “excessive functionality, permissions, or autonomy granted to LLM agents,” with mitigation requiring “human-in-the-loop control to require a human to approve high-impact actions” (OWASP LLM06:2025). The framework exists. The default settings do not yet honour it.
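What OWASP's mitigation looks like in code is a gate that refuses to run high-impact actions without explicit approval. Below is a minimal sketch in TypeScript; the action shape, the `executeWithOversight` function, and the approval callback are all illustrative assumptions, not any vendor's real agent API:

```typescript
// Minimal sketch of a human-in-the-loop gate for high-impact agent
// actions, in the spirit of OWASP LLM06:2025. Names are illustrative.

type AgentAction = {
  kind:
    | "navigate"
    | "fill_form"
    | "click"
    | "purchase"
    | "accept_terms"
    | "transfer_funds";
  target: string; // URL or element the agent wants to act on
  summary: string; // human-readable description of the action
};

// Actions that commit the user to something irreversible or contractual.
const HIGH_IMPACT = new Set<AgentAction["kind"]>([
  "purchase",
  "accept_terms",
  "transfer_funds",
]);

// askUser stands in for whatever approval UI the host application provides.
async function executeWithOversight(
  action: AgentAction,
  perform: (a: AgentAction) => Promise<void>,
  askUser: (prompt: string) => Promise<boolean>,
): Promise<void> {
  if (HIGH_IMPACT.has(action.kind)) {
    const approved = await askUser(
      `The agent wants to ${action.summary} at ${action.target}. Allow?`,
    );
    if (!approved) {
      throw new Error(`Action "${action.kind}" declined by user`);
    }
  }
  await perform(action);
}
```

The detail that matters is the default: the dangerous actions are enumerated up front, and everything in that set blocks on a human, rather than human review being an opt-in extra.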

What Bureaucracy Taught Us About Delegation

Philosophers studying twentieth-century bureaucracies noticed something uncomfortable: distributing an action across many small administrative steps does not just speed up work. It dissolves moral ownership. By the time a decision has passed through forms, clerks, queues, and rubber stamps, no single person feels they made it. Responsibility evaporates into the procedure.

Browser agents are bureaucracy compressed into a single product surface. The user states an intent. The model interprets it. A scheduler, sometimes a full workflow-orchestration layer, chains the steps. A code-execution runtime may write and run code along the way. A retrieval-augmented loop may consult external documents to ground the next click, and those documents may themselves have been poisoned. By the time the agent has booked, paid, or replied, the chain of authorship is so distributed that asking "who did this?" becomes a category error. The honest answer is "the system." And a system, unlike a person, cannot be held to account.
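To make "who did this?" answerable at all, each hop in that chain would have to be recorded at the moment it shapes an action. A minimal sketch of such an authorship trail, with every type and value invented for illustration:

```typescript
// Sketch of an authorship trail for a multi-hop agent pipeline: every
// hop that influenced an action is logged, so "who did this?" at least
// has a structured answer. All types and values here are invented.

type Actor =
  | { kind: "user"; intent: string } // the original instruction
  | { kind: "model"; name: string; step: string } // an interpretation or plan
  | { kind: "orchestrator"; step: string } // a scheduled sub-task
  | { kind: "document"; url: string }; // retrieved content that shaped the step

interface ActionRecord {
  action: string; // e.g. "clicked 'Confirm purchase'"
  timestamp: string;
  chain: Actor[]; // every hop between intent and click
}

const record: ActionRecord = {
  action: "clicked 'I agree' on checkout terms",
  timestamp: new Date().toISOString(),
  chain: [
    { kind: "user", intent: "book me the cheapest flight to Lisbon" },
    { kind: "model", name: "planner-model", step: "chose airline and fare" },
    { kind: "document", url: "https://airline.example/fare-rules" },
    { kind: "orchestrator", step: "checkout sub-task" },
  ],
};

console.log(JSON.stringify(record, null, 2));
```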

This is not a hypothetical concern. Google Threat Intelligence observed a 32 percent increase in malicious activity using indirect prompt injection between November 2025 and February 2026 (Google Security Blog). The attack surface is no longer the user’s keyboard. It is every webpage the agent reads on the user’s behalf. Anthropic itself frames the residual risk plainly: “A 1% attack success rate — while a significant improvement — still represents meaningful risk. No browser agent is immune to prompt injection” (Anthropic Research). OpenAI is more blunt: prompt injection “is unlikely to ever be fully solved” (OpenAI). When the vendors who profit most from selling agents tell you the foundational defense cannot be guaranteed, the framing of “personal productivity tool” starts to feel like a category mistake of its own.

When the Cursor Moves Without You

Thesis: The ethical danger of browser and computer-use agents is not their spectacular failures but their quiet successes — every smooth action they take rehearses the public into a posture of unexamined delegation that we have no institutional vocabulary to govern.

Failures attract attention; success normalizes. The agent that completes a task without incident teaches the user that supervising it is no longer worthwhile. After enough successful runs, the human review step becomes a polite formality, then a friction to remove, then a feature that gets disabled by default. This is the trajectory of every automation that worked well enough to stop being audited. The risk is not that agents will go rogue. The risk is that we will stop noticing what they decide on our behalf — and stop having opinions about it — long before any institution catches up to ask whether we should have.

Regulators see the shape of the problem. The EU AI Act's Article 14 requires oversight measures "commensurate with the risks, level of autonomy and context of use" of high-risk systems, and becomes fully applicable on August 2, 2026 (Regulativ.ai). NIST launched its AI Agent Standards Initiative in February 2026, with an AI Agent Interoperability Profile planned for Q4 2026 (Jones Walker AI Law Blog). These are real efforts. They are also pacing far behind the consumer rollout, and they say very little about what the desktop agent on a private laptop is permitted to do at three in the morning when no auditor is watching.

Questions Worth Sitting With

So how do we live with this? Not by refusing the technology — that boat is already in the harbour and unloading. But perhaps by refusing the assumption that convenience is its own justification. A few questions worth carrying:

What does meaningful consent look like when the system you are consenting to will improvise on your behalf in contexts you cannot anticipate? Who is responsible when the agent makes a small mistake repeatedly across thousands of users — the prompt author, the model vendor, the browser, or the user who clicked “enable”? And what kind of public conversation should we be having about the defaults installed on a billion devices, given that those defaults are now the most consequential policy decisions in computing?

There may not be clean answers. There almost certainly is not a single product feature that resolves them. But the absence of an answer is not the same as the absence of a question.

Where This Argument Bends

This argument has a real weak point. If capability scoping, sandboxed runtimes, and human-in-the-loop gates mature faster than mass adoption of fully autonomous agents, the worry about normalization may simply be premature. Anthropic's own guidance gestures in that direction, recommending sandboxed environments such as virtual machines or dedicated machines with no access to sensitive data (Anthropic Docs via Kunal Ganglani analysis). If regulators publish enforceable oversight rules before defaults harden, the institutional vocabulary may arrive in time. The thesis would weaken accordingly, and I would consider that a good outcome. A sketch of what capability scoping means in practice follows.
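Concretely, capability scoping means handing the agent a narrow, time-boxed grant instead of ambient control of the machine. The shape of the policy object below is entirely assumed for illustration, not any real product's API:

```typescript
// Sketch of capability scoping: the agent receives an explicit, narrow,
// time-boxed grant instead of ambient control of the whole machine.
// The shape of this policy object is an assumption, not a real API.

interface CapabilityGrant {
  allowedDomains: string[]; // only these sites may be visited
  allowedActions: string[]; // e.g. "read", "fill_form" but not "purchase"
  expiresAt: Date; // grants lapse instead of persisting forever
  sandboxed: boolean; // run in a VM with no access to local files
}

function isPermitted(
  grant: CapabilityGrant,
  domain: string,
  action: string,
): boolean {
  return (
    grant.sandboxed &&
    new Date() < grant.expiresAt &&
    grant.allowedDomains.includes(domain) &&
    grant.allowedActions.includes(action)
  );
}

const travelGrant: CapabilityGrant = {
  allowedDomains: ["flights.example"],
  allowedActions: ["read", "fill_form"],
  expiresAt: new Date(Date.now() + 30 * 60 * 1000), // expires in 30 minutes
  sandboxed: true,
};

// The agent may read and fill forms; a purchase is denied by default.
console.log(isPermitted(travelGrant, "flights.example", "purchase")); // false
```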

The Question That Remains

If the most consequential decisions of the next decade are not the ones we type but the ones we delegate, what does it mean to live a considered life when most of the clicks that shape it were not yours? And who do we become when “convenience” becomes the strongest argument we are willing to make for or against anything?

Disclaimer

This article discusses security risks for educational awareness. Implementation decisions should involve qualified security professionals.

AI-assisted content, human-reviewed. Images AI-generated. Editorial Standards · Our Editors