Agentic Coding
Also known as: AI coding agents, agentic software development, autonomous coding
Agentic coding is software development in which an AI agent runs a plan-write-test-iterate loop across files and tools, while the developer supervises decisions and reviews outcomes instead of typing each line by hand.
What It Is
For years, AI assistance in code editors meant autocomplete — the model suggested the next few lines, the developer accepted or rejected, and work continued one keystroke at a time. Agentic coding inverts that arrangement. The AI agent takes a task description, plans the changes, edits multiple files, runs the tests, reads the failures, fixes them, and only hands the result back when the loop produces something that compiles and passes. The developer reviews outcomes rather than authoring each line.
The architecture underneath is simple to describe. According to Braintrust, the canonical agent is a while loop that calls tools, driven by an LLM and a system prompt. The loop has four parts: a model for reasoning, memory for context, a planning step, and tool use for editing files, running commands, or calling APIs. Each turn, the agent decides which tool to call next based on what the previous tool returned. The loop ends when the agent declares the task done or runs out of budget.
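The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: `scripted_model` is a hypothetical stand-in for an LLM call, and the two tools, their names, and their return strings are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical tools: real agents would read the disk and run a test runner.
def read_file(path: str) -> str:
    return f"<contents of {path}>"


def run_tests(_: str) -> str:
    return "2 passed, 0 failed"


TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": read_file,
    "run_tests": run_tests,
}


@dataclass
class Action:
    tool: str           # a key in TOOLS, or "done" to end the loop
    argument: str = ""


def scripted_model(memory: list[str]) -> Action:
    """Stand-in for the LLM: chooses the next tool from what memory already holds."""
    if not any(entry.startswith("read_file") for entry in memory):
        return Action("read_file", "api/routes.py")
    if not any(entry.startswith("run_tests") for entry in memory):
        return Action("run_tests")
    return Action("done")


def agent_loop(task: str, model=scripted_model, max_turns: int = 10) -> list[str]:
    memory = [f"task: {task}"]              # memory: context carried between turns
    turns = 0
    while turns < max_turns:                # budget: stop even if never "done"
        action = model(memory)              # model: decide the next step
        if action.tool == "done":
            break
        result = TOOLS[action.tool](action.argument)   # tool use
        memory.append(f"{action.tool} -> {result}")    # feed the result back in
        turns += 1
    return memory


log = agent_loop("Add a rate limiter to the API routes")
```

Each entry in `log` records one turn, so the decision at turn N can depend on what the tool at turn N-1 returned. Swapping `scripted_model` for a real model call changes the decisions but not the shape of the loop.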
What makes the pattern work for software (and not, say, for agentic legal writing) is that the environment already has machine-readable signals for “done.” Compilers compile or do not. Tests pass or do not. The agent runs its actions, reads the structured feedback, and adjusts. Tools like Anthropic’s Claude Code, Cognition’s Devin, OpenAI’s Codex CLI, and the agent modes inside Cursor all run a version of the same loop, differing mostly in how much autonomy they give the agent before checking with the human. According to the Awesome Agents leaderboard, Claude Code currently leads SWE-bench Verified at 87.6%, while Devin 2.0 sits at 45.8% on the same benchmark.
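As a rough sketch of what a machine-readable "done" signal looks like, the snippet below runs a command and reduces it to a pass/fail flag plus the output an agent would read next. The helper name `run_check` and the two one-line commands are assumptions for illustration; a real project would invoke its own compiler or test runner.

```python
import subprocess
import sys


def run_check(command: list[str]) -> tuple[bool, str]:
    """Run a command; return (passed, combined stdout and stderr)."""
    proc = subprocess.run(command, capture_output=True, text=True)
    # Exit code 0 is the conventional "success" signal for compilers and test runners.
    return proc.returncode == 0, proc.stdout + proc.stderr


# Two stand-in "test suites": one that passes and one that fails.
passing = [sys.executable, "-c", "assert 1 + 1 == 2"]
failing = [sys.executable, "-c", "assert 1 + 1 == 3"]

passed_ok, _ = run_check(passing)
failed_ok, failure_output = run_check(failing)
```

The boolean tells the loop whether to stop; the captured output is the structured feedback the agent uses to decide its next edit.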
How It’s Used in Practice
Most developers meet agentic coding through their existing editor. Open Cursor or VS Code with the Claude Code extension, or run the Claude Code or Codex CLI from a terminal inside the project. Type a task: “Add a rate limiter to the API routes, including tests.” The agent reads the relevant files, writes a short plan, makes the edits, runs the tests, reads the failures, and iterates until the suite is green. The developer reviews the diff and either accepts it, rejects it, or asks for revisions.
The delegate-style flow is the second pattern. Tools like Devin take a brief, spin up their own sandbox, work for minutes or hours without supervision, and present a pull request when finished. The developer reviews the PR in GitHub instead of watching the agent type. Both flows share the same plan-write-test-iterate core; they differ in how much the developer wants to watch.
The connective tissue between an agent and its external tools is increasingly the Model Context Protocol. MCP gives agents a standard way to call databases, GitHub, monitoring services, and file systems without one-off integrations per agent.
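The MCP specification describes the protocol as JSON-RPC 2.0 messages. The sketch below builds what a tool invocation request might look like on the wire, assuming the `tools/call` method with `name` and `arguments` parameters. The tool name `query_database` and its arguments are hypothetical, and a real client would also handle initialization, transport, and responses.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched


def tool_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking a server to run one tool."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and arguments, for illustration only.
request = tool_call_request("query_database", {"sql": "SELECT 1"})
decoded = json.loads(request)
```

Because every tool call has the same envelope, an agent needs one client implementation rather than a separate integration per service.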
Pro Tip: Give the agent a definition of done before it starts — the exact files to touch, the tests that must pass, and the commands it has permission to run. Vague prompts produce vague plans; specific specs produce reliable loops.
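One way to make a definition of done explicit is to write it down as data the loop can check against. The sketch below is an assumption about how such a spec could look, not a format any particular tool requires; `TaskSpec`, its field names, and the example paths are all hypothetical.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    goal: str
    files_in_scope: tuple[str, ...]     # the exact files the agent may touch
    required_tests: tuple[str, ...]     # tests that must pass before handing back
    allowed_commands: tuple[str, ...]   # executables the agent may run

    def permits(self, command_line: str) -> bool:
        """True only if the command's executable is on the allow-list."""
        parts = shlex.split(command_line)
        return bool(parts) and parts[0] in self.allowed_commands

    def in_scope(self, path: str) -> bool:
        return path in self.files_in_scope


spec = TaskSpec(
    goal="Add a rate limiter to the API routes, including tests",
    files_in_scope=("api/routes.py", "tests/test_rate_limit.py"),
    required_tests=("tests/test_rate_limit.py",),
    allowed_commands=("pytest", "git"),
)
```

With a spec like this, a harness can refuse `rm -rf build` or an edit to an unrelated file before the agent acts, instead of discovering the scope creep in review.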
When to Use / When Not
| Scenario | Use | Avoid |
|---|---|---|
| Refactoring or renaming across many files | ✅ | |
| Critical security or cryptography code with strict audit needs | | ❌ |
| Adding test coverage to existing modules | ✅ | |
| Greenfield architecture decisions still under debate | | ❌ |
| Framework or library version migrations with mechanical patterns | ✅ | |
| Hot-path performance work where every microsecond matters | | ❌ |
Common Misconception
Myth: Agentic coding means the AI writes code unsupervised, and developers shrink into reviewers who just click “merge.”
Reality: The agent runs the typing-and-running loop, but the developer still owns scope, acceptance criteria, tool permissions, and the final merge decision. The skill shifts from writing lines to writing the specification the agent has to satisfy.
One Sentence to Remember
Agentic coding does not replace the developer’s judgment — it moves that judgment from inside the editor to the planning conversation that happens before any code gets written.
FAQ
Q: What is the difference between agentic coding and AI autocomplete like GitHub Copilot? A: Autocomplete suggests the next few lines while you type. Agentic coding lets the AI plan changes, edit multiple files, run tests, and iterate until the task is done. The human supervises outcomes, not keystrokes.
Q: Do I need a special IDE or tool to start with agentic coding? A: No. Claude Code and OpenAI’s Codex CLI run in any terminal. Cursor and similar IDEs add an agent mode that wraps your existing editor, so you can start with the tool you already use.
Q: What is MCP and why does it matter for agentic coding? A: The Model Context Protocol is an open standard for connecting AI agents to external tools like databases, GitHub, or file systems. According to the MCP specification, it uses JSON-RPC 2.0 with primitives for tools, resources, and prompts.
Sources
- Anthropic: Claude Code — Anthropic’s agentic coding system - Vendor reference for the leading agentic coding product.
- MCP specification: Model Context Protocol Specification (2025-11-25) - Open standard for agent-to-tool connections used across agentic coding tools.
Expert Takes
The interesting principle here is the agent loop itself — a while block calling tools, gated by a model deciding what to do next. Not magic. Statistics in a loop, with structured I/O. The reason it works for code specifically is that compilers and test runners give the loop a hard, machine-readable signal for “done.” Most domains lack that signal, which is why coding became the first agent category to scale.
The trick is feeding the agent a real specification, not a prompt. Definition of done. File paths it should touch. Tests that gate the merge. Tools it has permission to call. The agent loop is reliable when the context file is explicit; it spirals when the agent has to guess scope. Treat the spec as the artifact you actually write — the code becomes the by-product of a good spec plus a working loop.
The market has already split into two camps: developer-facing agents you supervise turn by turn, and delegate-style agents you brief and walk away from. Both ship code. Both have paying customers in real engineering shops. The losers are tools stuck in plain autocomplete, wearing an AI sticker but missing the loop. If your editor cannot plan, test, and iterate, it is already behind the curve.
The uncomfortable question is what happens to juniors when the agent does the typing-and-running loop. The senior who already knows when to override the agent gets faster. The junior who never wrote that code by hand may never build the intuition to spot when the agent is wrong. Who is responsible when a confident merge silently breaks production — the engineer who approved the diff, the vendor, or the team that skipped the review?