AI Engineering · 2026-06-15 · 7 min read

Agentic Coding in Production: What Actually Changes When Coding Agents Move From Demo to Daily Driver

Agentic coding in production is not autocomplete. Here is how I delegate real dev work to coding agents, what I gate, and why usage-based billing now matters.

There is a quiet line that 2026 crossed, and most teams have not fully noticed it yet. Coding agents stopped being completion tools and became delegation tools.

Completion is what we have had for years. Autocomplete that finishes your line. A chat window that drafts a function you paste in. Helpful, but you are still the one holding the work. Delegation is different. You hand an agent a task with a clear definition of done, and it owns the task end to end, across files, across tools, until it is finished or it hits a wall and tells you so. Industry coverage this year has been framing the shift the same way, from autocomplete to something closer to a teammate that picks up an issue.

I live in this every day. I run a set of nightly automations that research, write, refactor, and ship across four production platforms, with no one at the keyboard. So this is not a forecast. It is the operator's view of what actually changes when you put coding agents in production, what I hand off, what I refuse to hand off, and why a billing change that landed today makes all of it something you have to measure.

Completion was assistance. Delegation is a different job.

When an agent merely completes, you are still doing the engineering. You decide the approach, you sequence the steps, the tool just types faster. The mental model is a very fast pair of hands.

Delegation breaks that model. Once an agent owns a task end to end, your job moves up a level. You are no longer writing the steps. You are defining the task, drawing the boundary around it, and deciding how you will know it worked. That is a different skill from coding, and it is the one that separates people who get value from agents from people who get a pile of confident, wrong output.

The practical consequence is that you architect the work differently. A good delegated task is narrow, has a verifiable outcome, and fails loudly. A bad one is vague, sprawling, and only fails when it reaches production. The tool did not change which one you wrote. You did, before the agent ever ran.

What I actually delegate to coding agents

The honest answer is: the work that is repetitive, scoped, and checkable. A few real categories, kept anonymized.

Research-to-draft pipelines. Agents that go research a topic, dedupe it against what already exists, and produce a structured draft for review the next morning. The output is never published blind. A human gate sits at the end.
Repetitive refactors and codemods. When a pattern needs to change across dozens of files, an agent does the mechanical sweep far more patiently than I will at hour three. The change is uniform and the diff is reviewable, which is exactly what makes it safe to delegate.
Documentation and config sync across repos. Keeping a source of truth current across several projects is the kind of dull, high-discipline chore humans quietly skip. An agent does it every night without getting bored.

The pattern under all three is the same. Narrow scope, a clear definition of done, and a result I can verify without re-doing the work. If I cannot describe how I will check the output in one sentence, it is not ready to delegate.
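To make that pattern concrete, here is a minimal sketch of a task spec in Python. Everything in it is hypothetical and for illustration only: the `DelegatedTask` fields, the `ready_to_delegate` heuristic, and the `accept` helper are assumptions, not part of any agent framework. The "one sentence" test in particular is a crude stand-in for human judgment.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class DelegatedTask:
    """Hypothetical task spec: a boundary, a definition of done, and a check."""
    name: str
    allowed_paths: Tuple[str, ...]   # the boundary the agent may work inside
    definition_of_done: str          # how the result will be checked, in one sentence
    verify: Callable[[str], bool]    # automated check run on the agent's output


def ready_to_delegate(task: DelegatedTask) -> bool:
    """Crude readiness test: the task has a boundary and a one-sentence check."""
    done = task.definition_of_done.strip()
    # Assumption: a full stop anywhere but the end means more than one sentence.
    one_sentence = bool(done) and "." not in done.rstrip(".")
    return bool(task.allowed_paths) and one_sentence


def accept(task: DelegatedTask, output: str) -> str:
    """Fail loudly: raise instead of passing unverified output downstream."""
    if not task.verify(output):
        raise ValueError(f"{task.name}: output does not meet the definition of done")
    return output
```

The design choice worth noting is that `accept` raises rather than returning a flag, so a failed check stops the pipeline instead of being quietly ignored.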

Notice what is not on that list: the interesting parts. I am not delegating the decision of what to build, the design of how a system fits together, or the judgment call about a tradeoff. I delegate the labor, not the thinking. The agent is exceptional at the patient, mechanical, easy-to-verify work that used to eat my evenings. It is not a substitute for knowing what good looks like. That distinction is the whole reason delegation pays off instead of backfiring. When I keep the thinking and hand off the labor, I get leverage. When people try to hand off the thinking too, they get output that looks finished and is subtly, expensively wrong.

What I refuse to hand off (yet)

Just as important is the list I keep on the other side of the line.

I do not delegate irreversible or strategic decisions. Anything that touches money, customer data, or a production database stays with a human in the loop. Not because the agent cannot do the mechanical steps, but because the cost of a confident mistake is asymmetric. A wrong refactor I can revert. A wrong write to production data at 3 a.m. I might not even find until a customer does.

Autonomy without a boundary is just a faster way to be wrong. The whole value of delegation is that the agent moves quickly inside a space where being wrong is cheap and recoverable. The skill is knowing where that space ends. Most of the failures I have seen with coding agents are not the model being dumb. They are someone handing the model a task whose blast radius was bigger than they realized.
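One way to encode where that space ends is a gate that refuses to run certain actions without a named human approver. The sketch below is an assumption-heavy illustration: the target labels in `HUMAN_GATED_TARGETS` and the `run_action` helper are hypothetical, and a real system would classify blast radius with far more care than a set lookup.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Hypothetical labels for targets where a confident mistake is hard to undo.
HUMAN_GATED_TARGETS = frozenset({"payments", "customer_data", "production_db"})


def requires_human(target: str) -> bool:
    """True when the target sits outside the cheap-and-recoverable space."""
    return target in HUMAN_GATED_TARGETS


def run_action(target: str, action: Callable[[], T], approved_by: Optional[str] = None) -> T:
    """Run the action, but only with a named approver when the target is gated."""
    if requires_human(target) and not approved_by:
        raise PermissionError(f"'{target}' needs human approval before the agent may act")
    return action()
```

A refactor against a docs target runs straight through. A write against `production_db` raises until someone puts their name on it.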

The harness is what makes it safe

A coding agent in production is only as good as the harness around it. The model is maybe half the system. The other half is the scaffolding that decides what the agent can touch, when a human gets pulled in, and how the work gets checked before anything ships.

In practice that means a few things. Scoped permissions, so an agent that should be writing a draft cannot also push to a remote or delete a file. Review gates at the points where a mistake would be expensive. And, increasingly, a self-check: the agent reviews its own work before it hands anything back, in an adversarial pass that tries to find what is wrong with the output rather than just declaring it done.
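Scoped permissions and review gates can be sketched as an allowlist per agent. The action names, the `AgentScope` type, and the three-way `authorize` result below are all hypothetical, chosen for illustration rather than taken from any real harness.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical permission scope: what an agent may do, and what needs review."""
    name: str
    allowed_actions: FrozenSet[str]
    review_actions: FrozenSet[str] = frozenset()


def authorize(scope: AgentScope, action: str) -> str:
    """Return 'deny', 'review', or 'allow' for a requested action."""
    if action not in scope.allowed_actions:
        return "deny"      # default-deny: anything not listed is out of scope
    if action in scope.review_actions:
        return "review"    # allowed, but a human signs off first
    return "allow"


# Example scope for an agent that should only ever produce drafts.
DRAFT_WRITER = AgentScope(
    name="draft-writer",
    allowed_actions=frozenset({"read_file", "write_draft", "open_pull_request"}),
    review_actions=frozenset({"open_pull_request"}),
)
```

The default-deny branch is the important part. An action nobody thought to list, such as pushing to a remote or deleting a file, is refused rather than permitted by omission.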

Reliability is a harness problem before it is a model problem. Two builders can run the exact same model and get wildly different results, because one wrapped it in guardrails and verification and the other pointed it at the codebase and hoped. If you want the longer argument for why the scaffolding matters more than the model, that is the core of my agent orchestration methodology. The short version: a smart agent with no harness is a liability, and a modest agent with a good harness is a workhorse.

Usage-based billing made the cost visible, and that is good

Here is the part that changed today. As of June 15, 2026, Anthropic's Agent SDK documentation describes programmatic Claude usage (the Agent SDK and claude -p) as drawing from a separate monthly credit pool, distinct from interactive usage on subscription plans.

That sounds like a billing footnote. It is actually a useful forcing function. The moment your automations draw from their own measurable pool, every nightly agent becomes a line item instead of free background magic. You can finally see what each one costs, which means you can decide whether each one is worth it.
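Turning each nightly agent into a line item is simple arithmetic once token counts are logged per run. The sketch below assumes a log of runs with input and output token counts. The prices are placeholders, not real rates, and the `Run` record and agent names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Run(NamedTuple):
    """One logged agent run (hypothetical record shape)."""
    agent: str
    input_tokens: int
    output_tokens: int


# Placeholder prices in USD per million tokens. These are NOT real rates;
# substitute whatever the current pricing page says.
PRICE_IN_PER_MTOK = 2.0
PRICE_OUT_PER_MTOK = 10.0


def cost_by_agent(runs: Iterable[Run]) -> Dict[str, float]:
    """Sum the estimated cost of every run, grouped by agent name."""
    totals: Dict[str, float] = defaultdict(float)
    for run in runs:
        cost = (
            run.input_tokens * PRICE_IN_PER_MTOK
            + run.output_tokens * PRICE_OUT_PER_MTOK
        ) / 1_000_000
        totals[run.agent] += cost
    return dict(totals)
```

With a table like this, "is this automation worth it" becomes a comparison between a number and the hours it saves.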

I welcome this, because token discipline was always the right engineering instinct and now it has a price tag attached. An agent that reads only the files it needs beats one drowning in the entire repo. It was always faster and more accurate. Now it is also cheaper, so the cheaper choice and the better choice are almost always the same one. When the cost is invisible, sloppy context engineering hides. When it is a number on a dashboard, you fix it.

The teams that will do well under usage-based billing are the ones that already treated context as a budget. The teams that will be surprised are the ones who pointed a fleet of agents at everything and never looked at what they were spending.
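Treating context as a budget can be as simple as capping what an agent is allowed to read. This is a rough sketch under loud assumptions: the four-characters-per-token estimate is a coarse heuristic rather than a real tokenizer, and `select_context` is a hypothetical helper, not part of any SDK.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Very rough estimate, assuming about four characters per token."""
    return max(1, len(text) // 4)


def select_context(files: Dict[str, str], needed: List[str], budget: int) -> List[str]:
    """Take the needed files in priority order, skipping any that would exceed the budget."""
    chosen: List[str] = []
    used = 0
    for path in needed:
        cost = estimate_tokens(files[path])
        if used + cost > budget:
            continue  # too large for the remaining budget; leave it out
        chosen.append(path)
        used += cost
    return chosen
```

The point is less the selection rule than the fact that a budget exists at all. An agent that has to justify each file it reads rarely ends up with the whole repo in context.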

Where this leaves a small team

Delegation is how one operator ships like a team. That is not a slogan, it is the actual mechanic. Every narrow, verifiable task I can hand to an agent is a task I am not doing by hand, which is how a very small team runs four platforms in production.

But the word that carries all the weight is "if." This works if the boundaries are real. If the harness is solid. If you can verify the output and you actually do. Strip those away and you do not have leverage, you have a faster way to generate problems you will discover later. The agents are the cheap part now. The judgment about what to delegate, where to draw the line, and how to check the work is the part that still belongs to you.

If you are trying to put coding agents to work inside a real business and want to do it without the expensive surprises, let's talk through it. I build these on my own products first, so the advice comes from the operator's seat, not a slide.

