Field notes · 2026-05-23 · 8 min read

Claude Code for operations teams, yes, the same tool the engineers use

Claude Code is marketed at engineers, but the agent loop underneath it is general-purpose. Here is how we deploy it as an ops runtime for non-engineering teams.

Anthropic ships Claude Code as a CLI for engineers, and most of the public conversation around it is about generating React components or debugging Python. We have been deploying it for something different. Inside our portfolio companies and a few external engagements, Claude Code is quietly becoming the agent runtime for non-engineering teams. Dispatchers, billing clerks, intake coordinators, compliance leads. People who do not write code but who do a lot of repetitive, judgment-shaped work that an agent can carry.

This is not the pitch you will see on Anthropic's marketing page. It is the version we wish someone had handed us six months ago.

What Claude Code actually is, underneath the developer framing

Strip away the IDE integration and the git commands and what you are left with is a fairly clean agent loop. A long-running session, a model in the loop (Claude Opus, Sonnet, or Haiku), a tool list, a permission system, and a memory file. It can read files, run shell commands, call MCP servers, search the web, and call other agents as sub-tasks. It writes its progress to disk as it goes. It pauses for confirmation on anything risky.

Read that paragraph again and substitute "tickets" for "files," "internal API" for "shell," and "vendor portal" for "MCP server." What you are looking at is the shape of a production back-office agent. The engineering UI is a top-layer detail. The agent runtime underneath is general-purpose.

We did not invent this observation; the Anthropic team designed it this way on purpose. But the operational implication has not landed yet in most businesses we walk into. The same tool the engineering team uses to refactor a service can be configured as the agent runtime for a non-engineering team, with the right scaffolding around it. That is part of our agent orchestration methodology when we start a new engagement.

When this is the right call (and when it isn't)

We are not arguing every ops team should be living in a terminal. Claude Code is a good runtime when four conditions are true.

First, the work is file-shaped or document-shaped. Things like reviewing a stack of intake forms, reconciling a CSV against a vendor portal, drafting follow-up emails from a queue of tickets, classifying a folder of broker rate confirmations. The agent can read, the agent can write, and the artifact lives on disk.

Second, the workflow benefits from a memory. Claude Code's CLAUDE.md file is not a gimmick; it is a runbook. Your dispatcher does not have to re-explain the Texas DOT rules every morning. Your billing clerk does not have to restate the appeals letter template every time. The agent reads the runbook on startup and applies it.

Third, the work is iterative and benefits from confirmation. Some ops work is suited to fire-and-forget batch jobs. Most is not. Most ops work is a back-and-forth: the agent does a chunk, the human looks at it, the agent does the next chunk. Claude Code's permission system is designed for this exact loop. Nothing destructive without confirmation, everything reversible by default.

Fourth, the team has someone willing to be the runbook author. This is the part most vendors skip in the sales process. Claude Code is a runtime, not a configured product. The value comes from the runbook, the tool list, and the MCP servers you wire in. Somebody has to write that. It does not have to be an engineer, but it has to be someone with the patience to be specific. In our engagements that person is often the team's most senior operator, not their most senior engineer.

When all four conditions hold, Claude Code outperforms most off-the-shelf workflow tools we have evaluated, including the ones with much glossier marketing. When any of the four is missing, you want a different shape: a simple LLM API call inside an existing app, a hosted agent like the Anthropic, Google, and OpenAI offerings we have written about in our hosted agents comparison, or honestly just a better-trained human.

A concrete picture, dispatch operations

Howdy Dispatch is one of the platforms we operate, a multi-tenant TMS for small trucking fleets. So dispatch is a shape of work we know cold. Here is what a Claude Code back-office agent looks like for a typical small carrier, the kind we talk to every week. Treat this as illustrative, not as a specific named engagement.

A typical small carrier's dispatcher receives a steady stream of broker rate confirmations as PDFs into a shared inbox. Historically that means a person manually keying lane, mileage, pickup time, delivery time, rate, and commodity into the TMS. It is mind-numbing work that ends in a tired person and a non-trivial typo rate.

The agent runtime is Claude Code, running in a constrained directory on a small server. The runbook is a CLAUDE.md file the dispatcher and the integration partner co-write. The first page is the dispatcher's own description of how they read a rate con. The middle pages are the field mapping, the carrier's hours of service rules, the lanes they refuse, the brokers they trust, and the brokers they only take loads from after a phone call. The last page is the escalation protocol when something does not parse.

The agent reads each new PDF, extracts the fields, cross references the broker against an internal credit history, drafts a confirmation email, and writes the load to a JSON file that the TMS imports. It pauses and asks the dispatcher when anything is ambiguous, when the rate is below the lane benchmark by a meaningful margin, or when the broker is new. The dispatcher's role shifts from data entry to exception handling.
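The pause conditions in that paragraph can be written down as plain rules. What follows is a minimal Python sketch, not the implementation from any engagement: the `RateCon` fields follow the list of fields the dispatcher keys in, while the function names, the 10% margin, and the shape of the broker and benchmark lookups are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict, List, Optional, Set

# Hypothetical threshold. The text only says "a meaningful margin";
# the operator would set the real number in the runbook.
BELOW_BENCHMARK_MARGIN = 0.10


@dataclass
class RateCon:
    """Fields extracted from one broker rate confirmation."""
    broker: str
    lane: str
    mileage: Optional[float]
    pickup_time: Optional[str]
    delivery_time: Optional[str]
    rate: Optional[float]
    commodity: Optional[str]


def escalation_reasons(
    load: RateCon,
    known_brokers: Set[str],
    lane_benchmarks: Dict[str, float],
) -> List[str]:
    """Return every reason to pause and ask the dispatcher (empty list = proceed)."""
    reasons = []
    # Ambiguity is approximated here as "a field came back empty".
    missing = [name for name, value in asdict(load).items() if value in (None, "")]
    if missing:
        reasons.append("ambiguous or missing fields: " + ", ".join(missing))
    if load.broker and load.broker not in known_brokers:
        reasons.append("new broker: " + load.broker)
    benchmark = lane_benchmarks.get(load.lane)
    if load.rate is not None and benchmark is not None:
        if load.rate < benchmark * (1 - BELOW_BENCHMARK_MARGIN):
            reasons.append(
                f"rate {load.rate:.2f} is more than {BELOW_BENCHMARK_MARGIN:.0%} "
                f"below lane benchmark {benchmark:.2f}"
            )
    return reasons


def write_load(load: RateCon, out_dir: Path, load_id: str) -> Path:
    """Write the load as a JSON file for the TMS import to pick up."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{load_id}.json"
    path.write_text(json.dumps(asdict(load), indent=2), encoding="utf-8")
    return path
```

In this sketch a load is written only when `escalation_reasons` returns an empty list; anything else goes back to the dispatcher with the reasons attached, which is what turns data entry into exception handling.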

This is not a moonshot. It is a runtime, a runbook, a couple of MCP servers (broker credit, lane benchmark, hours of service rules), and a small permission scope. The build is on the order of weeks, not quarters, and the runbook is owned by the operator, not the engineer.

Why Claude Code specifically, not a custom build

This is the question every operator asks once they see the demo. Why not just build this as a bespoke Python service that calls the Claude API?

Three reasons we keep coming back to Claude Code.

One, it ships with the hard parts. A long-running session with multi-turn memory, tool dispatch, a permission system, a checkpoint and resume model, observability into what the model did and why. We have built those before. They take real time. Anthropic now ships them as a default, and they update them faster than we can.

Two, the runbook stays human-readable. A CLAUDE.md file is markdown. The dispatcher can read it, edit it, and add a new edge case at 4pm on a Friday without filing a ticket. That is a profound shift in who owns the agent's behavior. In a custom build the runbook lives in code and changes through deploys. In Claude Code the runbook is the configuration.

Three, the MCP ecosystem is real. We can wire Claude Code into the carrier's existing TMS, into a Postgres database, into a Stripe account, into a Slack workspace, using off-the-shelf MCP servers. We pay for the integration once, not over and over. That is the version of the agent ecosystem the industry has been promising for two years, and it actually showed up under MCP. We wrote about it in "MCP in plain English" if you want the protocol-level story.

We are not multi-provider zealots; we are pragmatists. For agent orchestration work where the loop matters more than the model, Claude Code is currently the most mature runtime on the market. For other shapes of work we pick differently. Smile PreVue runs on Vertex AI under a BAA because the constraint is HIPAA, not loop quality. The right answer is the one the constraint demands.

The unsexy parts you should plan for

We have shipped enough of these to know where the work actually is, and it is not in the agent.

Setting up the host. Claude Code runs on a developer machine by default. For an ops use case you want it running on a constrained server with a service account, not on the dispatcher's laptop. We typically stand up a small Linux VM with the agent process, a watch loop on the input directory, and tightly scoped access that is read-only wherever the agent has no need to write. Allow a week of plumbing.
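For a sense of what the watch loop amounts to, here is a minimal polling sketch in Python. Everything in it is an assumption for illustration: the function names, the 30-second interval, and the `handle` callback, which stands in for whatever hands a new file to the agent session.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Set


def find_new_pdfs(inbox: Path, seen: Set[str]) -> List[Path]:
    """Return PDFs in the inbox that have not been handled yet, in name order."""
    return sorted(p for p in inbox.glob("*.pdf") if p.name not in seen)


def watch(
    inbox: Path,
    handle: Callable[[Path], None],
    poll_seconds: float = 30.0,
    max_cycles: Optional[int] = None,
) -> Set[str]:
    """Poll the inbox and pass each new PDF to `handle` exactly once.

    `handle` is a hypothetical callback; in a real deployment it would start
    or notify the agent. `max_cycles=None` means run until the process stops.
    """
    seen: Set[str] = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for pdf in find_new_pdfs(inbox, seen):
            handle(pdf)
            # Marked as seen only after the handler returns without raising,
            # so a failed file is retried on the next cycle.
            seen.add(pdf.name)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(poll_seconds)
    return seen
```

The `seen` set lives in memory, so a restart would reprocess everything still in the inbox. A sturdier version would move handled files to an archive directory or persist the set; the sketch leaves that out to stay short.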

Writing the runbook. The first version of the runbook is always wrong. Plan for two weeks of co-writing with the senior operator, plus a tail of edge-case additions for the next two months as real exceptions surface. The runbook is the product. Treat it that way.

Permission tuning. Out of the box Claude Code is conservative; it asks for confirmation on most operations. For an ops use case you will want to expand the auto-approval list deliberately. We do this in small steps and instrument every auto-approved action so we can roll back if a class of decisions starts going sideways.
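One way to picture "expand deliberately and instrument everything" is an allowlist paired with an append-only audit log. The Python sketch below is generic and does not use Claude Code's own permission configuration format; the action names, the JSON-lines log, and both functions are hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Dict, FrozenSet

# Hypothetical action names. Start small and add one class of action at a time.
AUTO_APPROVED: FrozenSet[str] = frozenset({"read_rate_con", "write_load_json"})


def decide(
    action: str,
    detail: str,
    audit_log: Path,
    auto_approved: FrozenSet[str] = AUTO_APPROVED,
) -> str:
    """Return 'auto_approved' or 'needs_confirmation' and log the decision."""
    decision = "auto_approved" if action in auto_approved else "needs_confirmation"
    record = {"ts": time.time(), "action": action, "detail": detail, "decision": decision}
    # One JSON object per line, appended, so the log is easy to tail and query.
    with audit_log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return decision


def auto_approved_counts(audit_log: Path) -> Dict[str, int]:
    """Count auto-approved decisions per action, to review before widening the list."""
    counts: Dict[str, int] = {}
    for line in audit_log.read_text(encoding="utf-8").splitlines():
        record = json.loads(line)
        if record["decision"] == "auto_approved":
            counts[record["action"]] = counts.get(record["action"], 0) + 1
    return counts
```

Rolling back a class of decisions then means removing one name from the allowlist, and the log shows exactly which past actions went through under it.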

Observability. You need to know what the agent did, why it did it, and what it skipped. We wire Claude Code's session logs into a small ClickHouse or Postgres table so we can query the last 30 days of agent behavior. Without this the agent is a black box and you cannot improve it.
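As a rough sketch of that table and the 30-day query, the Python below uses the standard-library `sqlite3` module as a stand-in for Postgres or ClickHouse so that it runs anywhere. The table name, its columns, and the outcome labels are assumptions, not a description of Claude Code's session log format.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Hypothetical schema: one row per thing the agent did, skipped, or escalated.
SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_events (
    ts TEXT NOT NULL,
    session_id TEXT NOT NULL,
    action TEXT NOT NULL,
    outcome TEXT NOT NULL,
    reason TEXT
)
"""


def record_event(
    conn: sqlite3.Connection,
    session_id: str,
    action: str,
    outcome: str,
    reason: Optional[str] = None,
    ts: Optional[datetime] = None,
) -> None:
    """Insert one event. Timestamps are stored as UTC ISO 8601 strings."""
    ts = ts or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO agent_events (ts, session_id, action, outcome, reason) "
        "VALUES (?, ?, ?, ?, ?)",
        (ts.isoformat(timespec="seconds"), session_id, action, outcome, reason),
    )
    conn.commit()


def outcome_counts(
    conn: sqlite3.Connection, days: int = 30, now: Optional[datetime] = None
) -> Dict[str, int]:
    """Count events per outcome over the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat(timespec="seconds")
    # Same-format UTC ISO strings sort chronologically, so a string compare works.
    rows = conn.execute(
        "SELECT outcome, COUNT(*) FROM agent_events WHERE ts >= ? GROUP BY outcome",
        (cutoff,),
    ).fetchall()
    return dict(rows)
```

Keeping a `reason` column next to `outcome` is the part that matters: it is what lets the operator ask why a load was skipped, not only how many were.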

The handoff. The dispatcher needs to know how to spot a bad day. We build a short morning summary the agent posts to Slack: "Processed 12 rate cons, escalated 2, here are the 2." If the escalation count spikes, the operator knows to look closer. If it drops to zero for three days in a row, the operator knows the agent is being too confident.
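The summary and the two warning signs fit in a few lines. This Python sketch formats the message and checks a series of daily escalation counts; the function names, the spike factor of 2, and the alert labels are assumptions, and the Slack posting step is left out.

```python
from typing import List, Sequence


def summary_line(processed: int, escalated_ids: Sequence[str]) -> str:
    """Build the morning summary in the same shape as the example message."""
    n = len(escalated_ids)
    line = f"Processed {processed} rate cons, escalated {n}"
    if n:
        line += f", here are the {n}: " + ", ".join(escalated_ids)
    return line


def escalation_alerts(
    daily_escalations: Sequence[int],
    spike_factor: float = 2.0,  # hypothetical: "spike" = at least 2x the prior average
    quiet_days: int = 3,
) -> List[str]:
    """Flag a spike or a suspicious run of zero-escalation days.

    `daily_escalations` is ordered oldest first, with today last.
    """
    alerts: List[str] = []
    if not daily_escalations:
        return alerts
    *history, today = daily_escalations
    if history:
        baseline = sum(history) / len(history)
        if baseline > 0 and today >= spike_factor * baseline:
            alerts.append("spike")
    recent = list(daily_escalations[-quiet_days:])
    if len(recent) == quiet_days and all(count == 0 for count in recent):
        alerts.append("quiet")
    return alerts
```

A "quiet" alert is worth surfacing as loudly as a "spike": three straight days without an escalation more often means the agent has stopped asking than that the inbox got easier.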

None of this is rocket science. All of it is the work. The model itself is the easy part.

How to know if your ops team is a fit

The honest screen is short. If your team has a senior operator who knows the work cold, a backlog of repetitive document-shaped tasks, and a willingness to invest two weeks in writing a runbook, you have a Claude Code shape of problem. If you do not have that operator, the project will fail no matter what model you put behind it.

That last sentence is the one we keep saying out loud and watching land. AI integration in 2026 is not about the model. The models are commoditized; all three labs are roughly interchangeable for most ops use cases. The differentiator is whether you have an operator who can write the runbook and an integration partner who can stand up the runtime. That is the whole project.

If you have the operator and you want the integration partner, start a conversation with us. We will tell you in 30 minutes whether your shape of problem fits the Claude Code runtime, and if it does not, we will tell you what we would build instead. Either way you walk away with a clearer picture of where the agent should live.
