AI Engineering · 2026-06-12 · 7 min read
Hundreds of Parallel Subagents Just Landed in Claude Code. Here Is When I Actually Reach for Them.
Anthropic shipped Dynamic Workflows and a 1M-token context with Opus 4.8. Here is when fanning out to parallel subagents helps, and when it burns tokens.

In the first week of June 2026, Anthropic released Claude Opus 4.8, which took the top spot on the Artificial Analysis Intelligence Index and ships with a one-million-token default context. In the same drop, they shipped Dynamic Workflows: the ability for Claude Code to orchestrate hundreds of parallel subagents, plus effort control to dial how hard the model works on a task.
That is a real capability jump, and it did not happen in a vacuum. The same window saw heavy agentic-AI emphasis across the labs, from Google's parallel-orchestration messaging to Apple wiring multiple AI options into the iPhone. Fan-out is the industry's current center of gravity.
So here is the operator-honest take, because that is the only kind worth writing. Adding parallel agents is a capability, not a strategy. The fact that I can now spin up a hundred subagents from a single command does not make my work better. Used carelessly, it mostly makes the work more expensive and more confusing. The interesting question is not "can I fan out?" It is "when should I, and how do I keep it reliable when I do?"
What actually shipped, in plain English
Three things landed together, and they reinforce each other.
A one-million-token default context means the model can hold a lot more in working memory at once: a big chunk of a codebase, a long set of results, a sprawling document. Dynamic Workflows means a single run can decompose work and dispatch it across many subagents that run at the same time, instead of one agent grinding through everything in sequence. Effort control means you can tell the system how much compute to spend, so a quick task does not get the same treatment as a hard one.
Why does this matter to anyone building? Because spinning up large-scale parallel work just got cheap and easy. And that is precisely why discipline matters more now, not less. When the expensive, hard-to-set-up thing becomes a one-liner, the failure mode shifts from "I could not do it" to "I did it when I should not have."
What more parallel agents does not fix
Let me start with the disappointment, because it saves the most money.
A bigger fan-out does not make a vague task clearer. If I hand ten agents a fuzzy instruction, I do not get clarity, I get ten confidently different interpretations of the same fog. A bigger fan-out does not make a flaky eval pass either. If my verification is weak, running it in parallel just produces wrong answers faster and in higher volume.
Most agent failures I see in production are not headcount problems. They are context problems and verification problems. The agent did not have the right information, or nothing checked its output before it shipped. Throwing more agents at either of those makes them worse, not better, because now I have more outputs to wrangle and the same broken plumbing underneath.
Capability is not strategy. The teams that get burned by this drop are the ones who treat "hundreds of subagents" as a goal instead of a tool.
When I actually fan out
Fan-out earns its keep on one specific shape of work: tasks that decompose cleanly into independent units. The keyword is independent. If unit A does not need to know what unit B is doing, they can run side by side with no coordination cost.
That covers a lot of real work. Per-file, when I want every file in a directory transformed or audited the same way. Per-project, when I am running the same routine across several separate codebases. Per-source, when I am researching a question and want a different agent reading each source. Per-attempt, when I want a fleet of independent tries at a hard problem, each scored on its own.
I already lean on this in my own nightly automations. Several of them run one agent per project, each with its own narrow slice of context, doing the same job in its own lane, and then a final pass that reads all the results and synthesizes them. The projects do not depend on each other, so the per-project agents never need to talk. That is the clean case, and it is where parallelism is pure upside: the wall-clock time is the slowest single lane, not the sum of all of them.
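The wall-clock claim is easy to check with a toy run. This is a minimal sketch, not my actual automation: `run_lane` is a hypothetical stand-in that sleeps instead of calling a model, and the project names and durations are made up.

```python
import asyncio
import time


async def run_lane(project: str, seconds: float) -> str:
    """Hypothetical per-project agent; sleeps instead of calling a model."""
    await asyncio.sleep(seconds)
    return f"{project}: done"


async def fan_out(lanes: dict[str, float]) -> tuple[list[str], float]:
    """Run every lane concurrently; return results and elapsed wall-clock time."""
    start = time.perf_counter()
    results = await asyncio.gather(*(run_lane(p, s) for p, s in lanes.items()))
    return list(results), time.perf_counter() - start


# Made-up lanes with simulated durations in seconds.
LANES = {"project-a": 0.05, "project-b": 0.10, "project-c": 0.20}

if __name__ == "__main__":
    results, elapsed = asyncio.run(fan_out(LANES))
    print(results)
    print(f"elapsed {elapsed:.2f}s vs. sequential {sum(LANES.values()):.2f}s")
```

Elapsed time should track the slowest lane (about 0.20s here), not the 0.35s a sequential run would take. The token bill, of course, is still the sum.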
The one-million-token context helps here, but in a specific spot. It lets the synthesis agent at the end hold more of the combined results at once, so it can reason across everything instead of summarizing in lossy chunks. It does not replace good decomposition. A bigger context window on a badly split problem is just a bigger room to be confused in.
When fanning out is the wrong move
The mirror image is just as important. Fan-out is the wrong tool when the work has tight cross-dependencies or a shared moving target.
If every unit needs to see what the others are doing, parallel agents do not collaborate, they collide. You get a crowd of agents racing on the same state, redoing each other's work, and overwriting each other's conclusions. I have watched a naive fan-out on an interdependent task take longer and cost more than a single focused agent would have, because all the saved time got eaten by reconciling the conflicts.
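One cheap guard against that collision is to check, before dispatching anything, whether any two units plan to touch the same state. A minimal sketch, assuming each unit can declare up front the set of paths it will write; the `write_sets` shape and the `find_collisions` name are mine, not part of any tool.

```python
from itertools import combinations


def find_collisions(write_sets: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of units whose declared write targets overlap."""
    collisions = []
    for a, b in combinations(sorted(write_sets), 2):
        shared = write_sets[a] & write_sets[b]
        if shared:
            collisions.append((a, b, shared))
    return collisions


if __name__ == "__main__":
    # Hypothetical plan: two units both want to edit the same config file.
    plan = {
        "unit-auth": {"src/auth.py", "config/settings.py"},
        "unit-billing": {"src/billing.py", "config/settings.py"},
        "unit-docs": {"README.md"},
    }
    for a, b, shared in find_collisions(plan):
        print(f"{a} and {b} both write {sorted(shared)}; do not run them in parallel")
```

An empty result does not prove the units are independent, since read dependencies matter too, but a non-empty one is a clear signal to serialize or re-split the work.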
The other case is simpler and more common than people admit: sometimes one good agent with the right context is just cheaper and more reliable. Not every task wants an org chart. If a single agent with a tight, well-built context can do the job, that is usually the right answer, and reaching for a swarm is ego, not engineering.
The discipline that keeps multi-agent work reliable
When I do fan out, a few rules keep it from turning into a token bonfire.
Deterministic control flow around the fan-out. The script decides what runs, the model does not improvise the topology. I want the orchestration, the loops, the conditionals, the what-runs-when, to be deterministic code, with the model doing the judgment inside each well-scoped step. Letting the model invent the whole structure on the fly is where runs go sideways and costs go vertical.
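Here is roughly what that split looks like. It is a sketch under assumptions: `stub_agent` is a hypothetical placeholder for whatever actually calls a subagent, and the thread pool is just one way to run steps concurrently. The point is that the unit list, the ordering, the worker cap, and the branch are all plain code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fan_out(units: list[str], step: Callable[[str], str], max_workers: int = 8) -> dict[str, str]:
    """Run `step` once per unit. The topology is fixed here, not by the model."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(step, units))  # map preserves input order
    return dict(zip(units, outputs))


def run_pipeline(
    units: list[str],
    step: Callable[[str], str],
    synthesize: Callable[[dict[str, str]], str],
) -> str:
    """Deterministic orchestration: sort, fan out, branch, synthesize."""
    per_unit = fan_out(sorted(units), step)
    if not per_unit:
        return "nothing to do"
    return synthesize(per_unit)


def stub_agent(unit: str) -> str:
    """Hypothetical placeholder for a real subagent call."""
    return f"audited {unit}"


if __name__ == "__main__":
    report = run_pipeline(["b.py", "a.py"], stub_agent, lambda r: "; ".join(r.values()))
    print(report)
```

The model's judgment lives entirely inside `step` and `synthesize`. Nothing it returns can change how many agents run or in what order.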
Narrow context per agent beats one agent drowning in everything. Each subagent should get exactly the slice it needs and nothing else. A focused agent with the right 5,000 tokens outperforms one buried in 500,000, even now that the big context is available. The big window is for the synthesis step, not an excuse to stop curating.
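Curating a slice can be as simple as a budgeted selection. The sketch below is illustrative only: `estimate_tokens` uses a crude four-characters-per-token assumption that a real tokenizer should replace, and the `related` list is assumed to arrive already ranked by relevance.

```python
def estimate_tokens(text: str) -> int:
    """Crude assumption: about four characters per token. Use a real tokenizer in practice."""
    return max(1, len(text) // 4)


def build_slice(target: str, files: dict[str, str], related: list[str], budget: int) -> dict[str, str]:
    """Return the target file plus as many related files as fit within `budget` tokens."""
    context = {target: files[target]}
    used = estimate_tokens(files[target])
    for name in related:  # assumed to be ordered most relevant first
        if name == target or name not in files:
            continue
        cost = estimate_tokens(files[name])
        if used + cost > budget:
            continue  # skip anything that would blow the budget
        context[name] = files[name]
        used += cost
    return context


if __name__ == "__main__":
    # Made-up file contents sized to show one file being skipped.
    files = {"a.py": "x" * 40, "b.py": "y" * 40, "c.py": "z" * 4000}
    chosen = build_slice("a.py", files, related=["c.py", "b.py"], budget=25)
    print(list(chosen))
```

The target always goes in, even if it alone exceeds the budget; whether that should raise instead is a design choice I have left open here.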
A verification or synthesis pass, and adversarial checks for anything that must be right. Parallel work needs a step that pulls it back together and, for anything high-stakes, a step that actively tries to refute the findings. If a result has to be correct, I would rather spend a few extra agents trying to break it than ship a plausible wrong answer at scale.
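The refutation step has a simple shape: a finding survives only if nothing manages to object to it. In this sketch the refuters are trivial hypothetical functions; in a real run each one would be an agent prompted to break the finding.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A refuter returns an objection string, or None if it could not break the finding.
Refuter = Callable[[str], Optional[str]]


@dataclass
class Verdict:
    finding: str
    objections: list[str] = field(default_factory=list)

    @property
    def accepted(self) -> bool:
        return not self.objections


def adversarial_check(finding: str, refuters: list[Refuter]) -> Verdict:
    """Run every refuter against the finding and collect the objections."""
    objections = [o for o in (r(finding) for r in refuters) if o is not None]
    return Verdict(finding, objections)


def needs_location(finding: str) -> Optional[str]:
    """Hypothetical refuter: reject findings that cite no line number."""
    return None if "line " in finding else "no line number cited"


def too_vague(finding: str) -> Optional[str]:
    """Hypothetical refuter: reject findings too short to act on."""
    return "finding is too short to act on" if len(finding.split()) < 4 else None


if __name__ == "__main__":
    for text in ["SQL injection in query builder at line 42", "looks risky"]:
        verdict = adversarial_check(text, [needs_location, too_vague])
        print(verdict.accepted, verdict.objections)
```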
Token discipline, because parallel is fast, not free. Under usage-based billing, a hundred agents is a hundred agents' worth of tokens whether or not the work needed them. Fan-out trades money for wall-clock time. That is often a great trade, but it is a trade, and I make it on purpose, not by reflex.
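A back-of-the-envelope estimate before a run makes the trade explicit. The prices and token counts below are placeholders I made up for illustration, not any vendor's actual rates; plug in your own.

```python
def run_cost(
    agents: int,
    input_tokens: int,
    output_tokens: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Estimated cost in USD when every agent uses the same token counts."""
    per_agent = (
        input_tokens / 1_000_000 * usd_per_m_input
        + output_tokens / 1_000_000 * usd_per_m_output
    )
    return agents * per_agent


if __name__ == "__main__":
    # Hypothetical prices per million tokens; replace with real ones.
    fan_out_cost = run_cost(100, 5_000, 1_000, usd_per_m_input=10.0, usd_per_m_output=50.0)
    single_cost = run_cost(1, 200_000, 20_000, usd_per_m_input=10.0, usd_per_m_output=50.0)
    print(f"fan-out: ${fan_out_cost:.2f}, single agent: ${single_cost:.2f}")
```

Which side comes out cheaper depends entirely on the inputs. The value is in looking at the number before the run rather than after.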
What I would build with it this week
If I wanted to put this to work right now, the cleanest first project is a codebase-wide audit or migration. Say I need to apply the same change across a few hundred files, or check every module against the same set of rules. That decomposes perfectly: one agent per file, a deterministic script driving the fan-out, a synthesis pass to collect and dedupe the findings, and an adversarial check on anything the audit flags as critical. That is the shape where hundreds of subagents turn a multi-day slog into an afternoon, and the new context window makes the final roll-up genuinely better.
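The synthesis pass for an audit like that can start out very small. A sketch, assuming each per-file agent returns a list of findings; the `Finding` fields and severity labels are hypothetical choices of mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    file: str
    rule: str
    severity: str  # hypothetical labels: "critical" or "minor"


def synthesize(per_file: list[list[Finding]]) -> tuple[list[Finding], list[Finding]]:
    """Flatten and dedupe findings, then split them into (critical, everything else)."""
    seen: set[Finding] = set()
    unique: list[Finding] = []
    for findings in per_file:
        for finding in findings:
            if finding not in seen:  # frozen dataclass, so findings are hashable
                seen.add(finding)
                unique.append(finding)
    critical = [f for f in unique if f.severity == "critical"]
    rest = [f for f in unique if f.severity != "critical"]
    return critical, rest


if __name__ == "__main__":
    dup = Finding("auth.py", "no-plaintext-secrets", "critical")
    critical, rest = synthesize([
        [dup, Finding("auth.py", "missing-docstring", "minor")],
        [dup],  # the same finding reported by a second agent
    ])
    print(len(critical), "critical to send for adversarial review;", len(rest), "other")
```

Only the `critical` list goes on to the adversarial check, which keeps the expensive verification spend where it matters.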
One last note, since the studio is deliberately multi-provider. This particular drop is Anthropic's, but the discipline is not. The same logic applies whether you build on Anthropic, Google, or OpenAI: decompose cleanly, keep control flow deterministic, narrow the context, verify the output, watch the tokens. I do not bet a client's system on one lab, and I do not bet a reliability strategy on one vendor's feature either. The capability is theirs to ship. The judgment about when to use it is the part that actually matters, and that part travels.
If you want help figuring out where fan-out genuinely fits in your operation, and where a single sharp agent is the smarter call, that is the kind of thing I work through with teams. Read how I think about it in my agent orchestration methodology, or start a conversation.