Field notes · 2026-05-27 · 6 min read
OpenAI Operator and the rise of browser agents: when computer use is the right tool for your operations team
OpenAI Operator, Anthropic Computer Use, and Google Project Astra all let AI drive a browser. Here is when that actually matters for a business and when a plain API integration is faster, cheaper, and more reliable.
OpenAI Operator is the version of "AI that uses your computer for you" most operators have actually seen demoed. Anthropic Computer Use is the same idea on the Claude side. Google's Project Astra is the multimodal version of it. All three are real, all three work in production for the right shape of workflow, and all three are wrong for most of the use cases buyers ask us about.
Here is when we recommend a browser agent, when we steer clients away from one, and what we have actually shipped.
What "computer use" agents actually do
A computer-use agent is a model that can take a screenshot, decide what to click, click it, type into a field, scroll, navigate, and continue. It is essentially a human at the keyboard, except the human is a model with a goal.
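The loop described above can be sketched in a few lines. This is a toy illustration, not any vendor's API: `Action`, `run_agent`, `FakePage`, and `fake_model` are hypothetical names, and the "screenshot" is a plain string standing in for pixels.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # "click", "type", "scroll", "navigate", or "done"
    target: str = ""   # what the action applies to (hypothetical element name)
    text: str = ""     # text to type, if any


def run_agent(goal: str,
              observe: Callable[[], str],
              decide: Callable[[str, str, List[Action]], Action],
              act: Callable[[Action], None],
              max_steps: int = 20) -> List[Action]:
    """Observe, decide, act, and repeat until the model reports it is done."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = observe()                       # 1. look at the screen
        action = decide(goal, screenshot, history)   # 2. model picks the next step
        history.append(action)
        if action.kind == "done":
            break
        act(action)                                  # 3. carry it out
    return history


class FakePage:
    """Stand-in for a browser session with three screens."""
    def __init__(self) -> None:
        self.state = "login"

    def observe(self) -> str:
        return self.state

    def act(self, action: Action) -> None:
        if self.state == "login" and action.kind == "type":
            self.state = "login_filled"
        elif self.state == "login_filled" and action.kind == "click":
            self.state = "dashboard"


def fake_model(goal: str, screenshot: str, history: List[Action]) -> Action:
    """Stand-in for the model call: maps what it 'sees' to the next action."""
    if screenshot == "login":
        return Action("type", target="username", text="demo")
    if screenshot == "login_filled":
        return Action("click", target="submit")
    return Action("done")


page = FakePage()
steps = run_agent("reach the dashboard", page.observe, fake_model, page.act)
print([s.kind for s in steps])
```

In a real deployment, `observe` would capture an actual screenshot and `decide` would be a call to the provider's model; the shape of the loop is the part that carries over.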
The use case that matters most for a business: legacy systems that do not have an API. The ERP you bought in 2008 with no integration story. The supplier portal that requires a login and a captcha. The state-government website your operations team logs into every Tuesday to check filings. Anything where the answer to "can we automate this" is currently "the only interface is the browser."
For those workflows, a browser agent turns a manual chore into a review step. We have shipped one for a client whose entire vendor onboarding process required eleven manual logins to eleven different supplier portals, copy-pasting fields between them, and uploading the same PDF each time. A 90-minute weekly task became a 4-minute review of a completed log.
For workflows that have a real API, computer-use is the wrong tool. Always.
Where we recommend a browser agent
Three patterns we look for.
The system has no API and we cannot get one. Legacy ERP, supplier portal, government website, an internal tool with a Win32-style web UI that pre-dates REST. If the vendor will not give us an API and we cannot pay to get one, the browser is the integration surface. Operator (or Computer Use, or Astra) becomes the bridge.
The task is bursty and human-shaped. Something a person does for 20 minutes once a week. The agent does not need to be fast. It does not need to be cheap per call. It needs to be reliable enough that the human only has to review the output, not redo the work. Browser agents work well here because the unit of work is "complete this once," not "complete this 10,000 times."
The data flow is one-way and low-stakes. Pulling data out of a system that will not give us an API. Read-only browse-and-export. Periodic regulatory filings (filed on a date, never updated). Anything where "the agent makes a mistake once a month and a human catches it" is a tolerable failure mode.
For these three patterns, OpenAI Operator is currently the most mature. Anthropic Computer Use is the cleanest API to integrate against if you want full control. Google Project Astra is the multimodal-best (it can reason about diagrams and screenshots together), but it is also the least documented for production deployments.
Where we steer clients away from browser agents
Three reasons we say no.
The system has an API. If your Shopify integration could be a Shopify webhook, do not build it as a browser agent. The API is faster, cheaper, more reliable, and auditable. We have walked away from at least three engagements where the client wanted a "Claude that uses Shopify like a human." The right shape was a backend that uses the Shopify API and a Claude that reasons about the results.
The task is high-volume and time-sensitive. Browser agents are slow. Each click is a screenshot plus a model call. A task that completes in 30 seconds via API takes 2-4 minutes via browser agent. If you are processing thousands of records a day, that math does not work.
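To make that math concrete, here is a rough back-of-the-envelope calculation. The 5,000 records per day is an assumed figure standing in for "thousands of records a day"; the per-task durations are the ones quoted above, and the function names are illustrative.

```python
import math


def sequential_hours(records_per_day: int, minutes_per_task: float) -> float:
    """Hours of back-to-back work needed to process one day's records."""
    return records_per_day * minutes_per_task / 60


def parallel_sessions_needed(records_per_day: int, minutes_per_task: float) -> int:
    """Minimum number of sessions running around the clock to keep up."""
    return math.ceil(sequential_hours(records_per_day, minutes_per_task) / 24)


RECORDS_PER_DAY = 5_000  # assumption for illustration

for label, minutes in [
    ("API call (30 s)", 0.5),
    ("browser agent, best case (2 min)", 2),
    ("browser agent, worst case (4 min)", 4),
]:
    hours = sequential_hours(RECORDS_PER_DAY, minutes)
    sessions = parallel_sessions_needed(RECORDS_PER_DAY, minutes)
    print(f"{label}: {hours:.0f} h of sequential work, {sessions} parallel sessions")
```

Under these assumptions the API path needs about 42 hours of sequential work per day (2 workers), while the browser agent needs roughly 167 to 333 hours (7 to 14 browser sessions running nonstop), each of which is also paying for a model call per click.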
The task changes the world. Reading is fine. Writing is risky. An agent that issues refunds, sends emails, posts to social media, files legal documents, or transfers money needs explicit human review, not autonomous execution. We have seen demos where Operator schedules a meeting on someone's behalf. We will not ship that as autonomous. Drafting and surfacing for approval, yes. Autonomous calendaring with real consequences, no.
The risk model on browser agents is "what is the worst thing the agent could do in five clicks." If the answer is "nothing serious," ship it. If the answer is "drain a bank account, send a confidential document to the wrong person, fire an employee," do not.
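The screening questions from the last two sections can be written down as a small checklist. This is a sketch of how one might encode them, not a formal policy: the `Workflow` fields, the rule order, and the 1,000-runs-per-day threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    has_api: bool                # is there a usable API (or could we get one)?
    runs_per_day: int            # how often the task executes
    changes_state: bool          # issues refunds, sends email, files documents, ...
    worst_case_is_serious: bool  # "worst thing in five clicks" is not tolerable


def recommend(w: Workflow, high_volume_threshold: int = 1000) -> str:
    """Apply the screening questions in order and return a recommendation."""
    if w.has_api:
        return "build the API integration"
    if w.runs_per_day >= high_volume_threshold:
        return "too high-volume for a browser agent"
    if w.worst_case_is_serious:
        return "do not ship as an autonomous agent"
    if w.changes_state:
        return "browser agent drafts, human approves"
    return "browser agent"


weekly_portal_export = Workflow(
    has_api=False, runs_per_day=1, changes_state=False, worst_case_is_serious=False
)
print(recommend(weekly_portal_export))
```

The order matters: the API question comes first because it overrides everything else, and the risk questions come before the default so that a write-capable agent never falls through to fully autonomous execution.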
What we have shipped
Vendor portal data extraction. Eleven supplier portals, each with a login and a slightly different UI for "where is my pending purchase order." The agent logs in, exports the pending POs, normalizes them into a single CSV, and posts the CSV to the client's internal channel. Runs once a day. Replaced about 90 minutes of weekly manual work. Built on OpenAI Operator because the connector pattern for the eleven portals was already in the example library.
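The normalization step is ordinary data plumbing once the agent has the exports in hand. A minimal sketch, assuming each portal exports CSV with its own column names; the portal names, column names, and sample rows below are invented for illustration and are not the client's actual schema.

```python
import csv
import io

# Hypothetical per-portal mappings from source column to normalized column.
COLUMN_MAPS = {
    "portal_a": {"PO Number": "po_number", "Supplier": "vendor", "Total": "amount"},
    "portal_b": {"po_id": "po_number", "vendor_name": "vendor", "amount_usd": "amount"},
}
FIELDS = ["portal", "po_number", "vendor", "amount"]


def normalize(portal: str, raw_csv: str) -> list:
    """Rename one portal's columns to the shared schema."""
    mapping = COLUMN_MAPS[portal]
    rows = []
    for raw in csv.DictReader(io.StringIO(raw_csv)):
        row = {"portal": portal}
        for source_col, target_col in mapping.items():
            row[target_col] = raw[source_col].strip()
        rows.append(row)
    return rows


def merge(exports: dict) -> str:
    """Combine every portal's export into a single CSV string."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for portal, raw_csv in exports.items():
        writer.writerows(normalize(portal, raw_csv))
    return out.getvalue()


exports = {
    "portal_a": "PO Number,Supplier,Total\nA-100,Acme,1200.00\n",
    "portal_b": "po_id,vendor_name,amount_usd\nB-7,Globex,88.50\n",
}
combined = merge(exports)
print(combined)
```

Keeping the mapping as data rather than code means a portal that renames a column needs a one-line change, and a missing column fails loudly with a `KeyError` instead of producing a silently incomplete row.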
State business filings monitoring. A client with operations in twelve states had a paralegal manually checking each state's business filings portal monthly. The agent checks each portal weekly, reports anything new, and surfaces the document URLs to the paralegal. Built on Anthropic Computer Use specifically because the paralegal wanted to review the agent's reasoning trace ("here is why I flagged this filing as relevant") and Claude's tool-call narration is the most legible of the three.
Internal compliance audit prep. An operator with a legacy compliance system in a vendor portal that had no API. The agent navigates to the right report, exports it weekly as a CSV, runs it through a normalization step, and stores the result in the client's data warehouse. Built on Anthropic Computer Use because the orchestration needed sub-agents (one for navigation, one for data validation) and Claude handles that cleanly.
For each of these, the alternative was custom RPA (UiPath-style) at five to ten times the build cost, plus ongoing maintenance. The browser agent approach is faster to ship and easier to update when the portal UI changes, and its failure mode is "the agent retries and posts a Slack alert if it cannot complete" instead of "the RPA bot silently breaks."
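The "retry, then alert" failure mode is simple to express. A minimal sketch under stated assumptions: `run_with_retries` is a hypothetical helper, the `alert` callback stands in for whatever posts to Slack, and the two export functions simulate a flaky portal and a broken one.

```python
import time
from typing import Callable, List, Optional


def run_with_retries(task: Callable[[], str],
                     alert: Callable[[str], None],
                     attempts: int = 3,
                     backoff_seconds: float = 0.0) -> Optional[str]:
    """Run the task; retry on failure; alert a human if every attempt fails."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:  # any failure counts: timeout, missing button, ...
            last_error = exc
            if attempt < attempts:
                time.sleep(backoff_seconds * attempt)  # wait longer each time
    alert(f"Agent could not complete after {attempts} attempts: {last_error}")
    return None


alerts: List[str] = []  # stand-in for a Slack channel
calls = {"n": 0}


def flaky_export() -> str:
    """Fails twice, then succeeds, like a portal that is slow to load."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("portal did not load")
    return "export.csv"


def broken_export() -> str:
    """Always fails, like a portal whose UI has changed."""
    raise RuntimeError("login button not found")


first = run_with_retries(flaky_export, alerts.append)
second = run_with_retries(broken_export, alerts.append)
print(first, second, alerts)
```

The point of the pattern is that a failed run always produces a visible message, so nobody discovers weeks later that the export stopped working.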
The pattern we always include
Every browser agent we ship has the same four guardrails:
- A defined success criterion. The agent knows when it is done. "Logged in," "exported," and "posted to channel" are the three states. If any of them fails, the agent stops and alerts a human.
- A capped action budget. The agent cannot take more than N clicks per task. If it exceeds the budget, it stops. This prevents runaway loops.
- A human-readable execution log. Every click, every keystroke, every page navigation is logged with a human-readable description. We can audit any run after the fact.
- An explicit allow-list of domains. The agent can only operate on the specific URLs the task requires. It cannot wander off to a different site. This is non-negotiable.
These four together turn a "magic AI that uses your computer" into a "predictable automation with reasonable failure modes."
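As one way to picture how the four guardrails fit together, here is a small wrapper that every agent action would pass through. It is a sketch, not our production code: `GuardedRun`, `GuardrailViolation`, the example domains, and the budget of 3 are hypothetical, and the allow-list check is an exact hostname match (subdomains are not implicitly allowed).

```python
from urllib.parse import urlparse


class GuardrailViolation(Exception):
    """Raised when a run must stop and a human must be alerted."""


class GuardedRun:
    def __init__(self, allowed_domains, max_actions, required_states):
        self.allowed_domains = set(allowed_domains)   # guardrail: allow-list
        self.max_actions = max_actions                # guardrail: action budget
        self.required_states = list(required_states)  # guardrail: success criterion
        self.reached = []
        self.actions_taken = 0
        self.log = []                                 # guardrail: readable log

    def record_action(self, kind, description, url):
        """Check the allow-list and budget, then log the action."""
        host = urlparse(url).hostname or ""
        if host not in self.allowed_domains:
            self._stop(f"STOP: {host} is not on the allow-list")
        if self.actions_taken >= self.max_actions:
            self._stop(f"STOP: action budget of {self.max_actions} exhausted")
        self.actions_taken += 1
        self.log.append(f"{self.actions_taken:03d} {kind}: {description} [{host}]")

    def mark_state(self, state):
        self.reached.append(state)
        self.log.append(f"STATE reached: {state}")

    def is_done(self):
        """Done only when every required state was reached, in order."""
        return self.reached == self.required_states

    def _stop(self, reason):
        self.log.append(reason)
        raise GuardrailViolation(reason)


run = GuardedRun(
    allowed_domains={"portal.example.com"},
    max_actions=3,
    required_states=["logged in", "exported", "posted to channel"],
)
run.record_action("click", "Open the login form", "https://portal.example.com/login")
run.mark_state("logged in")
run.record_action("click", "Export pending POs", "https://portal.example.com/orders")
run.mark_state("exported")
try:
    run.record_action("navigate", "Follow an external link", "https://other.example.org/")
except GuardrailViolation as stop:
    print(stop)
print(run.is_done())
print("\n".join(run.log))
```

Because the checks live in one place, the agent cannot skip them: the off-list navigation is refused before it happens, the run is reported as not done, and the log shows exactly where it stopped.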
Which provider for which build
We pick by what the orchestration looks like:
- OpenAI Operator when the connector pattern is already in the catalog (most common SaaS portals) and the team is on the GPT ecosystem. Fastest to ship for off-the-shelf scenarios.
- Anthropic Computer Use when the agent needs to explain its reasoning, when sub-agent orchestration matters, or when we want the cleanest tool-call API. Default for legal, compliance, and audit workflows.
- Google Project Astra when the agent needs to reason multimodally: screenshots plus PDFs plus database results, all together. Less production-tested than the other two, but the multimodal reasoning is the strongest of the three.
What this means for you, today
If you have a workflow that involves a person logging into a system, doing something rote, and exporting a result, that is a candidate for a browser agent. Three to five weeks of build, a few hundred dollars a month to operate, and you free up hours of human attention.
If you have a workflow that is already API-able, do not build a browser agent. Build the API integration.
If you are not sure which one you have, that is what the consultation is for.