Agentic Control Plane

What Is an Agent Harness? (And Why Every Harness Needs a Control Plane)

David Crowe · 6 min read
Tags: harness, agents, governance, architecture

There’s a line going around that finally names the thing everyone building agents has been circling: Agent = Model + Harness.

The model is the part everyone talks about. The harness is everything else — and it turns out to be most of the system.

[Figure: An agent is mostly harness. Agent = Model (decides) + Harness (acts, and is most of the system). Harness components shown: loop, tools, context and memory, verification, budgets and limits, logging. Caption: The model decides what to do. The harness does it, and holds the cost, the risk, and the surprises.]

The model is the smallest part

Give a frontier model a prompt and it returns text. That’s not an agent. An agent does things: it reads a ticket, searches the web, queries a database, drafts a reply, sends it — looping, checking its own work, deciding what to do next.

None of that is the model. The model decides; the harness acts. The harness is:

  • the loop that runs the model again and again until the task is done,
  • the tools it can call, and how their results are fed back in,
  • the context and memory — what gets loaded into each turn, and what gets trimmed,
  • the verification — the checks that run before a result reaches a user,
  • the budgets and limits — how much a run may spend, how long it may go,
  • and the logging of everything that happened.
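
To make those six pieces concrete, here is a minimal sketch of a harness in Python. It is illustrative only: the `Decision` type, the `Harness` class, the scripted stand-in for the model, and the `lookup` tool are all hypothetical names invented for this example, not any framework's API. A real harness would call a model API where `scripted_model` appears.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Decision:
    """What the model wants next: call a tool, or finish with an answer."""
    tool: Optional[str] = None
    arg: str = ""
    answer: Optional[str] = None


@dataclass
class Harness:
    model: Callable[[list], Decision]   # hypothetical model interface
    tools: dict                         # tool name -> callable taking one string
    verify: Callable[[str], bool]       # check that runs before a result is returned
    max_turns: int = 8                  # budget: how long a run may go
    max_context: int = 6                # memory: entries kept after trimming
    log: list = field(default_factory=list)

    def run(self, task: str) -> Optional[str]:
        context = [f"task: {task}"]
        for turn in range(1, self.max_turns + 1):
            # Context trimming: always keep the task, plus the most recent entries.
            view = context[:1] + context[1:][-self.max_context:]
            decision = self.model(view)
            if decision.answer is not None:
                ok = self.verify(decision.answer)
                self.log.append(f"turn {turn}: answer verified={ok}")
                if ok:
                    return decision.answer
                context.append("verifier rejected the answer")
                continue
            tool = self.tools.get(decision.tool or "")
            if tool is None:
                self.log.append(f"turn {turn}: unknown tool {decision.tool!r}")
                context.append(f"unknown tool {decision.tool}")
                continue
            result = tool(decision.arg)
            self.log.append(f"turn {turn}: {decision.tool}({decision.arg!r}) -> {result!r}")
            # Tool results are fed back in for the next turn.
            context.append(f"{decision.tool} returned {result}")
        self.log.append("budget exhausted")
        return None


def scripted_model(context: list) -> Decision:
    # Stand-in for a real model call: look the ticket up once, then answer.
    if not any(line.startswith("lookup returned") for line in context):
        return Decision(tool="lookup", arg="TICKET-1")
    return Decision(answer="Refund approved")


harness = Harness(
    model=scripted_model,
    tools={"lookup": lambda ticket_id: f"{ticket_id}: customer requests refund"},
    verify=lambda answer: len(answer) > 0,
)
result = harness.run("reply to TICKET-1")
print(result)
print(harness.log)
```

Notice how little of this is the model: one call inside the loop. Everything else (the turn cap, the trimming, the verifier, the log) is harness code you write and tune.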

Swap the model for a better one and a bad harness still produces a bad agent. Keep the model and fix the harness, and the same model suddenly ships. Practitioners have a phrase for it: the gap between what a model can do and what you actually see it do is a harness gap. Closing that gap is harness engineering — and it’s become its own discipline, with Thoughtworks, Databricks, LangChain, and MongoDB all writing about it.

Harness engineering is where the cost and the surprises live

Once you see agents as harnesses, the weird behavior makes sense.

An agent that costs 1.2¢ one run and 2.2¢ the next isn’t a flaky model — it’s a harness whose loop ran a different number of turns. The bill that balloons on a “simple” task is the harness re-reading its whole context on every turn; on most agents, the majority of the token spend is re-reading, not new work. The run that takes three different paths to the same goal is the harness’s loop diverging. These are harness properties, and they’re measurable: turns, re-read multiple, path variance, cost per step.
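
A small worked example shows why turn count drives the bill. The sketch below rests on stated assumptions: the harness resends its whole context on every turn, each turn adds a fixed number of new tokens, and "re-read multiple" is defined here as total input tokens read divided by the size of the final context. The token counts and the price are hypothetical numbers chosen for illustration.

```python
def input_tokens_per_turn(base_tokens: int, new_tokens_per_turn: int, turns: int) -> list:
    """Input tokens read on each turn when the whole context is resent."""
    return [base_tokens + new_tokens_per_turn * t for t in range(turns)]


def reread_multiple(base_tokens: int, new_tokens_per_turn: int, turns: int) -> float:
    """Total tokens read, divided by the tokens that exist by the last turn."""
    per_turn = input_tokens_per_turn(base_tokens, new_tokens_per_turn, turns)
    return sum(per_turn) / per_turn[-1]


def input_cost_cents(total_tokens: int, dollars_per_million: float) -> float:
    """Input-side cost in cents at an assumed (hypothetical) price."""
    return total_tokens * dollars_per_million / 1_000_000 * 100


# Same task, same model; the loop simply ran 5 turns once and 10 turns the next time.
for turns in (5, 10):
    per_turn = input_tokens_per_turn(base_tokens=2000, new_tokens_per_turn=500, turns=turns)
    total = sum(per_turn)
    multiple = reread_multiple(2000, 500, turns)
    reread_share = 1 - per_turn[-1] / total
    cost = input_cost_cents(total, dollars_per_million=3.0)
    print(f"{turns} turns: {total} input tokens, re-read multiple {multiple:.2f}, "
          f"{reread_share:.0%} re-read, {cost:.2f} cents")
```

Under these assumptions the 5-turn run reads 15,000 input tokens for a 4,000-token final context, a re-read multiple of 3.75. Doubling the turns nearly triples the tokens read, because the total grows roughly with the square of the turn count.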

That’s the good news about harness engineering — it turns an agent’s behavior into something you can see and tune, instead of a black box that hands back an answer.

But the harness creates problems it can’t solve itself

Here’s the part the harness diagrams leave out. The moment your harness is calling real tools, spending real money, and acting on behalf of real users, you inherit a set of problems that live below the harness — at the boundary where it touches your tools and your backends:

  • Identity. When the harness calls your API, who is it acting as? The user who triggered it? The agent? The framework’s service account? (This is the three-party problem — and most harnesses paper over it with one shared key.)
  • Authority. What is this harness allowed to do? Can it delete, pay, email the whole company? A harness will call any tool you hand it; it has no opinion about which calls are dangerous.
  • Cost. What is it spending — per agent, per tool call — and can you cap it before a runaway loop bills you $400?
  • Control. Can you stop it? Block one caller who’s abusing it? Deny one tool without a redeploy?
  • Evidence. Can you show an auditor every action it took, as a specific user, with the result?

None of these are harness engineering problems. You can build a beautiful harness and still have all five. They’re governance problems, and they’re cross-cutting — they apply to every tool call the harness makes, no matter which framework it’s built in.

The field is starting to draw the line explicitly: observability tells you what the harness did; governance controls what it’s allowed to do. Traces and logs are read-only. Identity, policy, budgets, and blocks are enforcement. A harness needs both — and harness engineering, by itself, gives you neither for free.

Every harness needs a control plane

This is why the harness conversation keeps arriving at the same destination — the AI control plane.

A control plane sits between the harness and the tools it calls, and handles exactly the five things the harness can’t:

  • it gives the agent a verified identity on every call,
  • it enforces policy — allow, deny, or require approval, per tool and per action,
  • it prices every call, so you see the loop tax and can cap it,
  • it gives you a kill switch — for the whole agent, or for one caller,
  • and it writes the audit trail — every action, attributed, with its result.
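
As a rough sketch of what those five responsibilities look like at the tool-call boundary, the Python below wraps every tool call in a single checkpoint. It is not the implementation of any particular product: `ControlPlane`, `Blocked`, the policy values, and the prices are hypothetical. It also assumes the `user` and `agent` identities were already verified upstream (for example, from a signed token), and it tracks the spend cap per user for simplicity.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable

ALLOW, DENY, APPROVE = "allow", "deny", "require_approval"


class Blocked(Exception):
    """Raised when the control plane refuses a tool call."""


@dataclass
class ControlPlane:
    policy: dict          # tool name -> ALLOW / DENY / APPROVE
    price_cents: dict     # tool name -> hypothetical price per call
    budget_cents: float   # spend cap per user
    killed: bool = False  # kill switch for the whole agent
    blocked_callers: set = field(default_factory=set)
    spent: dict = field(default_factory=dict)
    audit: list = field(default_factory=list)

    def call(self, user: str, agent: str, tool: str, fn: Callable[..., Any], *args: Any) -> Any:
        # Identity: every audit entry is attributed to a user and an agent.
        entry = {"ts": time.time(), "user": user, "agent": agent, "tool": tool, "args": args}
        try:
            # Control: stop everything, or stop one caller.
            if self.killed or user in self.blocked_callers:
                raise Blocked("kill switch or caller block is active")
            # Authority: unlisted tools are denied by default.
            decision = self.policy.get(tool, DENY)
            if decision != ALLOW:
                raise Blocked(f"policy says {decision}")
            # Cost: price the call and enforce the cap before running it.
            cost = self.price_cents.get(tool, 0.0)
            if self.spent.get(user, 0.0) + cost > self.budget_cents:
                raise Blocked("budget cap reached")
            self.spent[user] = self.spent.get(user, 0.0) + cost
            entry["result"] = fn(*args)
            entry["outcome"] = "allowed"
            return entry["result"]
        except Blocked as exc:
            entry["outcome"] = f"blocked: {exc}"
            raise
        except Exception as exc:
            entry["outcome"] = f"error: {exc}"
            raise
        finally:
            # Evidence: every attempt is recorded, allowed or not.
            self.audit.append(entry)


cp = ControlPlane(
    policy={"search": ALLOW, "send_email": APPROVE},
    price_cents={"search": 0.5},
    budget_cents=1.0,
)


def search(query: str) -> str:
    return f"results for {query}"


first = cp.call("alice", "support-bot", "search", search, "refund policy")
print(first)
print(cp.audit[-1]["outcome"])
```

Because the wrapper only needs a tool name, a callable, and a caller identity, the same checkpoint can sit under any harness that routes its tool calls through it. Denying a tool or blocking a caller is a data change (`policy`, `blocked_callers`), not a redeploy of the harness.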

The harness makes your agent capable. The control plane makes it governable, priced, and safe — down to each individual tool call. And because it lives at the tool-call boundary, it works across any harness: Claude Code, LangGraph, CrewAI, the Anthropic SDK, or one you wrote yourself.

You’re going to engineer a harness — everyone shipping a real agent is. The question was never whether you have a harness. It’s whether the harness has a control plane underneath it.

Call it what you like: a control plane, a governance layer, an agent gateway. The name matters less than the principle that a harness in production needs one. You engineer the harness for capability; you put a control plane under it for identity, authority, cost, control, and evidence. Both, or you’re running half a system.

(Full disclosure: we build one — the Agentic Control Plane — which is why we’ve spent so long staring at harnesses. But the argument stands whatever you run.)
