Agentic Control Plane

How to Govern AI Agent Tool Calls (Before They Run)

David Crowe · 4 min read
governance tool-calling mcp architecture

When you build an agent with LangGraph, CrewAI, AutoGen, or Mastra, the framework gives you a way to gatekeep the server — an API key, maybe an auth middleware on the endpoint. What it doesn’t give you is governance of the thing that actually does the work: the tool call. The agent decides, at runtime, to call crm.lookup(customerId) or db.query(sql) or refund.issue(amount) — and nothing stands between that decision and the effect.

That’s the gap. Authenticating the request tells you a human started a session. It tells you nothing about whether this agent, acting for this user, should be allowed to issue that specific tool call right now. Governing tool calls means putting a deterministic decision — allow, deny, or redact — in front of every invocation, before it runs.

Here’s how to do it properly, and the conditions under which it actually holds.

Why the framework’s auth isn’t tool-call governance

Frameworks authenticate at the edge and then trust everything downstream. Once the agent is running, its tools typically execute with one shared backend credential, and the framework has no notion of “the intern’s agent may read the CRM but not issue refunds; the CFO’s agent may do both.” Per-user, per-tool policy isn’t something these frameworks ship out of the box; they leave it to “your application code.”

So the real questions a production agent raises are:

  • Identity: which verified end user is this tool call acting for — not which service account?
  • Authorization: is this user allowed to invoke this tool, with these arguments, at this point in a delegation chain?
  • Cost & rate: is this call within the user’s budget and rate limits?
  • Audit: can you prove, later, exactly which user caused which tool call?

None of those are answerable if your only control is “the endpoint had a valid token.”

Governing the call: a deterministic gate, not a smart guess

The mechanism that works is boring on purpose: a deterministic check evaluated before the tool runs. It is code, not a model — it returns allow / deny / redact based on the verified user’s identity, the policy, scopes, rate, and budget. The agent’s reasoning is irrelevant to it; a deny is a deny no matter how the model talks itself into the call.
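To make "code, not a model" concrete, here is a minimal sketch of such a gate. Everything named in it (the `ToolCall` fields, the `POLICY` table, the scope strings) is a hypothetical illustration rather than any framework's API, and rate limiting is left out to keep it short.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REDACT = "redact"


@dataclass(frozen=True)
class ToolCall:
    user_id: str            # verified end user, not a service account
    scopes: frozenset[str]  # scopes granted to that user
    tool: str               # e.g. "refund.issue"
    spent_usd: float        # what the user has spent so far
    budget_usd: float       # the user's budget ceiling


# Hypothetical policy table: tool name -> (required scope, redact the result?)
POLICY: dict[str, tuple[str, bool]] = {
    "crm.lookup": ("crm:read", True),
    "refund.issue": ("refunds:write", False),
}


def decide(call: ToolCall) -> Decision:
    """Pure function of identity, policy, and budget; the model's reasoning is not an input."""
    rule = POLICY.get(call.tool)
    if rule is None:
        return Decision.DENY  # default-deny any tool without a policy entry
    required_scope, redact = rule
    if required_scope not in call.scopes:
        return Decision.DENY
    if call.spent_usd >= call.budget_usd:
        return Decision.DENY
    return Decision.REDACT if redact else Decision.ALLOW
```

Because `decide` takes no prompt or model output as input, the same call from the same user always gets the same answer.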

There are three places to put that gate, and they trade off coverage:

  1. Wrap the tool — decorate each tool’s handler so the policy check runs first. Complete for the typed tools you wrap; one decorator per agent. This is the usual path for LangGraph/CrewAI/AutoGen/Mastra.
  2. Proxy the model — sit between the agent and the LLM; deny the tool_calls the model emits before they ever reach the agent. Complete over the model’s output, and it’s where per-call cost gets metered.
  3. Route tools through a gateway — the agent calls the tool through a control plane that executes it. The tool’s effect is the call the gateway makes, so mediation is complete — there’s no path around it.
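The first option, wrapping the tool, might look roughly like this in plain Python. The `governed` decorator, the `ToolDenied` exception, and the `ALLOWED_TOOLS` table are hypothetical names for illustration; this is not a LangGraph, CrewAI, AutoGen, or Mastra API, and a real integration would hook into whatever tool-registration mechanism your framework exposes.

```python
import functools
from typing import Any, Callable


class ToolDenied(Exception):
    """Raised when policy denies a tool call before it runs."""


# Hypothetical per-user allowlist; a real one would come from your policy store.
ALLOWED_TOOLS: dict[str, set[str]] = {
    "intern": {"crm.lookup"},
    "cfo": {"crm.lookup", "refund.issue"},
}


def governed(tool_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator factory: run the policy check before the wrapped handler."""

    def decorator(handler: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(handler)
        def wrapper(user_id: str, *args: Any, **kwargs: Any) -> Any:
            # The check happens first; the handler body never runs on a deny.
            if tool_name not in ALLOWED_TOOLS.get(user_id, set()):
                raise ToolDenied(f"{user_id} may not call {tool_name}")
            return handler(user_id, *args, **kwargs)

        return wrapper

    return decorator


@governed("refund.issue")
def issue_refund(user_id: str, amount: float) -> str:
    return f"refunded {amount:.2f} on behalf of {user_id}"
```

The handler receives the verified `user_id` as its first argument here, which is one simple way to bind identity per request; the sketch assumes that value comes from your auth layer and is never supplied by the model.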

All three are deterministic. The question that decides whether governance actually holds isn’t determinism — it’s completeness of mediation: does the gate sit on every path the call’s effect can take?
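For the proxy placement (option 2 above), the gate operates on the tool calls the model emits rather than on the handlers. A minimal sketch, assuming each emitted call has already been parsed into a dict with a `name` key; that shape, the `split_tool_calls` helper, and the `ALLOWED_TOOLS` table are assumptions for illustration, not any provider's wire format.

```python
from typing import Any

# Hypothetical per-user allowlist; in practice this comes from your policy store.
ALLOWED_TOOLS: dict[str, set[str]] = {
    "intern": {"crm.lookup"},
    "cfo": {"crm.lookup", "refund.issue"},
}


def split_tool_calls(
    user_id: str, tool_calls: list[dict[str, Any]]
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition model-emitted tool calls into (allowed, denied) for this user."""
    permitted = ALLOWED_TOOLS.get(user_id, set())
    allowed = [c for c in tool_calls if c.get("name") in permitted]
    denied = [c for c in tool_calls if c.get("name") not in permitted]
    return allowed, denied
```

Only the `allowed` list would be forwarded to the agent; the `denied` list is what you would log. Note that this covers the model's output only, so it is complete only if the agent has no other way to trigger a tool.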

The honest part: govern typed tools, not a raw shell

This is the rule that makes tool-call governance sound, and it’s worth stating plainly: a gate is only as complete as the tool surface is narrow.

A typed tool (notion.readPage, jira.createIssue, db.query) has a bounded interface. The gate sees the call, the arguments, and the user, so the decision is meaningful. A bash or exec tool has an unbounded interface: one call, arbitrary effects. Governing it from the call alone is theater, because Bash("make build") can read your secrets and ship them out, and all your gate ever saw was the string "make build". That’s not a tool to govern; it’s a tool to sandbox.

So the precondition for governable agents is a design choice you make as the builder: give the agent the narrowest typed tools that do the job, not a shell and full filesystem access. Least privilege at the tool level isn’t just good security hygiene — it’s the thing that makes governance, cost attribution, and audit possible at all. (More on that in agent access control with scoped tools.)
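A short sketch of why a bounded interface helps: when the arguments have a fixed shape, the gate can validate and reason about them. The `CreateIssueArgs` type, its validation rules, and the `USER_PROJECTS` table below are hypothetical, loosely modeled on the jira.createIssue example above rather than on the real Jira API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CreateIssueArgs:
    """The entire argument surface of the tool: two validated fields."""

    project: str
    title: str

    def __post_init__(self) -> None:
        if not isinstance(self.project, str) or not (
            self.project.isalnum() and self.project.isupper()
        ):
            raise ValueError("project must be an uppercase alphanumeric key")
        if not isinstance(self.title, str) or not (1 <= len(self.title) <= 200):
            raise ValueError("title must be 1 to 200 characters")


# Hypothetical mapping of users to the projects they may file issues in.
USER_PROJECTS: dict[str, set[str]] = {
    "intern": {"DOCS"},
    "cfo": {"DOCS", "FIN"},
}


def may_create_issue(user_id: str, raw_args: dict[str, Any]) -> bool:
    """Allow only well-formed arguments for a project the user can write to."""
    try:
        args = CreateIssueArgs(**raw_args)  # unknown or missing keys raise TypeError
    except (TypeError, ValueError):
        return False
    return args.project in USER_PROJECTS.get(user_id, set())
```

There is no equivalent check for a shell tool: its single string argument can encode any effect, so inspecting it tells the gate very little.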

Where to start

  1. Replace broad tools with narrow ones. Trade run_shell for the three typed tools the agent actually needs. Now every effect flows through a call you can mediate.
  2. Put a deterministic gate in front of each tool — wrap the handler, bind the end user’s identity per request, and let policy return allow/deny/redact before the call runs.
  3. Meter and audit every call — per-user cost, rate, and an identity-attributed record, so “which user did what, and what did it cost” is a query, not a forensic exercise.
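Step 3 can be sketched as a small in-memory ledger. The `Ledger` and `AuditRecord` names, the per-call cost figures, and the budget table are all hypothetical; a production version would persist records durably and would also enforce rate limits, which this sketch omits.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuditRecord:
    timestamp: float
    user_id: str    # the verified end user the call acted for
    tool: str
    decision: str   # "allow" or "deny"
    cost_usd: float


@dataclass
class Ledger:
    budget_usd: dict[str, float]  # hypothetical per-user budget ceilings
    records: list[AuditRecord] = field(default_factory=list)

    def spent(self, user_id: str) -> float:
        """Total cost of the calls this user was allowed to make."""
        return sum(r.cost_usd for r in self.records if r.user_id == user_id)

    def record(self, user_id: str, tool: str, cost_usd: float) -> str:
        """Decide against the budget and append an identity-attributed record."""
        over = self.spent(user_id) + cost_usd > self.budget_usd.get(user_id, 0.0)
        decision = "deny" if over else "allow"
        # Denied calls are logged too, at zero cost, so the audit trail is complete.
        charged = 0.0 if over else cost_usd
        self.records.append(AuditRecord(time.time(), user_id, tool, decision, charged))
        return decision
```

With records shaped like this, "which user did what, and what did it cost" reduces to filtering `ledger.records` by `user_id`.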

That’s tool-call governance: not a smarter rule that out-argues the model, but a deterministic gate on a narrow surface, the one place where “the agent will just reason around it” is dead on arrival. See what an agentic control plane is for how the pieces fit, or wire it into your framework from the integrations.
