Agentic Control Plane
Benchmark series · Part 10 of 17

Anthropic Agent SDK scores 13/48 on AgentGovBench. With ACP, 46/48 — best of any framework.

David Crowe · 4 min read
Tags: anthropic · anthropic-agent-sdk · benchmark · governance · agentgovbench

tl;dr

Fifth framework. Highest ACP-paired score we’ve seen.

| Configuration | Score |
| --- | --- |
| Anthropic Agent SDK (no governance wrapper) | 13/48 — vanilla floor (fifth framework confirmation) |
| Anthropic Agent SDK + ACP (governHandlers) | 46/48 ⭐ best in class |

One point ahead of the OpenAI Agents SDK proxy (45/48). Three ahead of the decorator-pattern frameworks (40/48). Same gateway behind all three configurations.

Why the higher score? The TypeScript governHandlers wrapper sits closer to the request envelope than the Python decorator does, and single-agent Claude tool-use loops are the most tightly scoped pattern: there is less framework orchestration to lose context across.

Why Anthropic Agent SDK native scores at the floor

The Anthropic Agent SDK (and the Claude Agent SDK) ship as TypeScript libraries for tool-use loops around Claude. Out of the box you get:

  • Tool definitions with handler functions
  • Message lifecycle management
  • Optional thinking/extended-thinking blocks
  • Session and state primitives in newer versions

What you don’t get without explicit wiring:

  • Per-end-user identity propagation (one ANTHROPIC_API_KEY per process)
  • A workspace policy concept
  • Per-tool scope enforcement
  • SIEM-ingestible audit log
  • Rate-limit cascade discipline at the application layer
  • Fail-mode discipline for an external governance plane

Score: 13/48. Identical floor to every other framework’s bare default. The pattern holds across CrewAI, LangGraph, Claude Code, OpenAI Agents SDK, Codex CLI, Cursor, and now Anthropic Agent SDK.

What ACP adds

@agenticcontrolplane/governance-anthropic exports two functions:

```typescript
import express from "express";
import Anthropic from "@anthropic-ai/sdk";
import { governHandlers, withContext } from "@agenticcontrolplane/governance-anthropic";

const app = express();
const anthropic = new Anthropic(); // used inside the tool-use loop

// Wrap every tool handler so each invocation passes through governance.
const handlers = governHandlers({
  web_search: async ({ query }) => doSearch(query),
  send_email: async ({ to, subject, body }) => sendMail(to, subject, body),
});

app.post("/run", async (req, res) => {
  const userToken = req.headers.authorization?.replace("Bearer ", "");
  // Bind the end-user's JWT for the duration of this request.
  await withContext({ userToken }, async () => {
    // run the Anthropic tool-use loop with the wrapped handlers
    // ...
  });
});
```

governHandlers wraps every handler in the map. withContext binds the end-user JWT for the duration of the request. Same /govern/tool-use endpoint as every other ACP integration.
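The wrapped handler map plugs into the standard Messages tool-use loop: read the tool_use blocks from a response, invoke the matching handler, and send back tool_result blocks. Here is a minimal sketch of that dispatch step — the block shapes mirror the Anthropic Messages API content blocks, but the `dispatchToolUse` helper and the surrounding loop are illustrative, not part of the ACP package:

```typescript
// Shapes mirroring Anthropic Messages API content blocks.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };

type HandlerMap = Record<string, (input: any) => Promise<unknown>>;

// Run every tool_use block in a response through the (governed) handler map
// and produce the tool_result blocks for the follow-up user message.
async function dispatchToolUse(
  blocks: ToolUseBlock[],
  handlers: HandlerMap,
): Promise<ToolResultBlock[]> {
  return Promise.all(
    blocks.map(async (block) => {
      const handler = handlers[block.name];
      if (!handler) throw new Error(`no handler for tool ${block.name}`);
      const result = await handler(block.input);
      return {
        type: "tool_result" as const,
        tool_use_id: block.id,
        content: JSON.stringify(result),
      };
    }),
  );
}
```

Inside the /run route above, the loop would call the Messages API, pass any tool_use blocks through this dispatch, append the results, and repeat until the model stops requesting tools. Because the handlers are the governed versions, every iteration of the loop is checked and logged.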

Full integration guide →

Per-category breakdown

| Category | Native | + ACP | Note |
| --- | --- | --- | --- |
| Audit completeness | 1/6 | 6/6 | Every handler invocation logged. |
| Cross-tenant isolation | 4/6 | 4/6 | Two declined (single-tenant deployment mode). |
| Delegation provenance | 0/6 | 6/6 | Best in class — single-agent loop has no orchestration to lose. |
| Fail-mode discipline | 3/6 | 6/6 | Both fail-open and fail-closed honored. |
| Identity propagation | 0/6 | 6/6 | withContext binds the JWT per request. |
| Per-user policy enforcement | 1/6 | 6/6 | Allow/deny/redact per call. |
| Rate-limit cascade | 3/6 | 6/6 | Per-user budget enforced. |
| Scope inheritance | 1/6 | 6/6 | Best in class — same root cause as delegation provenance. |
| **Total** | **13/48** | **46/48** | |

Why this is the highest score

Three structural reasons the Anthropic Agent SDK + ACP wins:

1. Single-agent loop pattern. Most uses of the Anthropic Agent SDK are single-agent — one Claude instance processing tool-use loops. There’s no inter-agent handoff context to lose. CrewAI and LangGraph have orchestration layers (Hierarchical Process, StateGraph supervisors) that the decorator can’t see; Anthropic SDK doesn’t.

2. The TypeScript wrapper sits at the native dispatch boundary. governHandlers wraps the handler map that the Anthropic SDK calls into directly. There’s no SDK abstraction between the wrapper and the actual tool execution.

3. Both fail modes honored. Unlike Claude Code’s fail-closed-only hook, the Anthropic Agent SDK wrapper can implement both fail-open and fail-closed paths because it’s library code, not a CLI’s permission system. It picks up the extra fail_open_honored scenario that hook patterns can’t.
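The fail-mode difference is easy to sketch as library code: when the governance plane is unreachable, a fail-open policy lets the tool call proceed without a verdict, while a fail-closed policy rejects it. Everything in this sketch — the `GovernFn` signature, the `withFailMode` helper — is hypothetical plumbing for illustration, not the ACP package’s actual API:

```typescript
type FailMode = "open" | "closed";

// Hypothetical governance check; in practice this would call the
// /govern/tool-use endpoint. It throws if the plane is unreachable.
type GovernFn = (tool: string, input: unknown) => Promise<{ allowed: boolean }>;

// Wrap a single handler with a chosen fail mode. Because this is library
// code (not a CLI permission hook), both paths are available.
function withFailMode<T>(
  handler: (input: T) => Promise<unknown>,
  govern: GovernFn,
  tool: string,
  mode: FailMode,
) {
  return async (input: T) => {
    let allowed: boolean;
    try {
      allowed = (await govern(tool, input)).allowed;
    } catch {
      // Governance plane unreachable: the fail mode decides.
      if (mode === "closed") throw new Error(`${tool} blocked (fail-closed)`);
      allowed = true; // fail-open: proceed without a verdict
    }
    if (!allowed) throw new Error(`${tool} denied by policy`);
    return handler(input);
  };
}
```

A hook-based integration only ever sees the host’s deny path, so it can take the `closed` branch but never the `open` one; a handler-map wrapper can take either per tool.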

When to pick Anthropic Agent SDK + ACP

If you’re building a single-agent Claude tool-use loop in TypeScript and want governance, this is the integration shape that scores highest. Specifically:

  • Customer-service bots powered by Claude
  • Document-processing agents with a few tools
  • Per-request workflows where one Claude instance handles the request end-to-end

If you’re building multi-agent systems with delegation, you’ll be in CrewAI or LangGraph territory and the decorator pattern’s 40/48 is the realistic ceiling until SDK 0.2.0 closes the chain-context gap.

What this means for the bigger picture

We’ve now scored five ACP-paired frameworks. The pattern is clear:

| Pattern | Score range | Why |
| --- | --- | --- |
| Decorator (Python multi-agent) | 40/48 | Loses framework orchestration context |
| Hook (CLI) | 43/48 | Host’s payload carries chain context; fail-closed only |
| Proxy (HTTP) | 45/48 | Sits at the request serialization boundary |
| TS handler-map wrapper (single-agent) | 46/48 | Native dispatch boundary; both fail modes |

The variance is real, structural, and consistent across runs. The pattern shape determines the score, not the underlying gateway.


More in AgentGovBench
  1. How we think about testing AI agent governance
  2. CrewAI scores 13/48 on AgentGovBench. With ACP, 40/48.
  3. CrewAI's task handoffs lose the audit trail — here's the gap and the fix
  4. LangGraph scores 13/48 on AgentGovBench. With ACP, 40/48.
  5. LangGraph's StateGraph checkpoints don't replay through governance
  6. Claude Code scores 13/48 on AgentGovBench. With ACP, 43/48.
  7. Claude Code's --dangerously-skip-permissions disables every governance hook
  8. Decorator, proxy, hook — three patterns for agent governance, three different scorecards
  9. OpenAI Agents SDK scores 13/48 on AgentGovBench. With ACP, 45/48.
  10. Anthropic Agent SDK scores 13/48 on AgentGovBench. With ACP, 46/48 — best of any framework. · you are here
  11. Codex CLI scores 13/48 on AgentGovBench. With ACP, 43/48 — same as Claude Code.
  12. Full scorecard: seven frameworks, 48 scenarios, one open benchmark
  13. How AgentGovBench's 48 scenarios map to NIST AI RMF 1.0
  14. Reproduce AgentGovBench on your stack — full setup guide
  15. Cursor scores 13/48 on AgentGovBench. With ACP MCP server, 37/48 — and that gap is structural.
  16. Recommended governance deployment patterns — pick the one that scores highest for your stack
  17. What our benchmark told us about our own product — six fixes we're shipping