Agentic Control Plane
Benchmark series · Part 8 of 15

Decorator, proxy, hook — three patterns for agent governance, three different scorecards

David Crowe · 5 min read
Tags: governance · architecture · decorator · proxy · hook · agentgovbench

tl;dr

We’ve benchmarked five frameworks against the same ACP gateway. They don’t all score the same:

  • Decorator pattern (CrewAI, LangGraph, Anthropic Agent SDK): wraps individual tool functions. Loses framework-internal context across handoffs/state mutations. Score: ~40/48.
  • Hook pattern (Claude Code): the host calls out to a hook on every tool dispatch with rich payload context. Sees more, scores higher: 43/48.
  • Proxy pattern (OpenAI Agents SDK, Aider): the host’s HTTP client is pointed at a governance proxy. Audits at the network layer. Score: ~45/48 — closest to the pure ACP runner because the proxy sits where the call already serializes.

Same governance backend, three different scores. Worth understanding why if you’re picking a framework, picking a governance product, or deciding which integration pattern to ship.

The three patterns

Decorator: wrap each tool

from crewai.tools import tool
from acp_crewai import governed

@tool
@governed("send_email")
def send_email(to: str, subject: str, body: str) -> str:
    return sendmail(to, subject, body)

Used by: CrewAI, LangChain/LangGraph, Anthropic Agent SDK, Pydantic AI (forthcoming), Vercel AI SDK (forthcoming).

What governance sees: the tool function’s name, input arguments, and (after execution) output. Plus whatever the SDK’s set_context / withContext binds at request scope (typically end-user JWT, agent name, agent tier).

What governance misses: anything the framework does between tool calls. Task handoffs in CrewAI, state mutations in LangGraph, supervisor routing decisions, checkpoint replays — all of it. Decorators wrap individual function dispatch; they don’t see the orchestration.

Why this scores 40/48: categories like audit_completeness, identity_propagation, per_user_policy_enforcement, and rate_limit_cascade all come back clean — the decorator catches every tool call. But delegation_provenance (2/6) and scope_inheritance (4/6) suffer because the decorator doesn’t know which chain it’s in.
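To make that blind spot concrete, here’s a minimal, self-contained sketch of what a governed() decorator does under the hood. All names and the inline policy rule are hypothetical stand-ins; a real integration like acp_crewai would presumably POST to the governance backend rather than check a local rule:

```python
import functools

def check_policy(tool_name, kwargs):
    # Stand-in for the governance backend: a real integration would send
    # tool name, args, and request-scoped context to /govern/tool-use.
    deny = tool_name == "send_email" and kwargs.get("to", "").endswith("@external.com")
    return not deny

def governed(tool_name):
    # Decorator-pattern governance: wraps one tool function. It sees this
    # call's name, args, and result -- nothing about the orchestration
    # (handoffs, state mutations, routing) happening around it.
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            if not check_policy(tool_name, kwargs):
                raise PermissionError(f"{tool_name}: denied by policy")
            result = fn(*args, **kwargs)
            # Post-execution, the result would be attached to the audit record.
            return result
        return inner
    return wrap

@governed("send_email")
def send_email(to: str, subject: str, body: str) -> str:
    return f"sent to {to}"
```

Everything the decorator will ever know arrives through inner’s arguments. A task handoff that happens between two such calls never passes through this code path, which is exactly the delegation_provenance gap.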

Hook: the host calls out to you

# ~/.claude/settings.json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": ".*",
      "hooks": [{ "type": "command", "command": "node ~/.acp/govern.mjs" }]
    }]
  }
}

Used by: Claude Code, Codex CLI (similar PreToolUse semantics).

What governance sees: whatever the host chooses to put in the hook payload. For Claude Code, that’s tool name, input, session ID, working directory, hook event name, agent tier (interactive/subagent/background), permission mode. Critically, the host knows about its own subagents in a way that an SDK decorator never could.

What governance misses: anything the host doesn’t put in the payload. Subagent attribution is partial (named subagent type captured, but the spawning chain depth isn’t).

Why this scores 43/48: the same audit/identity/policy wins as the decorator pattern, plus delegation_provenance (6/6) and scope_inheritance (6/6) come back clean because the hook payload carries chain context. One scenario is lost (fail_open_honored) because Claude Code’s hook is fail-closed by design. Net: +3 over decorators.
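The settings above invoke a Node script, but the contract is simple enough to sketch in Python for consistency with the other examples: one JSON payload on stdin per tool dispatch, and the exit code decides. The payload field names follow Claude Code’s PreToolUse input; the inline rule is a stand-in for a call to the governance backend:

```python
import json
import sys

def decide(payload):
    # Allow/deny from the hook payload. In a real hook this function would
    # forward the payload (tool_name, tool_input, session context) to the
    # governance backend instead of checking a local rule.
    tool = payload.get("tool_name", "")
    tool_input = payload.get("tool_input", {})
    if tool == "Bash" and "rm -rf" in tool_input.get("command", ""):
        return False, "destructive command blocked"
    return True, "allowed"

def run(stream):
    # The host writes one JSON payload to the hook's stdin per dispatch.
    allowed, reason = decide(json.load(stream))
    if not allowed:
        print(reason, file=sys.stderr)  # stderr is surfaced back to the model
        return 2  # exit code 2 blocks the tool call
    return 0

# Deployed as a script, the entry point is just: sys.exit(run(sys.stdin))
```

Note the asymmetry with the decorator: the hook never wraps the tool. The host owns dispatch and merely asks permission, which is why it can include context (subagent tier, permission mode) that no SDK-level wrapper could see.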

Proxy: redirect the host’s HTTP client

import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.agenticcontrolplane.com/v1",
    api_key=os.environ["ACP_API_KEY"],
)

Used by: OpenAI Agents SDK, Aider, anything that speaks the OpenAI chat-completions API. Also: any future MCP server integration.

What governance sees: the full HTTP request the SDK was about to make to the model provider. Tool calls, handoff metadata, system prompt, model config — everything serialized as JSON for the API call.

What governance misses: anything that happens before the request serializes. Guardrails that fire client-side and short-circuit before any HTTP call goes out are invisible to the proxy.

Why this scores ~45/48: the proxy sits at the natural serialization boundary, so it gets the most complete request payload of the three patterns, plus per-agent attribution via the x-acp-agent-name header the SDK sets. It loses fail_open_honored for the same reason the hook does: fail-closed by design at the proxy layer.

Three patterns, three optimal use cases

You don’t pick the pattern arbitrarily — the framework you’re using usually picks for you. But understanding the trade-offs helps:

  • Building agents in Python with CrewAI/LangGraph → Decorator. Native to the framework’s tool model; minimal code change.
  • A team using Claude Code (or a Code-style host) → Hook. The host already supports it; richer chain context than decorators.
  • Using the OpenAI Agents SDK or any OpenAI-compatible client → Proxy. Zero code change, just a base_url swap.
  • Shipping a multi-tenant SaaS that uses agents → Proxy. Network-layer governance is hardest to bypass.
  • Governing tools the LLM doesn’t see (cron jobs, etc.) → Direct API. Skip the SDK layer entirely; call /govern/tool-use yourself.

ACP supports all four. Same /govern/tool-use endpoint, same workspace policies, same audit log. The integration pattern is a deployment choice, not a product choice.
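For the fourth option, the direct call is just an authenticated POST. Here’s a sketch of building that request with the standard library; the payload keys, response shape, and exact URL are illustrative assumptions, so check the ACP API reference for the real schema:

```python
import json
import urllib.request

def build_govern_request(tool_name, tool_input, agent_name, api_key):
    # Build the authenticated POST to /govern/tool-use. Payload keys here
    # are assumptions for illustration, not the documented schema.
    body = json.dumps({
        "tool_name": tool_name,
        "tool_input": tool_input,
        "agent_name": agent_name,  # same attribution the proxy gets via x-acp-agent-name
    }).encode()
    return urllib.request.Request(
        "https://api.agenticcontrolplane.com/govern/tool-use",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# A cron job would send this before executing the tool, e.g.:
#     with urllib.request.urlopen(build_govern_request(...)) as resp:
#         decision = json.load(resp)  # response shape also assumed
```

This is the pattern for anything that never touches an LLM SDK: the caller asks for a decision, gets an audit record for free, and enforces the answer itself.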

What this means for the benchmark numbers

When you look at our /benchmark page and see 40/48 here and 43/48 there, that variance is real and informative:

  • The numbers don’t say “this framework is better.” They say “this integration pattern exposes more context to governance.”
  • The decorator gap is closing. acp-crewai@0.2.0 and acp-langchain@0.2.0 will thread chain context through set_context(agent_chain=...), narrowing the gap to the hook score.
  • The proxy gap is structural. fail_open_honored will probably never pass for proxy or hook patterns because that’s the fail-closed-by-design choice for protecting integrity.

A benchmark that produces flat scores across every runner is suspicious. Variance is signal — it tells you what each pattern actually buys you.

What’s next

OpenAI Agents SDK and Anthropic Agent SDK scorecards land later today (their runs are in flight). Cursor’s scorecard lands tomorrow — that one’s the most interesting because the MCP integration only governs MCP-exposed tools, not Cursor’s internal IDE tools. That’s a real, structural governance gap that no integration pattern fully closes.

After all six framework rows, the big-reveal post lands with the full scorecard, all patterns side by side, and the methodology audit of where the benchmark itself needs to grow.


More in AgentGovBench
  1. How we think about testing AI agent governance
  2. CrewAI scores 13/48 on AgentGovBench. With ACP, 40/48.
  3. CrewAI's task handoffs lose the audit trail — here's the gap and the fix
  4. LangGraph scores 13/48 on AgentGovBench. With ACP, 40/48.
  5. LangGraph's StateGraph checkpoints don't replay through governance
  6. Claude Code scores 13/48 on AgentGovBench. With ACP, 43/48.
  7. Claude Code's --dangerously-skip-permissions disables every governance hook
  8. Decorator, proxy, hook — three patterns for agent governance, three different scorecards · you are here
  9. OpenAI Agents SDK scores 13/48 on AgentGovBench. With ACP, 45/48.
  10. Anthropic Agent SDK scores 13/48 on AgentGovBench. With ACP, 46/48 — best of any framework.
  11. Codex CLI scores 13/48 on AgentGovBench. With ACP, 43/48 — same as Claude Code.
  12. Full scorecard: seven frameworks, 48 scenarios, one open benchmark
  13. How AgentGovBench's 48 scenarios map to NIST AI RMF 1.0
  14. Reproduce AgentGovBench on your stack — full setup guide
  15. Cursor scores 13/48 on AgentGovBench. With ACP MCP server, 37/48 — and that gap is structural.
