# Decorator, proxy, hook — three patterns for agent governance, three different scorecards
## tl;dr
We’ve benchmarked five frameworks against the same ACP gateway. They don’t all score the same:
- Decorator pattern (CrewAI, LangGraph, Anthropic Agent SDK): wraps individual tool functions. Loses framework-internal context across handoffs/state mutations. Score: ~40/48.
- Hook pattern (Claude Code): the host calls out to a hook on every tool dispatch with rich payload context. Sees more, scores higher: 43/48.
- Proxy pattern (OpenAI Agents SDK, Aider): the host’s HTTP client is pointed at a governance proxy. Audits at the network layer. Score: ~45/48 — closest to the pure ACP runner because the proxy sits where the call already serializes.
Same governance backend, three different scores. Worth understanding why if you’re picking a framework, picking a governance product, or deciding which integration pattern to ship.
## The three patterns
### Decorator: wrap each tool
```python
from crewai.tools import tool
from acp_crewai import governed

@tool
@governed("send_email")
def send_email(to: str, subject: str, body: str) -> str:
    return sendmail(to, subject, body)
```
Used by: CrewAI, LangChain/LangGraph, Anthropic Agent SDK, Pydantic AI (forthcoming), Vercel AI SDK (forthcoming).
What governance sees: the tool function’s name, input arguments, and (after execution) output — plus whatever the SDK’s `set_context` / `withContext` binds at request scope (typically end-user JWT, agent name, agent tier).
What governance misses: anything the framework does between tool calls. Task handoffs in CrewAI, state mutations in LangGraph, supervisor routing decisions, checkpoint replays — all of it. Decorators wrap individual function dispatch; they don’t see the orchestration.
Why this scores 40/48: categories like `audit_completeness`, `identity_propagation`, `per_user_policy_enforcement`, and `rate_limit_cascade` all come back clean — the decorator catches every tool call. But `delegation_provenance` (2/6) and `scope_inheritance` (4/6) suffer because the decorator doesn’t know which chain it’s in.
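To make that blind spot concrete, here’s a minimal sketch of what a `governed`-style wrapper does. This is not the `acp_crewai` internals: the gateway call is stubbed as a local allow-list so the sketch runs standalone, and `check_tool_use`, `audit_log`, and the decision shape are all invented for illustration.

```python
import functools

audit_log = []

def check_tool_use(tool_name: str, args: dict) -> dict:
    # Stand-in for a real gateway call (e.g. a POST to /govern/tool-use).
    allowed = {"send_email"}
    if tool_name in allowed:
        return {"allow": True, "reason": "ok"}
    return {"allow": False, "reason": "not in workspace policy"}

def governed(tool_name):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            # Pre-call: the decorator sees only the tool name and arguments.
            decision = check_tool_use(tool_name, kwargs)
            if not decision["allow"]:
                raise PermissionError(f"{tool_name} denied: {decision['reason']}")
            result = fn(**kwargs)
            # Post-call: the output is auditable, but whatever orchestration
            # happened between tool calls was never visible here.
            audit_log.append({"tool": tool_name, "args": kwargs, "result": result})
            return result
        return wrapper
    return decorator

@governed("send_email")
def send_email(to: str, subject: str, body: str) -> str:
    return f"sent to {to}"
```

Everything the wrapper can audit arrives through `**kwargs`; a handoff or state mutation between two such calls never touches this code path, which is exactly where the decorator pattern drops points.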
### Hook: the host calls out to you
```json
# ~/.claude/settings.json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": ".*",
      "hooks": [{ "type": "command", "command": "node ~/.acp/govern.mjs" }]
    }]
  }
}
```
Used by: Claude Code, Codex CLI (similar PreToolUse semantics).
What governance sees: whatever the host chooses to put in the hook payload. For Claude Code, that’s tool name, input, session ID, working directory, hook event name, agent tier (interactive/subagent/background), permission mode. Critically, the host knows about its own subagents in a way that an SDK decorator never could.
What governance misses: anything the host doesn’t put in the payload. Subagent attribution is partial: the named subagent type is captured, but the spawning chain depth isn’t.
Why this scores 43/48: the same audit/identity/policy wins as the decorator pattern, plus `delegation_provenance` (6/6) and `scope_inheritance` (6/6) come back clean because the hook payload carries chain context. One scenario is lost (`fail_open_honored`) because Claude Code’s hook is fail-closed by design. Net: +3 over decorators.
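For orientation, here’s roughly what a hook command like the one above could do, written in Python rather than Node. The payload field names (`tool_name`, `tool_input`) match what Claude Code sends a PreToolUse hook on stdin, and exit code 2 is how a hook blocks a call — but the blocking rule itself is a made-up example, not an ACP policy.

```python
import json
import sys

def decide(payload: dict) -> int:
    """Exit code for a PreToolUse event: 0 allows the tool call, 2 blocks it."""
    tool = payload.get("tool_name", "")
    command = payload.get("tool_input", {}).get("command", "")
    # Example policy (invented for this sketch): block shell commands
    # that touch production secrets.
    if tool == "Bash" and ".env.production" in command:
        # Claude Code treats exit code 2 as "block"; stderr is fed back
        # to the model so it can explain or retry differently.
        print("blocked by governance: production secrets", file=sys.stderr)
        return 2
    return 0

# The actual hook entry point would be:
#   sys.exit(decide(json.load(sys.stdin)))
```

The point of the pattern is that the host hands you this payload on every dispatch, so the same process that knows about subagents and permission modes is the one calling your policy.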
### Proxy: redirect the host’s HTTP client
```python
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.agenticcontrolplane.com/v1",
    api_key=os.environ["ACP_API_KEY"],
)
```
Used by: OpenAI Agents SDK, Aider, anything that speaks the OpenAI chat-completions API. Also: any future MCP server integration.
What governance sees: the full HTTP request the SDK was about to make to the model provider. Tool calls, handoff metadata, system prompt, model config — everything serialized as JSON for the API call.
What governance misses: anything that happens before the request serializes. Guardrails that fire client-side and short-circuit before any HTTP call goes out are invisible to the proxy.
Why this scores ~45/48: the proxy sits at the natural serialization boundary, so it gets the most complete request payload of the three patterns, plus per-agent attribution via the `x-acp-agent-name` header the SDK sets. It loses on `fail_open_honored` for the same reason as the hook (fail-closed by design at the proxy layer).
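A rough sketch of the proxy’s vantage point: everything below is read straight off one serialized chat-completions request, with no framework integration at all. The `x-acp-agent-name` header and the record shape are illustrative assumptions, not a documented ACP format.

```python
def audit_request(headers: dict, body: dict) -> dict:
    # One governance record per intercepted request. Every field here was
    # already serialized by the SDK for the model-provider API call.
    return {
        "agent": headers.get("x-acp-agent-name", "unknown"),
        "model": body.get("model"),
        # The full tool schema travels in the request body, so the proxy
        # sees every tool the SDK offered this turn.
        "tools_offered": [t["function"]["name"] for t in body.get("tools", [])],
        "has_system_prompt": any(m.get("role") == "system"
                                 for m in body.get("messages", [])),
    }
```

Contrast with the decorator: nothing had to be wrapped to produce this record. The flip side is the limitation above — a client-side guardrail that never emits an HTTP request produces no record at all.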
## Three patterns, three optimal use cases
You don’t pick the pattern arbitrarily — the framework you’re using usually picks for you. But understanding the trade-offs helps:
| Use case | Best pattern | Why |
|---|---|---|
| You’re building agents in Python with CrewAI/LangGraph | Decorator | Native to the framework’s tool model; minimal code change |
| You’re a team using Claude Code (or a Code-style host) | Hook | Host already supports it; richer chain context than decorators |
| You’re using OpenAI Agents SDK or any OpenAI-compatible client | Proxy | Zero code change, just a `base_url` swap |
| You’re shipping a multi-tenant SaaS that uses agents | Proxy | Network-layer governance is hardest to bypass |
| You need to govern tools the LLM doesn’t see (cron jobs, etc.) | Direct API | Skip the SDK layer entirely; call `/govern/tool-use` yourself |
ACP supports all four. Same `/govern/tool-use` endpoint, same workspace policies, same audit log. The integration pattern is a deployment choice, not a product choice.
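For the Direct API row, here’s roughly what that call looks like. The `/govern/tool-use` path and host appear earlier in this post; the payload field names and the decision shape are guesses for the sketch, not the documented ACP schema.

```python
import json
import urllib.request

def govern_request(tool_name: str, args: dict, agent_name: str, api_key: str,
                   base_url: str = "https://api.agenticcontrolplane.com"):
    """Build the POST for a direct /govern/tool-use check (payload shape assumed)."""
    return urllib.request.Request(
        f"{base_url}/govern/tool-use",
        data=json.dumps({
            "tool": tool_name,
            "arguments": args,
            # No LLM loop here, so the caller identifies itself explicitly.
            "agent": agent_name,
        }).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# A cron job would send this and check the decision before running the tool:
#   decision = json.load(urllib.request.urlopen(govern_request(...)))
```

Same endpoint and audit log as the SDK patterns; the only difference is that the caller, not a wrapper or a proxy, decides when to ask.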
## What this means for the benchmark numbers
When you look at our /benchmark page and see 40/48 here and 43/48 there, that variance is real and informative:
- The numbers don’t say “this framework is better.” They say “this integration pattern exposes more context to governance.”
- The decorator gap is closing. `acp-crewai@0.2.0` and `acp-langchain@0.2.0` will thread chain context through `set_context(agent_chain=...)`, narrowing the gap to the hook score.
- The proxy gap is structural. `fail_open_honored` will probably never pass for proxy or hook patterns because that’s the fail-closed-by-design choice for protecting integrity.
A benchmark that produces flat scores across every runner is suspicious. Variance is signal — it tells you what each pattern actually buys you.
## What’s next
OpenAI Agents SDK and Anthropic Agent SDK scorecards land later today (their runs are in flight). The Cursor scorecard lands tomorrow — that one’s the most interesting, because the MCP integration only governs MCP-exposed tools, not Cursor’s internal IDE tools. That’s a real, structural governance gap that no integration pattern fully closes.
After all six framework rows, the big-reveal post lands with the full scorecard, all patterns side by side, and the methodology audit of where the benchmark itself needs to grow.
Receipts:
- CrewAI scorecard — decorator pattern
- LangGraph scorecard — decorator pattern
- Claude Code scorecard — hook pattern (43/48 best)
- Methodology post
- agentgovbench repo
- 1. How we think about testing AI agent governance
- 2. CrewAI scores 13/48 on AgentGovBench. With ACP, 40/48.
- 3. CrewAI's task handoffs lose the audit trail — here's the gap and the fix
- 4. LangGraph scores 13/48 on AgentGovBench. With ACP, 40/48.
- 5. LangGraph's StateGraph checkpoints don't replay through governance
- 6. Claude Code scores 13/48 on AgentGovBench. With ACP, 43/48.
- 7. Claude Code's --dangerously-skip-permissions disables every governance hook
- 8. Decorator, proxy, hook — three patterns for agent governance, three different scorecards · you are here
- 9. OpenAI Agents SDK scores 13/48 on AgentGovBench. With ACP, 45/48.
- 10. Anthropic Agent SDK scores 13/48 on AgentGovBench. With ACP, 46/48 — best of any framework.
- 11. Codex CLI scores 13/48 on AgentGovBench. With ACP, 43/48 — same as Claude Code.
- 12. Full scorecard: seven frameworks, 48 scenarios, one open benchmark
- 13. How AgentGovBench's 48 scenarios map to NIST AI RMF 1.0
- 14. Reproduce AgentGovBench on your stack — full setup guide
- 15. Cursor scores 13/48 on AgentGovBench. With ACP MCP server, 37/48 — and that gap is structural.