# Recommended governance deployment patterns — pick the one that scores highest for your stack
## tl;dr
After benchmarking seven AI agent frameworks against the same governance backend, one finding stands out:
The integration pattern matters more than the framework brand. Two CrewAI deployments and two LangGraph deployments score the same 40/48. An Anthropic Agent SDK deployment scores 46/48 — six scenarios of additional enforcement, just from a better-shaped wrapper.
This post is the customer-facing recommendation that follows: if you’re picking how to deploy a governed agent, here’s what the data says you should choose.
## The recommendation table
| You’re building | Pick | Score | Why |
|---|---|---|---|
| Single-agent Claude tool-use loop in TypeScript | Anthropic Agent SDK + `governHandlers` | 46/48 | Native dispatch boundary, both fail modes honored |
| Multi-agent system with OpenAI/compatible models | OpenAI Agents SDK + ACP `base_url` swap | 45/48 | Proxy sits at request serialization boundary |
| Code-editing agent in your terminal | Claude Code or Codex CLI + ACP hook | 43/48 | Hook payload carries chain context |
| Multi-agent framework in Python (CrewAI/LangGraph) | Decorator (`@governed`) | 40/48 | Will rise to ~44 when SDK 0.2.0 ships chain context propagation |
| IDE-driven agent (Cursor) | MCP integration + server-side mitigations | 37/48 | MCP can’t reach internal IDE tools — structural ceiling |
Underlying score data: /benchmark and the full scorecard post.
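To make the top row concrete: the handler-wrapper shape scores highest because every tool dispatch passes through one governed boundary before it runs. A minimal TypeScript sketch, assuming a hypothetical `governHandlers`-style API (the names, policy shape, and error type here are illustrative, not the real ACP SDK surface):

```typescript
// Illustrative sketch of the handler-wrapper governance pattern.
// governHandlers, GovernancePolicy, and GovernanceDeniedError are
// placeholder names, not the real ACP API.
type ToolHandler = (input: Record<string, unknown>) => Promise<unknown>;

interface GovernancePolicy {
  // Decide whether a proposed tool call may proceed.
  decide(tool: string, input: Record<string, unknown>): "allow" | "deny";
}

class GovernanceDeniedError extends Error {
  constructor(tool: string) {
    super(`governance denied tool call: ${tool}`);
  }
}

// Wrap every handler so each dispatch is audited and policy-checked
// before the underlying tool runs.
function governHandlers(
  handlers: Record<string, ToolHandler>,
  policy: GovernancePolicy,
  audit: (entry: { tool: string; decision: string }) => void,
): Record<string, ToolHandler> {
  const governed: Record<string, ToolHandler> = {};
  for (const [name, handler] of Object.entries(handlers)) {
    governed[name] = async (input) => {
      const decision = policy.decide(name, input);
      audit({ tool: name, decision }); // audit both allowed and denied calls
      if (decision === "deny") throw new GovernanceDeniedError(name);
      return handler(input);
    };
  }
  return governed;
}
```

The key property is that there is no ungoverned path to a handler: the wrapper sits at the native dispatch boundary, which is what the table's "both fail modes honored" is pointing at.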
## Pick by what you’re trying to govern
Governance over external API calls (Slack, GitHub, Stripe, your own backend, customer data) — every pattern here covers this well, so pick based on UX preference. The 40-46 scores all hold up in the categories that matter for external-tool governance.
Governance over agent delegation chains (multi-agent systems where one agent spawns another) — proxy or hook pattern wins. Anthropic Agent SDK + `governHandlers` (46) or OpenAI Agents SDK + `base_url` swap (45) capture chain context cleanly. Decorator pattern is closing the gap in 0.2.0.
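The proxy pattern is easiest to see as a config transform: nothing in the agent changes except where its requests go. A sketch of the swap, with placeholder proxy URL and header names (the real ACP proxy configuration may differ):

```typescript
// Illustrative sketch of the base_url swap that puts a governance proxy
// at the request serialization boundary. Header names are placeholders.
interface ClientConfig {
  baseURL: string;
  headers: Record<string, string>;
}

function viaGovernanceProxy(
  config: ClientConfig,
  proxyURL: string,
  agentId: string,
): ClientConfig {
  return {
    ...config,
    baseURL: proxyURL, // every request now serializes through the proxy
    headers: {
      ...config.headers,
      "X-Agent-Id": agentId,        // per-agent attribution in the audit trail
      "X-Upstream": config.baseURL, // proxy forwards to the original API
    },
  };
}
```

Because the swap happens at the request serialization boundary, sub-agents that share the client config inherit governance automatically, which is why this pattern captures delegation chains well.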
Governance over code editing / file ops in an IDE — none of the integration patterns fully reach here. IDE tools dispatch through the IDE’s engine without serializing through any protocol an external governance layer can intercept. You need server-side mitigations (git hooks, branch protection, CI policies, network-layer enforcement) for these. ACP can’t help directly.
Governance over LLM cost / token usage — orthogonal. ACP governs tool calls and actions, not LLM spend. Pair with Portkey, LiteLLM, or your provider’s per-key budgets for cost attribution.
## Pick by the trust profile of who’s using it
Internal team, sophisticated users — any pattern is fine. Hook patterns (Claude Code, Codex CLI) give the smoothest interactive UX. Trust-but-verify model.
Internal team, non-technical users — proxy or TS handler-wrapper. The user can’t accidentally bypass network-layer governance the way they could disable a hook.
External customers in a multi-tenant SaaS — proxy pattern (45). Network-layer enforcement is hardest for customers to bypass, and per-agent attribution via headers is built for multi-tenancy.
Compliance-heavy environments (regulated industries) — proxy or TS handler-wrapper. Both score 45-46 and align with audit completeness, identity propagation, and per-user policy enforcement at near-ceiling. The compliance auditor citation story is cleanest with these.
Coding agents on developer machines — hook pattern with the `--dangerously-skip-permissions` mitigations. Combine with shell aliases, MDM/EDR alerts, and audit-silence anomaly detection on the server side.
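Audit-silence detection is simple to sketch server-side: a governed agent that keeps working but stops emitting audit events is itself a signal. This is illustrative only; the function name, identifiers, and threshold are assumptions:

```typescript
// Illustrative sketch of audit-silence anomaly detection: flag agents
// whose last audit event is older than a threshold, since a hook bypass
// (e.g. a skipped-permissions run) makes the audit stream go quiet.
function silentAgents(
  lastAuditAt: Map<string, number>, // agentId -> epoch ms of last audit event
  now: number,
  thresholdMs: number,
): string[] {
  const silent: string[] = [];
  for (const [agentId, ts] of lastAuditAt) {
    if (now - ts > thresholdMs) silent.push(agentId);
  }
  return silent;
}
```

In practice you would cross-reference the silent set against other activity signals (commits, network traffic) before alerting, so that an idle agent is not flagged as a bypassing one.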
## When to choose Cursor anyway
Cursor + ACP scores 37/48 for structural reasons — Cursor’s internal tools never touch MCP. But Cursor is still a great choice for many teams if you understand the trade-off:
- The score reflects the IDE primitive operations (Edit, Read, Bash) that bypass MCP. MCP-exposed tools (the external services you connect Cursor to) score the same as proxy-pattern integrations.
- For external-service governance, Cursor + ACP MCP gives you full audit + policy + rate limits.
- For internal-tool governance, you need server-side mitigations regardless of which IDE you pick — this isn’t unique to Cursor.
So: pick Cursor if you love the IDE, accept the 37/48 with eyes open, and add network-layer / git-layer policies for what MCP can’t reach. The same caveat applies to any IDE that has internal tool primitives outside an external dispatch protocol.
## Migration path if you’re already deployed
If you’re running:
- CrewAI in production → stay. The 40/48 today rises to ~44 in `acp-crewai@0.2.0` (shipping soon). For now: prefer Sequential to Hierarchical Process, avoid checkpoint-heavy patterns.
- LangGraph with checkpoint-heavy state → review the StateGraph governance gap post. Workarounds exist; SDK fix in 0.2.0.
- Claude Code with `--dangerously-skip-permissions` use → ship the shell alias mitigation today. Anomaly alerting in the dashboard ships in two weeks.
- Cursor with ambient tool access → audit which tools are MCP vs internal. Add server-side enforcement for the internal-tool surface. ACP’s MCP server only governs what MCP carries.
## What this whole exercise tells us
Pattern shape determines the governance ceiling. Same gateway, seven frameworks, four patterns, four score tiers (37, 40, 43, 45-46). If governance score matters for your deployment, pick the pattern that scores highest for your shape of agent — single-agent loops use TS handler-wrapper, multi-agent systems use proxy, CLIs use hooks, IDE tools use MCP + server-side mitigations.
The framework is downstream of the pattern. Pick the pattern first.
Receipts:
- /benchmark — live scorecard
- Full scorecard post — every number with sources
- Decorator vs proxy vs hook — pattern theory
- What the benchmark told us about our own product — the roadmap that follows from these scores
- 1. How we think about testing AI agent governance
- 2. CrewAI scores 13/48 on AgentGovBench. With ACP, 40/48.
- 3. CrewAI's task handoffs lose the audit trail — here's the gap and the fix
- 4. LangGraph scores 13/48 on AgentGovBench. With ACP, 40/48.
- 5. LangGraph's StateGraph checkpoints don't replay through governance
- 6. Claude Code scores 13/48 on AgentGovBench. With ACP, 43/48.
- 7. Claude Code's --dangerously-skip-permissions disables every governance hook
- 8. Decorator, proxy, hook — three patterns for agent governance, three different scorecards
- 9. OpenAI Agents SDK scores 13/48 on AgentGovBench. With ACP, 45/48.
- 10. Anthropic Agent SDK scores 13/48 on AgentGovBench. With ACP, 46/48 — best of any framework.
- 11. Codex CLI scores 13/48 on AgentGovBench. With ACP, 43/48 — same as Claude Code.
- 12. Full scorecard: seven frameworks, 48 scenarios, one open benchmark
- 13. How AgentGovBench's 48 scenarios map to NIST AI RMF 1.0
- 14. Reproduce AgentGovBench on your stack — full setup guide
- 15. Cursor scores 13/48 on AgentGovBench. With ACP MCP server, 37/48 — and that gap is structural.
- 16. Recommended governance deployment patterns — pick the one that scores highest for your stack · you are here
- 17. What our benchmark told us about our own product — six fixes we're shipping