Cursor scores 13/48 on AgentGovBench. With ACP MCP server, 37/48 — and that gap is structural.
tl;dr
Seventh framework in the series, lowest ACP-paired score so far — and the most honest about why.
| Configuration | Score |
|---|---|
| Cursor (no ACP MCP server) | 13/48 — vanilla floor (seventh framework confirmation) |
| Cursor + ACP MCP server | 37/48 — lowest of any ACP-paired runner |
The 37/48 isn’t a bug. It’s the structural ceiling of the MCP integration pattern for an IDE that has internal tools alongside MCP tools.
Specifically: Cursor’s Edit, Read, Bash, and Terminal tools are dispatched by Cursor’s own engine. They never touch the MCP layer. An ACP MCP server can only govern MCP-exposed tools. Internal IDE tools remain at vanilla — no audit, no enforcement, just Cursor’s own permission prompts.
This is a real, documented governance gap that no SDK improvement can close. It’s a property of the integration shape, not the gateway.
Why Cursor native scores at the floor
Same story as the previous six frameworks. Cursor without an ACP MCP server connected:
- Internal tools dispatch through Cursor’s engine with interactive permission prompts
- MCP server connections work but emit no centralized audit
- “Always allow” persists across sessions and bypasses prompts
- No identity envelope on MCP requests by default
- TTY output captures inputs/outputs (debug, not audit)
Score: 13/48. Same vanilla floor as every other framework’s bare default.
What ACP adds — and what it can’t
https://api.agenticcontrolplane.com/mcp is an MCP server. Add it to Cursor’s MCP server list and every tool exposed through that server gets full ACP governance — pre/post hooks, audit, policy, rate limits.
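As a rough illustration of "add it to Cursor's MCP server list", the sketch below builds the kind of JSON entry that would do it. It assumes Cursor reads MCP servers from a JSON file with a top-level `mcpServers` key and a `url` field for remote servers. The entry name `acp` and the helper `build_mcp_config` are our own labels, and authentication is omitted because the post does not describe it.

```python
import json

# The endpoint is the one named in the post; everything else here is illustrative.
ACP_MCP_URL = "https://api.agenticcontrolplane.com/mcp"


def build_mcp_config(url: str, name: str = "acp") -> dict:
    """Return a config dict with one remote MCP server entry.

    Assumption: the IDE expects {"mcpServers": {<name>: {"url": <url>}}}.
    """
    return {"mcpServers": {name: {"url": url}}}


config = build_mcp_config(ACP_MCP_URL)
print(json.dumps(config, indent=2))
```

Only tools served from that entry pass through ACP; nothing in this config changes how Cursor dispatches its own tools.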
But: Cursor’s internal Edit, Read, Bash, Terminal tools dispatch through Cursor’s engine without going through any MCP server. They never reach ACP. The runner deliberately routes scenarios targeting these tools to vanilla dispatch (see runners/cursor_acp.py — CURSOR_INTERNAL_TOOLS set) because that’s what actually happens in production Cursor with an ACP MCP server.
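A minimal sketch of that routing decision, not the actual contents of runners/cursor_acp.py. It assumes the `CURSOR_INTERNAL_TOOLS` set holds the four internal tools named above. The `ToolCall` type, the `dispatch_path` function, and the example MCP tool name are hypothetical.

```python
from dataclasses import dataclass, field

# Assumption: membership mirrors the four internal tools named in the post.
CURSOR_INTERNAL_TOOLS = frozenset({"Edit", "Read", "Bash", "Terminal"})


@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)


def dispatch_path(call: ToolCall) -> str:
    """Return which path a scenario's tool call takes in this sketch."""
    if call.tool in CURSOR_INTERNAL_TOOLS:
        # Handled by the IDE engine: no MCP hop, so no ACP hooks or audit.
        return "vanilla"
    # Served by the connected MCP server, so ACP governance applies.
    return "acp_mcp"


calls = [
    ToolCall("Edit", {"path": "app.py"}),
    ToolCall("slack.post_message", {"channel": "#ops"}),  # hypothetical MCP tool
]
for call in calls:
    print(call.tool, "->", dispatch_path(call))
```

Any scenario that lands on the vanilla path is scored exactly as it would be for native Cursor.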
This produces the honest 37/48 ceiling. Five categories take a structural hit:
- delegation_provenance (4/6): internal-tool calls in chains lose chain context
- identity_propagation (5/6): internal-tool calls have no identity envelope
- per_user_policy_enforcement (5/6): policy can’t reach internal tools
- rate_limit_cascade (4/6): internal-tool calls don’t count against rate budget
- fail_mode_discipline (3/6): fail-mode is moot for the unreachable internal-tool path
Per-category breakdown:
| Category | Native | + ACP MCP | Note |
|---|---|---|---|
| Audit completeness | 1/6 | 6/6 | MCP-exposed tools fully audited. |
| Cross-tenant isolation | 4/6 | 4/6 | Two declined (single-tenant deployment mode). |
| Delegation provenance | 0/6 | 4/6 | MCP tools recorded; internal tools lost. |
| Fail-mode discipline | 3/6 | 3/6 | No improvement — internal-tool path has no governance to fail. |
| Identity propagation | 0/6 | 5/6 | MCP tools carry identity; internal tools don’t. |
| Per-user policy enforcement | 1/6 | 5/6 | Policy enforced on MCP path; internal path bypasses. |
| Rate-limit cascade | 3/6 | 4/6 | MCP tools rate-limited; internal tools don’t count. |
| Scope inheritance | 1/6 | 6/6 | MCP tool scopes work cleanly. |
| Total | 13/48 | 37/48 | |
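The totals can be checked by summing the per-category rows. The snippet below copies the scores from the table; the snake_case keys are our own shorthand for the category names.

```python
# (native, with ACP MCP) scores out of 6, copied from the table above.
SCORES = {
    "audit_completeness": (1, 6),
    "cross_tenant_isolation": (4, 4),
    "delegation_provenance": (0, 4),
    "fail_mode_discipline": (3, 3),
    "identity_propagation": (0, 5),
    "per_user_policy_enforcement": (1, 5),
    "rate_limit_cascade": (3, 4),
    "scope_inheritance": (1, 6),
}

native_total = sum(native for native, _ in SCORES.values())
acp_total = sum(acp for _, acp in SCORES.values())
max_total = 6 * len(SCORES)

# Categories where connecting the ACP MCP server changed nothing.
unchanged = sorted(name for name, (native, acp) in SCORES.items() if native == acp)

print(f"native {native_total}/{max_total}, with ACP {acp_total}/{max_total}")
print("no change:", unchanged)
```

This prints 13/48 and 37/48, and lists cross-tenant isolation and fail-mode discipline as the two categories with no movement.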
What this means for Cursor in your team
The honest read:
For agents touching mostly external services (databases, APIs, Slack, GitHub via MCP), Cursor + ACP gives you full governance on those. The MCP-exposed surface is well-covered.
For agents that primarily edit code, run shell commands, or read files (i.e. typical IDE usage), the governance you’d want lives outside MCP’s reach. ACP can’t help directly. You’d need:
- Server-side mitigations — git hooks, branch protection, CI policies that catch what client-side governance can’t
- Network-layer enforcement — block production endpoints from developer machines, require explicit allowlist
- Endpoint-level policy — MDM/EDR that observes process invocations, not just MCP traffic
This isn’t an ACP limitation. It’s a property of how Cursor exposes its tool surface: internal tools live in the IDE process and never serialize to a protocol that an external governance layer can intercept. The same gap exists for any client-side governance product that tries to govern an IDE’s internal tools.
What we’d need from Cursor
For ACP (or any external governance product) to fully cover Cursor, the IDE itself would need to:
- Expose a hook protocol for internal tools (similar to Claude Code’s PreToolUse/PostToolUse)
- Or route internal-tool invocations through a configurable MCP-style intercept point
- Or provide a structured audit emission API that external listeners can subscribe to
We’ve filed feedback with Cursor on this. As of writing, none of the three exist. We’ll update this post and the score if/when the situation changes.
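To make the first option concrete, here is a purely hypothetical sketch of what a pre/post hook protocol around internal tools could look like. Nothing here is a Cursor API: `HookedDispatcher`, the hook signatures, and the example policy are all invented for illustration, and the "Bash" tool is a fake that runs nothing.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical hook signatures, loosely modelled on the PreToolUse/PostToolUse
# idea named above. Cursor does not expose anything like this today.
PreHook = Callable[[str, Dict[str, Any]], bool]        # False blocks the call
PostHook = Callable[[str, Dict[str, Any], Any], None]  # observes the result


class HookedDispatcher:
    """Toy stand-in for an IDE engine that lets a listener wrap internal tools."""

    def __init__(self, pre: Optional[PreHook] = None, post: Optional[PostHook] = None):
        self._tools: Dict[str, Callable[..., Any]] = {}
        self._pre = pre
        self._post = post

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **args: Any) -> Any:
        # Pre-hook: an external policy decides whether the call may proceed.
        if self._pre is not None and not self._pre(name, args):
            raise PermissionError(f"{name} blocked by pre-hook")
        result = self._tools[name](**args)
        # Post-hook: an external listener records what happened.
        if self._post is not None:
            self._post(name, args, result)
        return result


audit_log: List[Tuple[str, Dict[str, Any]]] = []


def deny_destructive_shell(tool: str, args: Dict[str, Any]) -> bool:
    # Illustrative policy only: block Bash commands containing "rm -rf".
    return not (tool == "Bash" and "rm -rf" in args.get("command", ""))


def record(tool: str, args: Dict[str, Any], result: Any) -> None:
    audit_log.append((tool, args))


engine = HookedDispatcher(pre=deny_destructive_shell, post=record)
engine.register("Bash", lambda command: f"ran: {command}")  # fake tool
engine.call("Bash", command="ls")
```

With an intercept point of this shape, an external gateway could attach policy and audit to internal tools the same way it does for MCP-exposed ones.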
What’s worth taking away from this whole series
This is the seventh and final per-framework scorecard. Tomorrow’s big reveal post puts all the numbers side by side.
The takeaway across all seven: the shape of the integration pattern determines what governance can see and enforce. The ACP-paired scores, each out of 48, track the pattern: a decorator catches individual tools (40), a hook gets the host payload (43), a proxy sits at the request boundary (45), a TS handler-wrapper hits native dispatch (46), and an MCP-only IDE leaves internal tools ungoverned (37).
Same gateway behind all of them. Same /govern/tool-use endpoint. Different scores. The variance is real and structural, and worth understanding when you’re picking the framework you’ll govern.
Receipts:
- cursor_native runner
- cursor_acp runner (see CURSOR_INTERNAL_TOOLS for the documented internal-tool list)
- Results JSON
- Decorator vs proxy vs hook
- Full scorecard — all seven frameworks, side by side