Agentic Control Plane
Benchmark series · Part 15 of 17

Cursor scores 13/48 on AgentGovBench. With ACP MCP server, 37/48 — and that gap is structural.

David Crowe · 4 min read
cursor mcp benchmark governance ide agentgovbench

tl;dr

Seventh framework in the series, lowest ACP-paired score so far — and the most honest about why.

Configuration | Score
--- | ---
Cursor (no ACP MCP server) | 13/48, vanilla floor (seventh framework confirmation)
Cursor + ACP MCP server | 37/48, lowest of any ACP-paired runner

The 37/48 isn’t a bug. It’s the structural ceiling of the MCP integration pattern for an IDE that has internal tools alongside MCP tools.

Specifically: Cursor’s Edit, Read, Bash, and Terminal tools are dispatched by Cursor’s own engine. They never touch the MCP layer. An ACP MCP server can only govern MCP-exposed tools. Internal IDE tools remain at vanilla — no audit, no enforcement, just Cursor’s own permission prompts.

This is a real, documented governance gap that no SDK improvement can close. It’s a property of the integration shape, not the gateway.

Why Cursor native scores at the floor

Same story as the previous six frameworks. Cursor without an ACP MCP server connected:

  • Internal tools dispatch through Cursor’s engine with interactive permission prompts
  • MCP server connections work but emit no centralized audit
  • “Always allow” persists across sessions and bypasses prompts
  • No identity envelope on MCP requests by default
  • TTY output captures inputs/outputs (debug, not audit)

Score: 13/48. Same vanilla floor as every other framework’s bare default.

What ACP adds — and what it can’t

https://api.agenticcontrolplane.com/mcp is an MCP server. Add it to Cursor’s MCP server list and every tool exposed through that server gets full ACP governance — pre/post hooks, audit, policy, rate limits.
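As a rough sketch of what "add it to Cursor’s MCP server list" can look like in practice, the helper below merges the ACP endpoint into an `mcp.json`-style config. It assumes Cursor reads a `mcpServers` map with a `url` entry for remote servers (for example from `.cursor/mcp.json`); the server name `acp` and the helper name are made up for illustration, and any authentication headers your ACP tenant requires are not shown.

```python
import json

# URL taken from the post; everything else here is illustrative.
ACP_MCP_URL = "https://api.agenticcontrolplane.com/mcp"


def add_acp_server(config: dict, name: str = "acp") -> dict:
    """Return a copy of an mcp.json-style config with the ACP server added.

    Assumes the config shape {"mcpServers": {<name>: {...}}}; existing
    servers are preserved and the input dict is not mutated.
    """
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"url": ACP_MCP_URL}
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    # Hypothetical existing config with one unrelated local server.
    existing = {"mcpServers": {"local-db": {"command": "db-mcp"}}}
    print(json.dumps(add_acp_server(existing), indent=2))
```

Only tools served through that entry are governed. Nothing in this file changes how Cursor dispatches its own internal tools.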

But: Cursor’s internal Edit, Read, Bash, Terminal tools dispatch through Cursor’s engine without going through any MCP server. They never reach ACP. The runner deliberately routes scenarios targeting these tools to vanilla dispatch (see the CURSOR_INTERNAL_TOOLS set in runners/cursor_acp.py) because that’s what actually happens in production Cursor with an ACP MCP server.
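The routing rule is simple enough to sketch. This is not the runner’s actual code: only the name `CURSOR_INTERNAL_TOOLS` and the four tool names come from the post, while `DispatchResult`, `route_tool_call`, and the example MCP tool name are hypothetical.

```python
from dataclasses import dataclass

# Tool names as listed in the post; the real set in the runner may differ.
CURSOR_INTERNAL_TOOLS = frozenset({"Edit", "Read", "Bash", "Terminal"})


@dataclass(frozen=True)
class DispatchResult:
    tool: str
    path: str       # "vanilla" (Cursor's own engine) or "acp_mcp"
    governed: bool  # True only when the call passes through the ACP MCP server


def route_tool_call(tool_name: str) -> DispatchResult:
    """Decide which dispatch path a scenario's tool call takes."""
    if tool_name in CURSOR_INTERNAL_TOOLS:
        # Internal tools never serialize to MCP, so ACP cannot see them.
        return DispatchResult(tool_name, "vanilla", governed=False)
    return DispatchResult(tool_name, "acp_mcp", governed=True)


if __name__ == "__main__":
    for name in ("Bash", "github_create_issue"):  # second name is made up
        print(route_tool_call(name))
```

Modeling the bypass explicitly is what keeps the score honest: a runner that sent every call through ACP would report governance that production Cursor never applies.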

This produces the honest 37/48 ceiling. Five categories take a structural hit:

  • delegation_provenance (4/6) — internal-tool calls in chains lose chain context
  • identity_propagation (5/6) — internal-tool calls have no identity envelope
  • per_user_policy_enforcement (5/6) — policy can’t reach internal tools
  • rate_limit_cascade (4/6) — internal-tool calls don’t count against rate budget
  • fail_mode_discipline (3/6) — fail-mode is moot for the unreachable internal-tool path

Per-category breakdown:

Category | Native | + ACP MCP | Note
--- | --- | --- | ---
Audit completeness | 1/6 | 6/6 | MCP-exposed tools fully audited.
Cross-tenant isolation | 4/6 | 4/6 | Two declined (single-tenant deployment mode).
Delegation provenance | 0/6 | 4/6 | MCP tools recorded; internal tools lost.
Fail-mode discipline | 3/6 | 3/6 | No improvement: internal-tool path has no governance to fail.
Identity propagation | 0/6 | 5/6 | MCP tools carry identity; internal tools don’t.
Per-user policy enforcement | 1/6 | 5/6 | Policy enforced on MCP path; internal path bypasses.
Rate-limit cascade | 3/6 | 4/6 | MCP tools rate-limited; internal tools don’t count.
Scope inheritance | 1/6 | 6/6 | MCP tool scopes work cleanly.
Total | 13/48 | 37/48 |
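A quick arithmetic check of the table, for anyone who wants to verify the totals. The scores are copied from the rows above. The snake_case keys for audit completeness, cross-tenant isolation, and scope inheritance are assumed spellings; the other five match the identifiers used earlier in the post.

```python
SCENARIOS_PER_CATEGORY = 6

# category -> (native score, score with ACP MCP server), copied from the table
SCORES = {
    "audit_completeness": (1, 6),
    "cross_tenant_isolation": (4, 4),
    "delegation_provenance": (0, 4),
    "fail_mode_discipline": (3, 3),
    "identity_propagation": (0, 5),
    "per_user_policy_enforcement": (1, 5),
    "rate_limit_cascade": (3, 4),
    "scope_inheritance": (1, 6),
}

# Categories the post lists as taking a structural hit from internal tools.
STRUCTURAL = {
    "delegation_provenance",
    "identity_propagation",
    "per_user_policy_enforcement",
    "rate_limit_cascade",
    "fail_mode_discipline",
}

native_total = sum(native for native, _ in SCORES.values())
acp_total = sum(acp for _, acp in SCORES.values())
max_total = SCENARIOS_PER_CATEGORY * len(SCORES)

# Points still missing with ACP, split by cause.
structural_loss = sum(SCENARIOS_PER_CATEGORY - SCORES[c][1] for c in STRUCTURAL)
other_loss = (max_total - acp_total) - structural_loss

print(f"native {native_total}/{max_total}, with ACP MCP {acp_total}/{max_total}")
print(f"lost to internal-tool path: {structural_loss}, lost elsewhere: {other_loss}")
```

Of the 11 points missing from 48, 9 fall in the categories listed above as structural hits; the remaining 2 are the declined cross-tenant scenarios.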

What this means for Cursor in your team

The honest read:

For agents touching mostly external services (databases, APIs, Slack, GitHub via MCP), Cursor + ACP gives you full governance on those. The MCP-exposed surface is well-covered.

For agents that primarily edit code, run shell commands, or read files (i.e. typical IDE usage), the governance you’d want lives outside MCP’s reach. ACP can’t help directly. You’d need:

  1. Server-side mitigations — git hooks, branch protection, CI policies that catch what client-side governance can’t
  2. Network-layer enforcement — block production endpoints from developer machines, require explicit allowlist
  3. Endpoint-level policy — MDM/EDR that observes process invocations, not just MCP traffic
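To make the first item concrete, here is a minimal sketch of a check that a server-side hook or CI job could run over the list of changed files. The protected patterns and the function name are hypothetical; how you obtain the changed paths (for example `git diff --name-only` in CI) depends on your setup and is not shown.

```python
import fnmatch

# Hypothetical policy: paths an agent-authored change must not touch
# without human review. Adjust to your repository layout.
PROTECTED_PATTERNS = ("infra/prod/*", ".github/workflows/*", "*.pem")


def blocked_paths(changed, patterns=PROTECTED_PATTERNS):
    """Return the changed paths that match any protected pattern.

    Uses fnmatchcase so matching is case-sensitive on every platform.
    Note that '*' in fnmatch also matches '/' characters.
    """
    return [
        path
        for path in changed
        if any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    changed = ["src/app.py", "infra/prod/db.tf", "keys/server.pem"]
    hits = blocked_paths(changed)
    if hits:
        print("blocked:", ", ".join(hits))
```

Because the check runs on the server or in CI, it applies regardless of which tool made the edit, including Cursor’s internal Edit and Bash tools.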

This isn’t specific to ACP. It’s a property of how Cursor exposes its tool surface: internal tools live in the IDE process and never serialize to a protocol that an external governance layer can intercept. The same gap exists for any client-side governance product attempting to govern an IDE’s internal tools.

What we’d need from Cursor

For ACP (or any external governance product) to fully cover Cursor, the IDE itself would need to:

  1. Expose a hook protocol for internal tools (similar to Claude Code’s PreToolUse/PostToolUse)
  2. Or route internal-tool invocations through a configurable MCP-style intercept point
  3. Or provide a structured audit emission API that external listeners can subscribe to

We’ve filed feedback with Cursor on this. As of writing, none of the three exist. We’ll update this post and the score if/when the situation changes.

What’s worth taking away from this whole series

This is the seventh and final per-framework scorecard. Tomorrow’s big reveal post puts all the numbers side by side.

The takeaway across all seven: integration pattern shape determines what governance can see and enforce. Decorator catches individual tools (40). Hook gets host payload (43). Proxy sits at request boundary (45). TS handler-wrapper hits native dispatch (46). MCP-only IDE leaves internal tools ungoverned (37).

Same gateway behind all of them. Same /govern/tool-use endpoint. Different scores. The variance is real and structural, and worth understanding when you’re picking the framework you’ll govern.


