Agentic Control Plane

MCP Servers Proxy Paid APIs With No Rate Limits. I Calculated the Blast Radius.

David Crowe · 6 min read
security-research mcp

An agent gets stuck in a retry loop. It calls an MCP server that proxies OpenAI’s API. Each call costs $0.03. The agent retries 10 times per second. There’s no rate limit on the MCP server.

$18 per minute. $1,080 per hour.

Nobody notices until the invoice arrives.

I scanned 8,216 MCP servers to count how many proxy paid APIs, and how many of those document rate limits. The numbers are bad.


How I found them

I identified servers proxying paid APIs by scanning environment variable names. If a server requires OPENAI_API_KEY, ANTHROPIC_API_KEY, or STRIPE_SECRET_KEY as an environment variable, it’s making API calls to that service.

This is a conservative, high-confidence approach. Every server flagged actually requires a paid API key to function. I didn’t count servers that merely mention a service in their description — only servers that require an API key as a configured environment variable.

For rate limit documentation, I scanned READMEs for: rate limit, throttle, requests per second/minute/hour, quota, 429, too many requests, concurrency limit.
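The detection logic can be sketched in a few lines. This is a simplified reconstruction, not the actual scanner: `PAID_KEY_VARS` is truncated to three examples, and the regex mirrors the term list above:

```python
import re

# Illustrative subset; the real scan matched the full set of paid API key names.
PAID_KEY_VARS = {"OPENAI_API_KEY", "ANTHROPIC_API_KEY", "STRIPE_SECRET_KEY"}

RATE_LIMIT_TERMS = re.compile(
    r"rate limit|throttl|requests per (second|minute|hour)"
    r"|quota|\b429\b|too many requests|concurrency limit",
    re.IGNORECASE,
)

def proxies_paid_api(required_env_vars: set[str]) -> bool:
    """Flag a server only if it requires a known paid API key env var."""
    return bool(required_env_vars & PAID_KEY_VARS)

def documents_rate_limits(readme: str) -> bool:
    """Check a README for any rate-limit-related term."""
    return RATE_LIMIT_TERMS.search(readme) is not None
```

Matching on required env vars rather than README prose is what keeps the count conservative: a server can mention OpenAI without being flagged, but it can't require `OPENAI_API_KEY` without calling OpenAI.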

The findings

180 servers proxy calls to paid APIs. Of those:

| Metric | Count | % |
| --- | --- | --- |
| Document rate limits | 27 | 15.0% |
| Document no rate limits | 153 | 85.0% |

85% of servers proxying paid APIs have no documented rate limiting.

Which APIs are being proxied

| Paid API | MCP Servers | Notes |
| --- | --- | --- |
| OpenAI | 74 | GPT, DALL-E, Whisper — requires OPENAI_API_KEY |
| Anthropic | 74 | Claude API — requires ANTHROPIC_API_KEY |
| GitHub | 22 | GitHub API (metered for some operations) |
| AWS | 16 | S3, Bedrock, Lambda |
| Google AI | 10 | Gemini, Vertex AI |
| Stripe | 9 | Payment processing |
| Azure | 6 | Azure OpenAI, Azure services |
| HuggingFace | 5 | Model inference |
| Slack | 2 | Messaging API |
| Pinecone | 2 | Vector database |
| Cohere | 2 | Model inference |
| Replicate | 1 | Model inference |
| Twilio | 1 | SMS/voice |
| Datadog | 1 | Monitoring API |

OpenAI and Anthropic dominate at 74 servers each — expected given these are the most common LLM APIs that MCP servers wrap. (The per-API counts sum to more than 180 because a single server can require keys for multiple services.)

Across all servers

Rate limit documentation isn’t just sparse for paid API proxies. It’s sparse everywhere:

| Metric | Count | % |
| --- | --- | --- |
| All servers with rate limit mention | 901 | 11.0% |
| All servers without rate limit mention | 7,315 | 89.0% |

89% of the entire MCP ecosystem has no documented rate limiting.

The blast radius math

What happens when an agent gets stuck in a retry loop against a server with no rate limits?

The scenario is realistic. LLMs retry. Tool calls fail for transient reasons — network timeouts, temporary API errors, malformed responses. Agent frameworks implement retry logic. Some retry aggressively. Without rate limits on the MCP server, every retry hits the paid API.

Conservative assumptions:

  • 10 calls per second (agent retry rate — some frameworks go higher)
  • No rate limiting at the MCP server level
  • No circuit breaker (no automated detection of the loop)

| Paid API | Cost per Call | 1 Minute | 1 Hour |
| --- | --- | --- | --- |
| OpenAI | $0.030 | $18.00 | $1,080 |
| Replicate | $0.020 | $12.00 | $720 |
| Anthropic | $0.015 | $9.00 | $540 |
| Google AI | $0.010 | $6.00 | $360 |
| Azure | $0.010 | $6.00 | $360 |
| Cohere | $0.010 | $6.00 | $360 |
| Pinecone | $0.008 | $4.80 | $288 |
| Twilio | $0.0075 | $4.50 | $270 |
| AWS | $0.005 | $3.00 | $180 |
| Stripe | $0.0025 | $1.50 | $90 |

These are per-agent numbers. An enterprise with 50 agents, each capable of hitting this loop independently: multiply by 50.

The cost per call estimates are rough averages. OpenAI’s actual cost depends on the model and token count. But the order of magnitude is right: a retry loop against a paid API proxy, at 10 calls per second, runs up a four-figure hourly bill.
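The arithmetic is simple enough to check yourself. A sketch, working in thousandths of a dollar so the figures stay exact:

```python
# Blast radius of an unthrottled retry loop: calls/sec x cost/call x duration.
# Costs are passed in thousandths of a dollar so the arithmetic is exact.
def blast_radius_usd(cost_millidollars: int, calls_per_second: int = 10):
    """Return (cost per minute, cost per hour) in dollars."""
    per_minute = cost_millidollars * calls_per_second * 60 / 1000
    return per_minute, per_minute * 60

per_min, per_hour = blast_radius_usd(30)  # OpenAI row: $0.030/call
print(per_min, per_hour)                  # 18.0 1080.0, matching the table

# 50 agents hitting the loop independently multiply the exposure.
fleet_hourly = per_hour * 50              # $54,000/hour worst case
```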

The compounding problem

Rate limits protect against more than runaway costs. They’re the circuit breaker for:

  1. Retry storms — agent gets an error, retries, gets the same error, retries harder. Without rate limits, this loop runs until something external stops it.

  2. Prompt injection amplification — an attacker who can inject a tool call into the agent’s context can trigger repeated expensive API calls. The agent dutifully executes them. The MCP server dutifully proxies them.

  3. Credential abuse — if the API key is compromised (or shared across multiple agents), rate limits are the only thing preventing the key from being used to exhaustion.

  4. Noisy neighbor — in multi-tenant deployments, one agent’s retry loop consumes the shared API quota for everyone.

Traditional API gateways solved all of these problems. Rate limiting, circuit breakers, budget caps, per-client quotas. None of these patterns exist in the MCP ecosystem.

Why servers don’t implement rate limits

Three reasons:

1. The MCP spec doesn’t define rate limiting primitives. There’s no standard way for a server to communicate its rate limits to the client. No Retry-After header equivalent. No quota negotiation. The spec handles tool discovery and invocation, not resource management.

2. Most MCP servers are thin wrappers. A typical MCP server is 200 lines of code that translates tool calls to API calls. Adding rate limiting means adding state management, persistence, configuration. That’s more complexity than the server itself.

3. The server doesn’t know the cost. An MCP server that proxies OpenAI doesn’t know how many tokens each call will use, what model the user configured, or what the per-token price is. It can count calls but not cost.
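To be fair to server authors, call-count limiting (as opposed to cost limiting) needs very little state. A minimal in-process token bucket, illustrative only and not taken from any MCP SDK:

```python
import time

class TokenBucket:
    """Allow at most `rate` calls per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens in proportion to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should surface a "rate limited" tool error

bucket = TokenBucket(rate=2, capacity=2)  # 2 calls/sec, small burst
```

This caps calls but not dollars — which is exactly the gap point 3 describes: the wrapper can count invocations without ever knowing what each one costs.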

What rate limiting at the control plane looks like

The fix is the same pattern as auth and audit: enforce at the layer that sees all traffic.

A control plane that sits between agents and MCP servers can:

  1. Per-user budget caps — “User X can spend $50/day on OpenAI calls through MCP.” The control plane tracks cumulative cost and rejects calls that exceed the budget.

  2. Per-server rate limits — “No more than 100 calls/minute to the database server.” Enforced regardless of how many agents are calling it.

  3. Circuit breakers — if a server returns errors for 5 consecutive calls, stop calling it. Don’t let the agent retry forever.

  4. Cost attribution — track cost per user, per agent, per server. The invoice shows where the money went.
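The four controls above can be sketched as a single enforcement wrapper. `forward` stands in for whatever function actually proxies the call; the thresholds, cost estimates, and in-memory state are illustrative (a real control plane would persist spend and share state across instances):

```python
class ControlPlane:
    """Enforce per-user budget caps and a per-server circuit breaker at the
    proxy layer, so individual MCP servers need no changes."""

    def __init__(self, daily_budget_usd: float, error_threshold: int = 5):
        self.daily_budget = daily_budget_usd
        self.error_threshold = error_threshold
        self.spend = {}    # user -> cumulative estimated spend (USD)
        self.errors = {}   # server -> consecutive error count

    def call(self, user: str, server: str, est_cost_usd: float, forward):
        # Circuit breaker: stop calling a server that keeps failing.
        if self.errors.get(server, 0) >= self.error_threshold:
            raise RuntimeError(f"circuit open for {server}")
        # Budget cap: reject calls that would exceed the user's daily budget.
        if self.spend.get(user, 0.0) + est_cost_usd > self.daily_budget:
            raise RuntimeError(f"daily budget exceeded for {user}")
        try:
            result = forward()
        except Exception:
            self.errors[server] = self.errors.get(server, 0) + 1
            raise
        self.errors[server] = 0  # success closes the breaker
        self.spend[user] = self.spend.get(user, 0.0) + est_cost_usd
        return result
```

Per-server rate limits would slot in alongside the budget check, and cost attribution falls out of the `spend` ledger for free.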

This is the limitabl pattern. The MCP server doesn’t need to implement rate limiting. The control plane rate-limits, budgets, and circuit-breaks on its behalf.

The server proxies the API. The control plane makes sure the proxy doesn’t become a firehose.


Methodology

Paid API detection via environment variable name matching. Servers are flagged only if they require a specific paid API key (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY, STRIPE_SECRET_KEY) as a configured environment variable. This is a conservative, high-confidence approach — it undercounts servers that proxy paid APIs without requiring a dedicated env var, but avoids false positives from description-only mentions.

Rate limit detection via README scanning for rate-limit-related terms.

Limitations:

  • Cost-per-call estimates are rough averages; actual costs vary by model, token count, and pricing tier
  • A server could implement rate limiting without documenting it in the README
  • The 10 calls/second retry rate is conservative; some agent frameworks retry faster

The blast radius calculations are illustrative. The point isn't the exact dollar amount; it's that the ecosystem has no standard mechanism for preventing runaway costs, and 85% of paid API proxies don't document one.

Read the reference architecture → · Get started free →
