Agentic Control Plane

MCP Servers Proxy Paid APIs With No Rate Limits. I Calculated the Blast Radius.

David Crowe · 6 min read
security-research mcp

An agent gets stuck in a retry loop. It calls an MCP server that proxies OpenAI’s API. Each call costs $0.03. The agent retries 10 times per second. There’s no rate limit on the MCP server.

$18 per minute. $1,080 per hour.

Nobody notices until the invoice arrives.

I scanned 8,216 MCP servers to count how many proxy paid APIs, and how many of those document rate limits. The numbers are bad.


How I found them

I identified servers proxying paid APIs by scanning environment variable names. If a server requires OPENAI_API_KEY, ANTHROPIC_API_KEY, or STRIPE_SECRET_KEY as an environment variable, it’s making API calls to that service.

This is a conservative, high-confidence approach. Every server flagged actually requires a paid API key to function. I didn’t count servers that merely mention a service in their description — only servers that require an API key as a configured environment variable.

For rate limit documentation, I scanned READMEs for: rate limit, throttle, requests per second/minute/hour, quota, 429, too many requests, concurrency limit.
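The detection logic can be sketched in a few lines. This is a simplified reconstruction, not the actual scanner: `PAID_KEY_VARS` is truncated to three examples, and the regex mirrors the term list above:

```python
import re

# Illustrative subset; the real scan matched the full set of paid API key names.
PAID_KEY_VARS = {"OPENAI_API_KEY", "ANTHROPIC_API_KEY", "STRIPE_SECRET_KEY"}

RATE_LIMIT_TERMS = re.compile(
    r"rate limit|throttl|requests per (second|minute|hour)"
    r"|quota|\b429\b|too many requests|concurrency limit",
    re.IGNORECASE,
)

def proxies_paid_api(required_env_vars: set[str]) -> bool:
    """Flag a server only if it requires a known paid API key env var."""
    return bool(required_env_vars & PAID_KEY_VARS)

def documents_rate_limits(readme: str) -> bool:
    """Check a README for any rate-limit-related term."""
    return RATE_LIMIT_TERMS.search(readme) is not None
```

Matching on required env vars rather than README prose is what keeps the count conservative: a server can mention OpenAI without being flagged, but it can't require `OPENAI_API_KEY` without calling OpenAI.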

The findings

180 servers proxy calls to paid APIs. Of those:

| Metric | Count | % |
| --- | --- | --- |
| Document rate limits | 27 | 15.0% |
| Document no rate limits | 153 | 85.0% |

85% of servers proxying paid APIs have no documented rate limiting.

Which APIs are being proxied

| Paid API | MCP Servers | Notes |
| --- | --- | --- |
| OpenAI | 74 | GPT, DALL-E, Whisper — requires OPENAI_API_KEY |
| Anthropic | 74 | Claude API — requires ANTHROPIC_API_KEY |
| GitHub | 22 | GitHub API (metered for some operations) |
| AWS | 16 | S3, Bedrock, Lambda |
| Google AI | 10 | Gemini, Vertex AI |
| Stripe | 9 | Payment processing |
| Azure | 6 | Azure OpenAI, Azure services |
| HuggingFace | 5 | Model inference |
| Slack | 2 | Messaging API |
| Pinecone | 2 | Vector database |
| Cohere | 2 | Model inference |
| Replicate | 1 | Model inference |
| Twilio | 1 | SMS/voice |
| Datadog | 1 | Monitoring API |

OpenAI and Anthropic dominate at 74 servers each — expected given these are the most common LLM APIs that MCP servers wrap. (The per-API counts sum to more than 180 because a single server can require keys for multiple services.)

Across all servers

Rate limit documentation isn’t just sparse for paid API proxies. It’s sparse everywhere:

| Metric | Count | % |
| --- | --- | --- |
| All servers with rate limit mention | 901 | 11.0% |
| All servers without rate limit mention | 7,315 | 89.0% |

89% of the entire MCP ecosystem has no documented rate limiting.

The blast radius math

What happens when an agent gets stuck in a retry loop against a server with no rate limits?

The scenario is realistic. LLMs retry. Tool calls fail for transient reasons — network timeouts, temporary API errors, malformed responses. Agent frameworks implement retry logic. Some retry aggressively. Without rate limits on the MCP server, every retry hits the paid API.

Conservative assumptions:

  • 10 calls per second (agent retry rate — some frameworks go higher)
  • No rate limiting at the MCP server level
  • No circuit breaker (no automated detection of the loop)

| Paid API | Cost per Call | 1 Minute | 1 Hour |
| --- | --- | --- | --- |
| OpenAI | $0.030 | $18.00 | $1,080 |
| Replicate | $0.020 | $12.00 | $720 |
| Anthropic | $0.015 | $9.00 | $540 |
| Google AI | $0.010 | $6.00 | $360 |
| Azure | $0.010 | $6.00 | $360 |
| Cohere | $0.010 | $6.00 | $360 |
| Pinecone | $0.008 | $4.80 | $288 |
| Twilio | $0.0075 | $4.50 | $270 |
| AWS | $0.005 | $3.00 | $180 |
| Stripe | $0.0025 | $1.50 | $90 |

These are per-agent numbers. An enterprise with 50 agents, each capable of hitting this loop independently: multiply by 50.

The cost per call estimates are rough averages. OpenAI’s actual cost depends on the model and token count. But the order of magnitude is right: a retry loop against a paid API proxy, at 10 calls per second, runs up a four-figure hourly bill.
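The arithmetic is simple enough to check yourself. A sketch, working in thousandths of a dollar so the figures stay exact:

```python
# Blast radius of an unthrottled retry loop: calls/sec x cost/call x duration.
# Costs are passed in thousandths of a dollar so the arithmetic is exact.
def blast_radius_usd(cost_millidollars: int, calls_per_second: int = 10):
    """Return (cost per minute, cost per hour) in dollars."""
    per_minute = cost_millidollars * calls_per_second * 60 / 1000
    return per_minute, per_minute * 60

per_min, per_hour = blast_radius_usd(30)  # OpenAI row: $0.030/call
print(per_min, per_hour)                  # 18.0 1080.0, matching the table

# 50 agents hitting the loop independently multiply the exposure.
fleet_hourly = per_hour * 50              # $54,000/hour worst case
```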

The compounding problem

Rate limits protect against more than runaway costs. They’re the circuit breaker for:

  1. Retry storms — agent gets an error, retries, gets the same error, retries harder. Without rate limits, this loop runs until something external stops it.

  2. Prompt injection amplification — an attacker who can inject a tool call into the agent’s context can trigger repeated expensive API calls. The agent dutifully executes them. The MCP server dutifully proxies them.

  3. Credential abuse — if the API key is compromised (or shared across multiple agents), rate limits are the only thing preventing the key from being used to exhaustion.

  4. Noisy neighbor — in multi-tenant deployments, one agent’s retry loop consumes the shared API quota for everyone.

Traditional API gateways solved all of these problems. Rate limiting, circuit breakers, budget caps, per-client quotas. None of these patterns exist in the MCP ecosystem.

Why servers don’t implement rate limits

Three reasons:

1. The MCP spec doesn’t define rate limiting primitives. There’s no standard way for a server to communicate its rate limits to the client. No Retry-After header equivalent. No quota negotiation. The spec handles tool discovery and invocation, not resource management.

2. Most MCP servers are thin wrappers. A typical MCP server is 200 lines of code that translates tool calls to API calls. Adding rate limiting means adding state management, persistence, configuration. That’s more complexity than the server itself.

3. The server doesn’t know the cost. An MCP server that proxies OpenAI doesn’t know how many tokens each call will use, what model the user configured, or what the per-token price is. It can count calls but not cost.
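To be fair to server authors, call-count limiting (as opposed to cost limiting) needs very little state. A minimal in-process token bucket, illustrative only and not taken from any MCP SDK:

```python
import time

class TokenBucket:
    """Allow at most `rate` calls per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens in proportion to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should surface a "rate limited" tool error

bucket = TokenBucket(rate=2, capacity=2)  # 2 calls/sec, small burst
```

This caps calls but not dollars — which is exactly the gap point 3 describes: the wrapper can count invocations without ever knowing what each one costs.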

What rate limiting at the control plane looks like

The fix is the same pattern as auth and audit: enforce at the layer that sees all traffic.

A control plane that sits between agents and MCP servers can:

  1. Per-user budget caps — “User X can spend $50/day on OpenAI calls through MCP.” The control plane tracks cumulative cost and rejects calls that exceed the budget.

  2. Per-server rate limits — “No more than 100 calls/minute to the database server.” Enforced regardless of how many agents are calling it.

  3. Circuit breakers — if a server returns errors for 5 consecutive calls, stop calling it. Don’t let the agent retry forever.

  4. Cost attribution — track cost per user, per agent, per server. The invoice shows where the money went.
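The four controls above can be sketched as a single enforcement wrapper. `forward` stands in for whatever function actually proxies the call; the thresholds, cost estimates, and in-memory state are illustrative (a real control plane would persist spend and share state across instances):

```python
class ControlPlane:
    """Enforce per-user budget caps and a per-server circuit breaker at the
    proxy layer, so individual MCP servers need no changes."""

    def __init__(self, daily_budget_usd: float, error_threshold: int = 5):
        self.daily_budget = daily_budget_usd
        self.error_threshold = error_threshold
        self.spend = {}    # user -> cumulative estimated spend (USD)
        self.errors = {}   # server -> consecutive error count

    def call(self, user: str, server: str, est_cost_usd: float, forward):
        # Circuit breaker: stop calling a server that keeps failing.
        if self.errors.get(server, 0) >= self.error_threshold:
            raise RuntimeError(f"circuit open for {server}")
        # Budget cap: reject calls that would exceed the user's daily budget.
        if self.spend.get(user, 0.0) + est_cost_usd > self.daily_budget:
            raise RuntimeError(f"daily budget exceeded for {user}")
        try:
            result = forward()
        except Exception:
            self.errors[server] = self.errors.get(server, 0) + 1
            raise
        self.errors[server] = 0  # success closes the breaker
        self.spend[user] = self.spend.get(user, 0.0) + est_cost_usd
        return result
```

Per-server rate limits would slot in alongside the budget check, and cost attribution falls out of the `spend` ledger for free.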

This is the limitabl pattern. The MCP server doesn’t need to implement rate limiting. The control plane rate-limits, budgets, and circuit-breaks on its behalf.

The server proxies the API. The control plane makes sure the proxy doesn’t become a firehose.


Methodology

Paid API detection via environment variable name matching. Servers are flagged only if they require a specific paid API key (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY, STRIPE_SECRET_KEY) as a configured environment variable. This is a conservative, high-confidence approach — it undercounts servers that proxy paid APIs without requiring a dedicated env var, but avoids false positives from description-only mentions.

Rate limit detection via README scanning for rate-limit-related terms.

Limitations:

  • Cost-per-call estimates are rough averages; actual costs vary by model, token count, and pricing tier
  • A server could implement rate limiting without documenting it in the README
  • The 10 calls/second retry rate is conservative; some agent frameworks retry faster

The blast radius calculations are illustrative. The point isn't the exact dollar amount; it's that the ecosystem has no standard mechanism for preventing runaway costs, and 85% of paid API proxies don't document one.

Read the reference architecture → · Get started free →
