The data

Name: Agentic Control Plane first-party agent data
Creator: Agentic Control Plane
License: https://creativecommons.org/licenses/by/4.0/

Most claims about AI agents are estimates. The numbers on this page aren’t — each one comes from something we ran, captured, metered, or scanned ourselves. This is the canonical index: the number, what it means, how it was produced, when, and the write-up with the full detail. If you cite one (please do), cite the method with it.

Two ground rules we hold ourselves to: cost and traffic figures are from our own workspaces — dogfood, not customer data — and every number links to the post where the methodology is spelled out, including its limitations.

Tool surfaces (captured from live traffic)

Number	What it is	Method & date	Source
76 tools	Declared by one real Claude Code session (v2.1, Chrome extension + connectors available): 35 core harness, 21 browser control, 20 connectors	Captured from live API request bodies — the `tools` array the harness sends on every call. 2026-07	Tool Surface Index · the argued posture
64 of 76	Tools declared but never invoked in that session — standing capability, not used capability	Same capture, declaration vs invocation log. 2026-07	ACP for coding agents
17 tools	Declared by OpenAI Codex CLI (v0.142) via the Responses API	Same capture method. 2026-07	Tool Surface Index
+21 tools, mid-session	One session’s surface grew by 21 tools partway through the day — deferred tools loaded, no prompt, no changelog	Declaration diffing across requests in captured traffic. 2026-07	Which tools to deny out of the box
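
The declaration-versus-invocation comparison and the declaration diffing described in the table can be sketched roughly as follows. This is a minimal illustration, not the capture pipeline itself: it assumes each captured request body is already parsed into a dict whose `tools` array holds entries with a top-level `name` field, and the function names (`declared_tools`, `never_invoked`, `surface_growth`) are hypothetical.

```python
def declared_tools(request_body: dict) -> set[str]:
    """Tool names declared in one captured request body.

    Assumes each entry in the `tools` array carries a top-level `name`;
    entries without one are skipped.
    """
    return {t["name"] for t in request_body.get("tools", []) if "name" in t}


def never_invoked(requests: list[dict], invoked: list[str]) -> set[str]:
    """Tools declared anywhere in the session but absent from the invocation log."""
    declared: set[str] = set()
    for body in requests:
        declared |= declared_tools(body)
    return declared - set(invoked)


def surface_growth(requests: list[dict]) -> list[tuple[int, set[str]]]:
    """Diff each request's declarations against the previous request.

    Returns (request_index, newly_declared_names) for every request
    that declares tools the previous one did not.
    """
    growth = []
    previous = None
    for index, body in enumerate(requests):
        current = declared_tools(body)
        if previous is not None:
            added = current - previous
            if added:
                growth.append((index, added))
        previous = current
    return growth
```

Applied to a session's ordered request bodies, `never_invoked` gives the standing-but-unused set and `surface_growth` flags the requests where the declared surface changed.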

Metered cost (priced per call, at API rates)

Number	What it is	Method & date	Source
210,840 tool calls	Governed tool calls metered across 94 of our own workspaces	Every call through the ACP gateway logged with model, tokens, and estimated cost; ~10,200 calls sampled for the distributions	What 210,000 agent tool calls actually cost (2026-04, updated 2026-06)
~89%	Share of total spend that is the orchestration loop (the model re-reading context to pick the next step), not the leaf work	Every call tagged `callKind: loop` or `leaf`; spend split by tag. 2026-06 snapshot	The loop tax
80% of spend, 7.6% of calls	One frontier model’s share of the bill vs its share of call volume — roughly 114× the per-call cost of the cheapest model in the workload	Per-call cost attribution across the same 210,840 calls	The teardown
85% reads	Share of sampled tool calls that are read operations (`read_file`, `grep`, `ls`, …)	Tool-name classification over the ~10,200-call sample	The teardown
10.3 seconds	Average duration of an orchestration step (`chat.completion`) — the loop is the slow part and the expensive part	Metered latency on governed calls. 2026-06	The loop tax
$148.16	One full working day of Claude Code on a Max subscription, priced at API rates: 276 model calls, 1,697 tool calls	Model traffic routed through the ACP cost proxy; each call priced at list rates while the subscription passes through untouched. 2026-06	ACP for coding agents · Claude Code cost tracking
100% loop tax, 72M tokens	That $148 session’s spend was entirely loop — 72M tokens of context re-read at a 100% cache hit rate (the only reason it wasn’t ~10× more); the last 28 turns bought 10% of the output	Turn-by-turn session X-ray on the same proxy data	Claude Code cost tracking
90% on one step	Share of our agent-builder’s model bill spent on a single step — deliberately, because per-call attribution let us route each step to the model it needs	Per-step cost attribution on our own production agent. 2026-06	One step is 90% of our agent’s model bill
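
The spend split behind the loop-tax figure can be sketched as a grouping over tagged call records. This is an assumed shape, not the gateway's actual schema: `callKind` comes from the method described above, while the `cost_usd` field and the function name are hypothetical, and the numbers in the usage line are made up for illustration.

```python
from collections import defaultdict


def spend_share_by_kind(calls: list[dict]) -> dict[str, float]:
    """Fraction of total estimated cost attributed to each `callKind` tag.

    Each record is assumed to carry `callKind` ("loop" or "leaf") and a
    hypothetical `cost_usd` field holding the estimated cost of that call.
    """
    totals: dict[str, float] = defaultdict(float)
    for call in calls:
        totals[call["callKind"]] += call["cost_usd"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {kind: amount / grand_total for kind, amount in totals.items()}


# Illustrative records only; these are not metered values.
example = [
    {"callKind": "loop", "cost_usd": 2.0},
    {"callKind": "loop", "cost_usd": 1.0},
    {"callKind": "leaf", "cost_usd": 1.0},
]
shares = spend_share_by_kind(example)
```

The same grouping keyed on model or step name instead of `callKind` would give the per-model and per-step attributions listed in the table.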

Model benchmark (agents, not leaderboards)

Number	What it is	Method & date	Source
14 models, two ways	Flagships from Anthropic, OpenAI, Google + open models (Llama, DeepSeek, Qwen, Mistral, GLM), tested as isolated tool calls and as full agent loops	Deterministic grading where possible, a 3-judge model panel for prose; agent runs scored on completion, 3 runs per scenario with spread reported; cost = live pricing × actual tokens. 2026-06	We benchmarked 14 models on real agent runs
0.83–0.95 vs 0.06–0.77	The whole field clusters within 0.12 on isolated tool calls; the same models spread by more than 10× on completing a real agent loop	Same benchmark, both test modes	The benchmark
0.83 → 0.06	DeepSeek V3.2’s isolated score vs its agent-completion score: strong calls in isolation, but it cannot drive a loop to completion	Same benchmark	The benchmark
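
Two pieces of the benchmark method lend themselves to a short sketch: summarizing the 3 runs per scenario with a spread, and computing cost as pricing × actual tokens. The sketch below makes assumptions the source does not state: spread is taken as max minus min, prices are expressed per million tokens, and all function and parameter names are hypothetical.

```python
from statistics import mean


def summarize_runs(scores: list[float]) -> dict[str, float]:
    """Mean completion score and spread across repeated runs of one scenario.

    Spread is defined here as max minus min, which is an assumption;
    the benchmark may report it differently.
    """
    return {"mean": mean(scores), "spread": max(scores) - min(scores)}


def run_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of one run: price × actual tokens, with prices per million tokens."""
    return (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000
```

Reporting the spread next to the mean matters for agent loops, where a single failed run out of three can move the score far more than it would on isolated calls.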

Ecosystem scans (security research)

Number	What it is	Method & date	Source
7,522 skills scanned	Every skill on the ClawHub registry, statically analyzed: 4,931 findings across 746 skills, ~61% estimated false-positive rate after triage	40 regex patterns from published research (Snyk, Cisco, Kaspersky), run airgapped in Docker with `--network=none`; static analysis — a floor, not a ceiling	I audited 7,522 AI agent skills (2026-03)
8,216 MCP servers	Public MCP servers scanned for input-validation posture (7,840 tools) and classified by auth appropriateness	Registry-wide static scans; methodology and caveats in each post	Input validation · auth appropriateness (2026-03)
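
A regex-based static scan of the kind described for the skills audit can be sketched in a few lines. The two patterns below are hypothetical stand-ins chosen for illustration; they are not the 40 patterns used in the audit, and the function name is an assumption.

```python
import re

# Hypothetical example patterns, not the audit's actual rule set.
PATTERNS = {
    "prompt_override": re.compile(
        r"ignore (?:all )?previous instructions", re.IGNORECASE
    ),
    "base64_decode": re.compile(r"base64\s+(?:-d|--decode)\b"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every pattern hit in the text.

    Purely static: the text is never executed, so a hit is a candidate
    finding that still needs triage, not a confirmed issue.
    """
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings
```

Because a match only shows that suspicious text is present, a triage pass is needed afterwards, which is where the estimated false-positive rate in the table comes from. Running such a scanner in a container started with `--network=none` keeps the scanned content from reaching the network.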

Using these numbers

Cite freely with attribution — “per Agentic Control Plane’s metered data” plus a link to the source post is ideal, because each post carries the caveats that keep the number honest (sample sizes, dogfood-not-customer scope, static-analysis limits). If a number here disagrees with a post, the post is canonical and this page needs an update — tell us.

The captures and meters that produce this data run continuously. To point them at your own agents:

curl -sf https://agenticcontrolplane.com/install.sh | bash