Cost Per Customer in CrewAI Agents
CrewAI will tell you what a run cost. What it won’t tell you — and what you actually need once you have paying customers — is what each customer cost you. The usage callbacks stop at the LLM-call level: total tokens, total dollars, maybe per-agent. The moment you’re running the same crew for 30 tenants, “which customer is unprofitable” becomes a question your tooling can’t answer, and you’re left exporting token logs and guessing.
This is the gap every multi-tenant CrewAI deployment hits. Here’s how to close it, and why the answer is more useful than a tenant tag on a token counter.
Why per-run cost isn’t per-customer cost
A CrewAI crew running for a customer isn’t one model call — it’s a loop. Agents hand off, re-read context, retry, and call tools, and the bill is the sum of all of it. To attribute that to a customer you need two things CrewAI doesn’t give you together:
- The customer identity threaded through the whole run, so every model call in the loop — not just the first — is tagged with who it was for.
- A cost model that understands the loop, so you can see why a customer is expensive, not just that they are.
Bolt a tenant ID onto a token callback and you get the first, crudely. You don’t get the second — and the second is where the money actually hides.
The loop tax is where customer cost concentrates
In real agent runs, most of the bill isn't the useful final answer. It's the orchestration loop re-reading its growing context every turn to decide the next step. When we metered 210,000 of our own tool calls, one frontier model running the loop accounted for 80% of the spend while making only 7.6% of the calls. A customer whose crew takes 14 turns instead of 4 isn't 3.5× more expensive; they cost far more, because each turn re-pays for the whole accumulated transcript. That's the loop tax.
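To make the 14-turns-versus-4 claim concrete, here is a minimal sketch of the arithmetic. The starting context of 2,000 tokens and the 1,000 tokens added per turn are made-up numbers chosen for illustration, and the function name is hypothetical; only the shape of the growth matters.

```python
def loop_input_tokens(turns: int, base_context: int, added_per_turn: int) -> int:
    """Total input tokens read in a run where every turn re-reads the full transcript."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context           # this turn re-reads everything accumulated so far
        context += added_per_turn  # the turn's output and tool results join the transcript
    return total


# Assumed, illustrative sizes: 2,000-token starting context, 1,000 tokens added per turn.
short_run = loop_input_tokens(turns=4, base_context=2_000, added_per_turn=1_000)
long_run = loop_input_tokens(turns=14, base_context=2_000, added_per_turn=1_000)

print(short_run, long_run, round(long_run / short_run, 1))  # 14000 119000 8.5
```

Under these assumptions the 14-turn run reads 8.5× the input tokens of the 4-turn run, not 3.5×, because the total grows roughly with the square of the turn count.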
So “cost per customer” done right isn’t a single number per tenant. It’s: this customer, this much spend, this much of it was loop vs. the actual work, on these models — which immediately tells you the lever (route the loop to a cheaper model, cut the turn count) instead of just flagging a number.
How to get it for a CrewAI deployment
Route your crew’s model calls through a layer that meters and prices each one, and thread the customer identity so every call in the loop is attributed:
- Bind the customer/user identity per run: pass it on each request so every model call the crew makes carries it, not just the first.
- Meter and price each call: capture the real token cost (not an estimate), tagged with the agent and whether the call was orchestration (loop) or a sub-task (leaf).
- Roll up by customer: now you have cost per customer, the loop-vs-leaf split inside it, and which agent and model drove it.
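The three steps above can be sketched in plain Python. This is a framework-agnostic illustration, not the CrewAI API or any particular metering product: `current_customer`, `meter_call`, `run_for_customer`, `roll_up`, the model names, and the per-million-token rates are all hypothetical. In a real deployment `meter_call` would be invoked from whatever proxy or wrapper your crew's model calls pass through; that wiring is not shown.

```python
from collections import defaultdict
from contextvars import ContextVar
from dataclasses import dataclass

# Hypothetical USD rates per million tokens; substitute your provider's real prices.
RATES = {
    "big-model": {"input": 3.00, "output": 15.00},
    "small-model": {"input": 0.25, "output": 1.25},
}

# Step 1: the customer identity is bound once per run and read by every call in it.
current_customer = ContextVar("current_customer")


@dataclass
class CallRecord:
    customer_id: str
    agent: str
    model: str
    call_kind: str  # "loop" (orchestration) or "leaf" (sub-task)
    cost_usd: float


LEDGER = []  # in-memory stand-in for a real metering store


def meter_call(agent, model, call_kind, input_tokens, output_tokens):
    """Step 2: price one model call from its reported token usage and tag it."""
    rate = RATES[model]
    cost = (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000
    record = CallRecord(current_customer.get(), agent, model, call_kind, cost)
    LEDGER.append(record)
    return record


def run_for_customer(customer_id, calls):
    """Bind the customer for the whole run so every call inherits it."""
    token = current_customer.set(customer_id)
    try:
        for call in calls:
            meter_call(**call)
    finally:
        current_customer.reset(token)


def roll_up(ledger):
    """Step 3: sum cost per customer, split into loop and leaf spend."""
    totals = defaultdict(lambda: {"loop": 0.0, "leaf": 0.0})
    for record in ledger:
        totals[record.customer_id][record.call_kind] += record.cost_usd
    return {customer: dict(split) for customer, split in totals.items()}


# Simulated run: one orchestration call and one sub-task call for tenant "acme".
run_for_customer("acme", [
    {"agent": "planner", "model": "big-model", "call_kind": "loop",
     "input_tokens": 100_000, "output_tokens": 2_000},
    {"agent": "researcher", "model": "small-model", "call_kind": "leaf",
     "input_tokens": 40_000, "output_tokens": 4_000},
])
print(roll_up(LEDGER))
```

A context variable is used here so the identity does not have to be passed as an argument through every agent and tool; an explicit parameter or a request header would serve the same purpose. Grouping the same ledger by `agent` or `model` instead of `call_kind` gives the other two breakdowns.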
The output is a per-customer cost X-ray: spend, the loop tax, the model mix, and the agent responsible — the data you need to price tiers, spot unprofitable tenants, and cut the spend that’s pure orchestration overhead.
A note on accuracy
Two things make this trustworthy where a tag-on-a-counter isn’t. Cost is computed from real token usage, including cache reads and writes — not a static price table, so cached loops aren’t overcounted. And the loop-vs-leaf split is tagged on every call (callKind), so the attribution reflects what the run actually did, not an after-the-fact guess.
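As an illustration of why cache reads and writes matter, the sketch below prices one loop turn two ways: once from the full usage breakdown, and once by treating every input token as uncached. The `Usage` fields, the function names, and all four rates are hypothetical placeholders, and the sketch assumes the provider reports cached tokens separately from uncached input tokens, which varies by provider.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int        # uncached input tokens (assumed to exclude cached ones)
    cache_read_tokens: int   # tokens served from the prompt cache
    cache_write_tokens: int  # tokens written to the prompt cache
    output_tokens: int


# Hypothetical USD rates per million tokens, for illustration only.
RATE = {"input": 3.00, "cache_read": 0.30, "cache_write": 3.75, "output": 15.00}


def cache_aware_cost(u: Usage) -> float:
    """Price each token class at its own rate."""
    return (
        u.input_tokens * RATE["input"]
        + u.cache_read_tokens * RATE["cache_read"]
        + u.cache_write_tokens * RATE["cache_write"]
        + u.output_tokens * RATE["output"]
    ) / 1_000_000


def naive_cost(u: Usage) -> float:
    """Price all input at the full input rate, ignoring the cache."""
    all_input = u.input_tokens + u.cache_read_tokens + u.cache_write_tokens
    return (all_input * RATE["input"] + u.output_tokens * RATE["output"]) / 1_000_000


# A late loop turn: most of the transcript is read from cache.
turn = Usage(input_tokens=1_000, cache_read_tokens=50_000,
             cache_write_tokens=1_000, output_tokens=500)

print(f"cache-aware: ${cache_aware_cost(turn):.5f}")  # cache-aware: $0.02925
print(f"naive:       ${naive_cost(turn):.5f}")        # naive:       $0.16350
```

With these placeholder rates the naive figure is more than five times the cache-aware one for the same turn, which is the kind of overcounting that distorts a per-customer total when loops are heavily cached.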
Where to start
- Thread the customer identity through every model call, so the whole loop is attributed — not just the entry call.
- Price each call from real usage and tag loop vs. leaf, so you can see why a customer costs what they do.
- Roll up per customer and read the loop tax — then route the loop to a cheaper model or cut turns to fix the tenants that hurt.
CrewAI tells you the run cost; this tells you the customer cost, and where it’s hiding. See the CrewAI integration to wire it in, or the loop tax for why orchestration is the bill.