Agentic Control Plane

Building the same agent fifteen ways: what each framework taught us about governance

David Crowe · 5 min read
Tags: governance · frameworks · integration-patterns

If you build an agent in CrewAI, in LangGraph, and in the Vercel AI SDK, the agent is recognisably the same — same task, same tools, same outputs. The governance surface is not. Each framework expresses identity, tool execution, and policy in a different shape. Each one does some things well and other things you have to invent yourself.

We just ran the experiment: built the same reference agent across fifteen framework and client integrations. The starter kit is here. Same agent. Same governance contract. Fifteen runtimes.

The headline finding is the boring one — they all work. The interesting findings are about the gaps that show up consistently regardless of which framework you pick. Five of them are useful to anyone building autonomous agents.

1. The most-used flag is the least safe one

Almost every tutorial for running Claude Code autonomously reaches for --dangerously-skip-permissions. Anthropic’s docs explicitly label that flag “recommended only for sandboxes with no internet access.” People run it outside sandboxes anyway, because it’s the flag the tutorials use.

The safer alternative — --permission-mode=auto, with model-based classifiers approving or denying each tool call — exists, ships in the same binary, and is less searchable. The one that says “dangerously” in its name is the one most teams default to.

This isn’t a Claude Code problem; it’s a governance-discoverability problem. The pattern repeats across runtimes. Codex CLI has a --dangerously-bypass-approvals-and-sandbox flag with similar gravity. The safer paths exist; they’re harder to find.

Lesson: if your team is running an agent autonomously, audit which flags are in your scripts. The default-discoverable flag is rarely the right one.
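One way to run that audit: a short script that greps your automation for the bypass flags named above. A minimal sketch in Python — the flag list and the *.sh glob are assumptions to adapt to your own repo and runtimes.

```python
import pathlib
import re

# The two bypass flags named in this post; extend for your own runtimes.
DANGEROUS_FLAGS = re.compile(
    r"--dangerously-skip-permissions"
    r"|--dangerously-bypass-approvals-and-sandbox"
)

def audit_scripts(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every script line that invokes
    an agent with a governance-bypassing flag."""
    hits = []
    for path in sorted(pathlib.Path(root).rglob("*.sh")):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if DANGEROUS_FLAGS.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Wire it into CI so a bypass flag can't land in a script without someone consciously approving it.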

2. Static allowlists break on real shell traffic

dontAsk mode plus a tool allowlist (Bash(curl *), Bash(date *), etc.) sounds like the disciplined approach. In practice, compound shell commands — pipes, heredocs, cat > file <<EOF — don’t cleanly match single-pattern rules. When a denied call doesn’t reach the consecutive-deny abort threshold, the agent silently retries different phrasings forever. The allowlist looks like governance; it behaves like a hang.

The pattern that actually works: bare tool entries (Bash) on the allowlist + a dynamic policy layer (a hook, a wrapped handler) that evaluates the actual command. Static rules at the wrong granularity become anti-features.

Lesson: static allowlists are necessary but rarely sufficient. Anything that has to interpret intent — “is this git push going to main?” — needs runtime evaluation, not a regex.
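Here's a minimal sketch of that dynamic layer, written as a PreToolUse-style hook: it reads the host's JSON payload from stdin and evaluates every segment of a compound command instead of matching one pattern against the whole line. The payload field names and exit-code convention follow Claude Code's hook interface as I understand it; treat both as assumptions and check your client's docs.

```python
import json
import re
import sys

# Commands whose leading word is allowed without further review.
SAFE_COMMANDS = {"date", "ls", "cat", "grep", "curl", "echo"}

# Intent-level rules that need the whole command, not a prefix match.
DENY_PATTERNS = [
    re.compile(r"git\s+push\b.*\bmain\b"),  # "is this git push going to main?"
    re.compile(r"\brm\s+-rf\b"),
]

def evaluate_command(command: str) -> bool:
    """Return True to allow. Checks every segment of a compound command
    (pipes, &&, ;) rather than matching one pattern against the line."""
    if any(p.search(command) for p in DENY_PATTERNS):
        return False
    for segment in re.split(r"\|\||&&|;|\|", command):
        words = segment.strip().split()
        if words and words[0] not in SAFE_COMMANDS:
            return False
    return True

def main() -> int:
    payload = json.load(sys.stdin)  # hook payload from the host
    command = payload.get("tool_input", {}).get("command", "")
    if evaluate_command(command):
        return 0  # allow the call
    print(f"denied by policy: {command}", file=sys.stderr)
    return 2      # block the call (exit-code semantics vary by client)

if __name__ == "__main__":
    sys.exit(main())
```

Because the hook sees the real command, "ls | grep foo" passes while "curl x && rm -rf /tmp" is denied — exactly the distinction a prefix allowlist can't make.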

3. Hook-based governance has framework-specific coverage holes

Claude Code’s PreToolUse hook fires on every tool call. Codex CLI’s analogue exists — but as of writing it fires for shell calls and not reliably for apply_patch or most MCP tools. Cursor’s hook is reliable for Composer’s tool dispatch but doesn’t cover IDE-internal actions. Every hook surface has a coverage map; “hook-based” is not a uniform property.

A team that picks the hook pattern thinking “every tool call goes through governance” can ship with real audit blind spots if they don’t read the fine print of which tool types fire the hook.

Lesson: read the hook coverage map for whichever client you’ve adopted. Most ship with at least one tool category that doesn’t fire the hook. Treat that as part of the threat model, not a footnote.
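One way to make the coverage map part of the threat model rather than a footnote is to encode it and check it at boot. The map values below are placeholders to fill in from your client's docs, not a claim about any specific client.

```python
# Which tool categories actually fire the governance hook in your client.
# Fill in from your client's docs; these values are illustrative only.
HOOK_COVERAGE = {
    "shell": True,
    "file_edit": True,
    "mcp": False,  # e.g. a client whose hook skips MCP tools
}

def coverage_gaps(tools_in_use: dict[str, str]) -> list[str]:
    """Given {tool name: category}, return the tools whose category does
    not reliably fire the hook — i.e. your audit blind spots."""
    return [name for name, category in tools_in_use.items()
            if not HOOK_COVERAGE.get(category, False)]

gaps = coverage_gaps({"bash": "shell", "apply_patch": "file_edit",
                      "jira_search": "mcp"})
# gaps == ["jira_search"]: surface this at boot, before the agent runs.
```

Unknown categories default to uncovered, so a newly added tool type shows up as a gap until someone has checked the docs and said otherwise.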

4. Three integration patterns, three different things to govern

Every framework lands on one of three integration shapes:

  • Decorator (CrewAI, LangGraph, Pydantic AI, AutoGen, Google ADK): governance wraps each tool function. Sees tool name, input, output. Doesn’t see LLM calls.
  • Hook (Claude Code, Codex CLI, Cursor): governance runs as a script before/after every tool dispatch. Sees what the host puts in the payload — typically richer than decorator (session, tier, working directory).
  • Proxy (the few remaining clients that only accept a base_url override): governance sits at the network layer, sees the full request the SDK was about to make. Carries the operational weight of being in the LLM hot path.
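The decorator shape above can be sketched in a few lines — the audit sink and names here are illustrative, not any framework's API:

```python
import functools
import time

def governed(audit_log: list):
    """Decorator-pattern governance: wraps a tool function so every call is
    recorded with name, input, and output. LLM calls never pass through."""
    def decorator(tool_fn):
        @functools.wraps(tool_fn)
        def wrapper(*args, **kwargs):
            record = {"tool": tool_fn.__name__, "args": args,
                      "kwargs": kwargs, "ts": time.time()}
            result = tool_fn(*args, **kwargs)
            record["result"] = result
            audit_log.append(record)
            return result
        return wrapper
    return decorator

audit_log = []

@governed(audit_log)
def search_web(query: str) -> str:
    return f"results for {query}"

search_web("agent governance")
# audit_log now holds one record: tool name, input, output — and nothing
# about the LLM call that decided to invoke the tool.
```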

The pattern is dictated by the framework, not the user. What governance can see is dictated by the pattern.

Lesson: when you pick a framework, you’re also picking a governance ceiling. Decorator-pattern agents will never give you LLM-call visibility without an additional layer. Hook-pattern agents are at the mercy of their host’s payload. Plan accordingly.

5. The boring failure mode is forgetting to bind identity

Across every framework that uses an SDK-style integration, the same correctness footgun showed up: identity binding is separate from the tool-wrapping decorator, and forgetting to bind it doesn’t error — it silently no-ops governance. Your agent runs; your audit trail is empty.

This is the failure mode that doesn’t surface in unit tests, doesn’t show up in framework docs (because frameworks don’t know about your governance), and only manifests when something downstream catches it — which by then is too late.

Lesson: if you’re building agent governance, the binding-identity step has to be loud. A warning when a governed call fires without identity bound. A health-check the agent runs at boot. Anything that turns “I forgot the wrapper” from an invisible bug into a visible one.
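Here's what "loud" can look like — a hypothetical governance client (the class and method names are illustrative, not a real SDK) that warns on unbound governed calls and exposes a boot-time health check:

```python
import warnings

class GovernanceClient:
    """Illustrative client that makes a missing identity binding loud
    instead of letting governance silently no-op."""

    def __init__(self):
        self._identity = None
        self.audit_log = []

    def bind_identity(self, agent_id: str):
        self._identity = agent_id

    def record(self, tool: str, payload: dict):
        if self._identity is None:
            # The whole point: an unbound governed call should shout.
            warnings.warn("governed call fired with no identity bound; "
                          "audit trail will be empty", stacklevel=2)
            return
        self.audit_log.append({"agent": self._identity, "tool": tool,
                               "payload": payload})

    def health_check(self) -> bool:
        """Run at agent boot: fails fast if identity was never bound."""
        return self._identity is not None
```

Escalate the warning to a hard error in CI and staging, where a crashed run is cheaper than an empty audit trail in production.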

What this means for picking a framework

The fifteen-way exercise didn’t surface a winner. CrewAI does multi-agent delegation cleanly. LangGraph’s middleware API is the cleanest governance seam in current SDKs. Pydantic AI is the lightest. Anthropic’s plain SDK demands the least and gives the most control. Each is right for different teams.

What it surfaced is that the governance question is independent of the framework choice. You can build a well-governed agent in any of them. You can also build an ungoverned one in any of them. The framework decides what’s possible; the team decides what gets shipped.

If you’re starting an agent project and want a governance surface working from day one, pick a starter — they’re calibrated to each framework’s idioms — and skip the sharp edges this post just walked you past.
