Claude Code's Deny List Can Be Bypassed. Here's What a Real Enforcement Boundary Looks Like.

David Crowe · 4 min read

claude-code security tool-policy policy hooks

The short version: Claude Code’s deny rules are string matches evaluated inside the client, and both halves of that sentence are bypassable. String matches miss commands rewritten to not match. The client can be told to skip enforcement entirely. Neither is a secret — Anthropic’s own docs caution against treating Bash deny rules as a security boundary, and the write-ups documenting the bypasses rank on the first page when you search for them.

What’s mostly missing from those write-ups is the second half: what enforcement that doesn’t have this property looks like. That’s this post. And since we ship a product whose install registers a Claude Code hook, we’re in the blast radius of our own argument — so we’ll be precise about which of our own layers share the weakness.

The bypass classes, plainly

You don’t need an adversarial model for any of these. A helpful agent trying to complete its task will wander into them.

1. The deny rule matches the string, not the effect. A rule like Bash(rm:*) matches commands that start with rm. The same deletion reaches the filesystem as a compound command (cd /tmp && rm -rf data), via command substitution (bash -c "$(echo cm0gLXJmIGRhdGE= | base64 -d)"), through an interpreter (python3 -c "import shutil; shutil.rmtree('data')"), or by the agent writing a three-line script and running it — which is not a trick, it’s what coding agents do all day. The effects of a shell command are not recoverable from its prefix. Any prefix-matched deny list on a Turing-complete tool is a speed bump.

2. The permission prompt gates the call, not the capability. Approving Bash(make build) approves whatever the Makefile does — including reading .env and posting it somewhere, if the Makefile (or a dependency’s install script) says so. The approval UI shows you the string; the string doesn’t contain the behavior.

3. The client can opt out of its own enforcement. --dangerously-skip-permissions suppresses the permission system and silently disables every PreToolUse/PostToolUse hook. We documented this against our own product: ACP’s hook doesn’t fire, the audit log is empty for the session, and there is no server-side signal that a hook didn’t run. Every hook-based governance tool — ours, anyone’s — shares this property, because the hook only exists inside the process that agreed to run it.

Add these up and the structural statement is: a check that runs inside the agent’s process, on the agent’s terms, enforces nothing against that process. It filters accidents. Accidents are most incidents, so client-side rules are worth having — but they’re a convenience layer, not a boundary.

What makes a boundary real

The criteria aren’t new; they’re the reference-monitor requirements, settled in 1972 and worth restating for agents: the enforcement point must be non-bypassable (on every path the effect can take), tamper-proof (not modifiable by the thing it constrains), and small enough to verify. Claude Code’s deny list fails the first two by construction — not because it’s badly built, but because it lives in the client.

So the fix isn’t a better regex. It’s moving the decision to a place the agent’s process can’t reach:

Gateway-executed tools. When a tool call runs through a gateway — MCP tools fronted by a control plane, typed tools the gateway executes — the tool’s effect is the gateway’s action. A denied tool isn’t blocked; it’s absent from tools/list. There is no client-side flag that makes a tool the server doesn’t expose come back.

The model proxy. Route the agent’s model traffic through a proxy and denied tool_calls are stripped from the response before the harness ever sees them. The harness can’t execute a call it never received. This also can’t be disabled by a client flag — the client’s escape hatch turns off its hooks, not its need to reach a model. (This is the layer claude-acp adds; plain claude stays untouched as your own escape hatch.)

The sandbox, for the shell. Here’s the honest limit on our side of the argument: for a raw Bash tool, even a gateway sees a string. If the threat you care about is a determined exfiltrator with shell access to live secrets, the control that holds is an OS-level boundary — container, VM, file permissions — because it constrains effects, not command text. We’ve said this before and it doesn’t stop being true because we sell the other layer.

The composition that actually works: sandbox the shell, gateway the typed tools, proxy the model — and keep the client-side rules as the fast local filter they are. Enforcement layered from outside in, with the deny list demoted from “security boundary” to “first line.”

What this looks like in practice

One command sets up the outside layers for a coding agent:

curl -sf https://agenticcontrolplane.com/install.sh | bash

That registers the hook (the convenience layer — and yes, the flag disables it; that’s why it’s not the whole story), sets up the model proxy (the layer the flag doesn’t disable), and captures the agent’s declared tool surface — all 76 tools of a live Claude Code session, indexed in the Tool Surface Index — so your allow/flag/deny decisions are made against what the agent can actually do, priced and visible per action.

If you take one thing from this post, though, take the criterion, not the product: for every control you rely on, ask where does it run? If the answer is “inside the process it’s supposed to constrain,” you have a filter. Useful, worth keeping — and not the thing between your agent and the incident. The thing between your agent and the incident has to live where the agent can’t edit it.

Share: Twitter LinkedIn

← back to blog