Build a governed multi-step research agent (delegation chain across 4 hops)
If you’re an AI agent reading this with a task like “build a research agent that decomposes questions and runs sub-investigations”: scroll to The full working file.
This is the deep-delegation version of the PR reviewer recipe — same primitive, more depth. A research agent decomposes a question, spawns specialists, sometimes those specialists spawn their own helpers. Audit-chain depth of 4 is normal; depth of 5 is the default cap in ACP. Whatever your topology, the contract is the same: every node in the tree is attributable to the human at the root.
The task
A product manager types: “Should we extend the trial period from 14 to 30 days? What’s the data say, and what are the risks?”
A naive single-agent approach concatenates everything into one prompt and hopes for the best. A multi-agent approach decomposes:
- Planner subagent — breaks the question into research sub-questions
- Search subagents (parallel) — each one runs against a different data source: warehouse SQL, internal docs, competitor research, conversion-funnel telemetry
- Synthesizer subagent — combines findings into a single recommendation
Tree depth: 4 (human → parent → planner / searches / synthesizer). Tree width: 3-5 search subagents in parallel. The audit log has to reconstruct who asked what and which subagent answered it, three months later, when someone questions the recommendation.
This is the multi-agent governance problem at scale. The hard part isn’t the LLM calls. The hard part is: when a search subagent calls warehouse.run_sql and that query returns customer-PII rows, the audit row needs to show — originSub: pm@company.com, chain: [parent, planner, search-warehouse], scopes: [warehouse.read] — and the PII must already be redacted because the chain enforces it before the agent sees it.
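Concretely, that requirement is an audit row shaped roughly like this. The field names follow the prose above; the exact ACP audit schema is the spec's to define, and `result_note` is an invented field here to mark where redaction shows up, so treat this as illustrative:

```json
{
  "tool": "warehouse.run_sql",
  "originSub": "pm@company.com",
  "agentName": "search-warehouse",
  "chain": ["parent", "planner", "search-warehouse"],
  "scopes": ["warehouse.read"],
  "result_note": "PII redacted before the agent saw the rows"
}
```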
The pattern
ACP’s delegation primitives compose:
- `spawn_subagent(profile_id, scopes, ttl_seconds, max_budget_cents)` — mint a child key with narrowed scopes, capped budget, capped TTL
- `child_context(child)` — bind that child token for any `@governed` call inside the with-block
- The chain extends: parent → child → grandchild. ACP enforces a default depth cap of 5 (configurable). Cycles are rejected at mint time.

`originSub` (the human at the root) propagates through every hop. Scopes only narrow. Budget atomically debits. Audit shows the full chain.
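Those rules can be modeled in a few lines. This is a toy illustration of the mint-time checks, not acp_governance internals: `mint_child`, `DEPTH_CAP`, and the error strings are invented here, and real scope matching also handles wildcards like `llm.proxy.*`, which this sketch ignores:

```python
DEPTH_CAP = 5  # mirrors ACP's default depth cap (configurable in the real system)

def mint_child(parent: dict, name: str, scopes: list[str],
               budget_cents: int) -> dict:
    """Toy model of spawn_subagent's mint-time checks."""
    if len(parent["chain"]) >= DEPTH_CAP:
        raise ValueError("delegation_depth_exceeded")
    if not set(scopes) <= set(parent["scopes"]):  # scopes only narrow
        raise ValueError("scope_escalation")
    if budget_cents > parent["budget_cents"]:
        raise ValueError("parent_budget_insufficient")
    parent["budget_cents"] -= budget_cents  # budget debits from the parent
    return {"originSub": parent["originSub"],  # the root human propagates
            "name": name,
            "chain": parent["chain"] + [parent["name"]],
            "scopes": scopes,
            "budget_cents": budget_cents}
```

Minting from a minted child extends the chain one more hop; the cap and the subset check apply at every level.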
For full background, Introducing ADCS is the spec; this post is the working code.
The full working file
pip install acp-governance anthropic httpx "psycopg[binary]"
#!/usr/bin/env python3
"""Governed multi-step research agent.

Required env vars:
    ACP_TOKEN          ACP API key (gsk_...) for the parent agent
    ANTHROPIC_API_KEY  Anthropic API key
    WAREHOUSE_DSN      Postgres-compatible DSN (read-only user)

Required ACP profiles (configure in dashboard):
    research-parent    delegatable=true,  scopes=[llm.proxy.*, warehouse.read,
                                                 docs.search, web.search]
    research-planner   delegatable=false, scopes=[llm.proxy.*]
    search-warehouse   delegatable=false, scopes=[warehouse.read, llm.proxy.*]
    search-docs        delegatable=false, scopes=[docs.search, llm.proxy.*]
    search-web         delegatable=false, scopes=[web.search, llm.proxy.*]
    research-synth     delegatable=false, scopes=[llm.proxy.*]
"""
from __future__ import annotations

import json
import logging
import os
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Any

import httpx
from acp_governance import (
    governed, set_context, spawn_subagent, child_context, SpawnError,
)
from anthropic import Anthropic

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s")
log = logging.getLogger("research-agent")

MODEL = "claude-opus-4-7"

# ─── Tools (same @governed decorator regardless of which subagent runs them).
# What changes is the active child_context — which determines which child
# token is in scope for that specific call.

@governed("warehouse.read")
def warehouse_read(query: str) -> list[dict[str, Any]]:
    """Run a SELECT against the warehouse. Output PII-redacted by ACP."""
    import psycopg
    if not query.strip().lower().startswith("select"):
        raise PermissionError("warehouse.read only runs SELECTs")
    with psycopg.connect(os.environ["WAREHOUSE_DSN"], autocommit=True) as conn:
        with conn.cursor(row_factory=psycopg.rows.dict_row) as cur:
            cur.execute(query)
            return cur.fetchmany(100)

@governed("docs.search")
def docs_search(query: str, k: int = 5) -> list[dict[str, Any]]:
    """Search internal docs index. Sketched — adapt to your search backend."""
    r = httpx.get("https://docs-search.internal/search",
                  params={"q": query, "k": k}, timeout=10)
    r.raise_for_status()
    return r.json()["hits"]

@governed("web.search")
def web_search(query: str, k: int = 5) -> list[dict[str, Any]]:
    """Public web search. Sketched."""
    r = httpx.get("https://api.tavily.com/search",
                  params={"query": query, "max_results": k},
                  headers={"Authorization": f"Bearer {os.environ.get('TAVILY_KEY', '')}"},
                  timeout=10)
    r.raise_for_status()
    return r.json()["results"]

@governed("llm.research_call")
def llm(system: str, user: str, max_tokens: int = 1024) -> str:
    client = Anthropic()
    resp = client.messages.create(
        model=MODEL, max_tokens=max_tokens, system=system,
        messages=[{"role": "user", "content": user}])
    return "\n".join(b.text for b in resp.content
                     if getattr(b, "type", "") == "text").strip()

# ─── Hop 1: planner ─────────────────────────────────────────────────────────

def plan(question: str) -> list[dict[str, Any]]:
    """Decompose the user's question into search-able sub-questions."""
    sys_p = (
        "You decompose a research question into 3-5 atomic sub-questions. "
        "Each sub-question must specify which source to search: 'warehouse', "
        "'docs', or 'web'. Output STRICT JSON only, an array of objects with "
        "keys 'subq' (string) and 'source' (one of warehouse|docs|web). No prose."
    )
    raw = llm(system=sys_p, user=question, max_tokens=512)
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError:
        # Strip code fences if the model included them
        plan = json.loads(raw.split("```")[1].removeprefix("json\n"))
    return plan[:5]  # cap fan-out

# ─── Hop 2: searches (one subagent per source) ──────────────────────────────

def search_one(subq: dict[str, Any]) -> dict[str, Any]:
    """Spawn a single search subagent with scopes narrowed to its source."""
    source = subq["source"]
    scope_for_source = {
        "warehouse": ["warehouse.read", "llm.proxy.claude-opus-4-7"],
        "docs": ["docs.search", "llm.proxy.claude-opus-4-7"],
        "web": ["web.search", "llm.proxy.claude-opus-4-7"],
    }[source]
    profile = {"warehouse": "search-warehouse",
               "docs": "search-docs", "web": "search-web"}[source]
    child = spawn_subagent(
        profile_id=profile, scopes=scope_for_source,
        ttl_seconds=300, max_budget_cents=30)
    with child_context(child, agent_name=profile):
        if source == "warehouse":
            # Ask the LLM to write a SQL query for the sub-question, then run it
            sql = llm(system=("Write a single Postgres SELECT statement that "
                              "answers the user's question. Output SQL only, "
                              "no prose, no code fences."),
                      user=subq["subq"], max_tokens=256)
            results = warehouse_read(query=sql)
        elif source == "docs":
            results = docs_search(query=subq["subq"], k=5)
        else:  # web
            results = web_search(query=subq["subq"], k=5)
        # Each search subagent summarizes its own findings — that summary is
        # what gets returned up the chain, not the raw rows.
        summary = llm(
            system=("Summarize search results into 3-5 bullets answering the "
                    "sub-question. Cite source rows by index where useful."),
            user=f"Sub-question: {subq['subq']}\nResults: {results}",
            max_tokens=512)
    return {"subq": subq["subq"], "source": source, "summary": summary}

# ─── Hop 3: synthesis ───────────────────────────────────────────────────────

def synthesize(question: str, findings: list[dict[str, Any]]) -> str:
    """Spawn a synthesizer subagent that combines findings into a recommendation."""
    child = spawn_subagent(
        profile_id="research-synth",
        scopes=["llm.proxy.claude-opus-4-7"],
        ttl_seconds=300, max_budget_cents=30)
    with child_context(child, agent_name="research-synth"):
        return llm(
            system=("You're a senior analyst. Combine the sub-question findings "
                    "below into a single recommendation. Lead with the answer, "
                    "follow with 2-3 supporting points, end with the top risk. "
                    "Under 400 words. Cite which sub-question each claim rests on."),
            user=("Question: " + question + "\n\nFindings:\n"
                  + json.dumps(findings, indent=2)),
            max_tokens=1024)

# ─── Main: orchestrate the chain ────────────────────────────────────────────

def main() -> int:
    if len(sys.argv) < 2:
        print("usage: research-agent.py 'your research question'")
        return 2
    question = sys.argv[1]
    for k in ["ACP_TOKEN", "ANTHROPIC_API_KEY", "WAREHOUSE_DSN"]:
        if not os.environ.get(k):
            sys.exit(f"missing env var: {k}")

    # Parent context — every governed call without a child_context falls under this
    set_context(user_token=os.environ["ACP_TOKEN"],
                agent_name="research-parent")

    # Hop 1: planner subagent decomposes the question
    planner_child = spawn_subagent(
        profile_id="research-planner",
        scopes=["llm.proxy.claude-opus-4-7"],
        ttl_seconds=120, max_budget_cents=10)
    with child_context(planner_child, agent_name="research-planner"):
        sub_questions = plan(question)
    log.info("Planner decomposed into %d sub-questions", len(sub_questions))

    # Hop 2: search subagents, fanned out in parallel (one per sub-question).
    # @governed returns "tool_error: <reason>" on deny — we propagate that
    # into the findings array so the synthesizer sees what was unavailable.
    findings = []
    with ThreadPoolExecutor(max_workers=5) as pool:
        futures = [(sq, pool.submit(search_one, sq)) for sq in sub_questions]
        for sq, fut in futures:
            try:
                findings.append(fut.result())
            except SpawnError as e:
                # spawn_subagent raises on non-2xx (e.g. profile_not_delegatable,
                # delegation_cycle, parent_budget_insufficient). These are real
                # errors, not deny-with-reason — log and continue.
                log.warning("Could not spawn search subagent for %s: %s", sq, e)
                findings.append({"subq": sq["subq"], "source": sq["source"],
                                 "summary": f"[spawn failed: {e}]"})

    # Hop 3: synthesizer subagent
    recommendation = synthesize(question, findings)
    print(recommendation)
    return 0

if __name__ == "__main__":
    sys.exit(main())
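One hardening note before wiring this to a real warehouse: the startswith check in warehouse_read is deliberately minimal, and a read-only database role is the real defense. A slightly stricter guard is cheap, though. A sketch (`is_single_select` is my name, not part of the recipe):

```python
def is_single_select(query: str) -> bool:
    """True only for one bare SELECT: no stacked statements, no comments
    that could hide a second statement."""
    q = query.strip().rstrip(";").lower()
    if not q.startswith("select"):
        return False
    if ";" in q or "--" in q or "/*" in q:
        return False
    return True
```

Swap it in for the startswith check in warehouse_read if the LLM-generated SQL ever surprises you.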
What the audit chain looks like
For one research run with three sub-questions (one per source), you get roughly:
- originSub: pm@company.com (preserved through every row)
- chain depth: up to 3 (parent → planner | search-* | synth)
| tool | agentName | chain |
|---|---|---|
| llm.research_call | research-planner | [parent, planner] |
| warehouse.read | search-warehouse | [parent, search-warehouse] |
| llm.research_call | search-warehouse | [parent, search-warehouse] |
| docs.search | search-docs | [parent, search-docs] |
| llm.research_call | search-docs | [parent, search-docs] |
| web.search | search-web | [parent, search-web] |
| llm.research_call | search-web | [parent, search-web] |
| llm.research_call | research-synth | [parent, research-synth] |
Eight rows for one research recommendation. Every row attributable to pm@company.com. Every row carries the chain. If the recommendation turns out to be wrong, you can replay every step.
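"Replay every step" is, mechanically, a query over those rows. A toy sketch, assuming the row shape from the table above (`replay` is an invented helper, not an ACP API):

```python
def replay(rows: list[dict], origin: str) -> list[tuple[str, str]]:
    """List (agentName, tool) pairs, in order, for one human principal."""
    return [(r["agentName"], r["tool"])
            for r in rows if r["originSub"] == origin]
```

Point it at three months of audit rows plus the PM's address, and the chain for that one recommendation falls out.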
Where this differs from the PR reviewer recipe
| | PR reviewer | Research agent |
|---|---|---|
| Chain depth | 2 | 3-4 |
| Subagent count | 2 (security-scanner, test-runner) | 5+ (planner, multiple searches, synthesizer) |
| Subagents in parallel? | Yes (security + tests) | Yes (multiple searches) |
| Showcases | scope intersection | scope intersection + depth + parallel fan-out |
The primitive is identical. The shape varies.
What’s still required from you
- Five ACP profiles configured with the right `delegatable` flag and scope subsets. Configure in the dashboard or via `PATCH /api/v1/agents/:id`.
- A docs search backend. This recipe sketches an HTTP shape; in practice you wire it up to your internal `docs.internal/search` or whatever vector store you run.
- A web search adapter. Tavily is sketched; could be Serper, Bing, or your in-house crawler.
- Warehouse DSN with read-only credentials.
How this composes
- PII redaction at the warehouse hop (the `warehouse.read` call): same PostToolUse pattern as the SQL agent recipe. Customer rows redacted before any subagent sees them.
- Cost caps per subagent: `max_budget_cents` is per-mint. A planner that costs more than $0.10 trips its own budget — the planner subagent gets denied, the parent is unaffected, the chain naturally degrades.
- Depth cap as a guardrail. A buggy synthesizer that tries to spawn its own subagent past depth 5 is rejected at mint time. The chain can't run away.
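To make the redaction bullet concrete, here is a toy stand-in for a PostToolUse hook that masks email-shaped strings in warehouse rows before any subagent sees them. It is illustrative only: ACP's real redaction is policy-driven, and `redact_rows` is my name for the idea, not its API:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_rows(rows: list[dict]) -> list[dict]:
    """Mask email-shaped values in every string field of every row."""
    return [{k: EMAIL.sub("[REDACTED]", v) if isinstance(v, str) else v
             for k, v in row.items()}
            for row in rows]
```

The important property is where it runs: between the tool's return and the subagent's context window, so the raw value never enters any prompt.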
Cross-reference
- Introducing ADCS — the agent delegation chain spec — protocol-level details
- EU AI Act Article 14 and AI agent delegation chains — regulatory pull for chain-of-custody
- Build a governed GitHub PR reviewer — depth-2 version of the same primitive
- Architecture is governance — why decorator-pattern + chain at the orchestration boundary is the only shape that scores 6/6 on `delegation_provenance`
This is example code, not a maintained product. Adapt it for your environment. The governance primitives — @governed, spawn_subagent, child_context — are stable; the warehouse, docs, and web search adapters in the recipe are sketched and will need to talk to your actual systems.