EU AI Act Article 14 and AI Agents: Mapping Human Oversight to Delegation Chains
EU AI Act Article 14 takes effect on August 2, 2026. If you deploy a high-risk AI system in the EU — or a system that serves EU users — you will be required to demonstrate effective human oversight. Not claim it. Demonstrate it, with artifacts, to regulators and auditors who don’t accept “a human can always interrupt” as a design statement.
For multi-agent systems — anything where one agent delegates to another — this is the pinch point. Most AI agent deployments today have no deterministic record of which human authorized what, which narrower scope each downstream agent ran under, or where the oversight boundary actually sits. Prompt-based approval (“the agent asks before doing anything destructive”) is not an oversight control under Article 14. The model is the thing being overseen; it cannot also be the overseer.
This post maps Article 14’s concrete requirements — paragraph by paragraph — to the artifacts an agent governance system needs to produce. We use ADCS (the Agent Delegation Chain Specification) and three-axis governance as the reference. Both are open, both are in production, both emit the exact shape of evidence Article 14 will demand.
What Article 14 actually says
Article 14 of the EU AI Act applies to high-risk AI systems as defined in Annex III — which includes systems used in employment, credit decisioning, critical infrastructure, law enforcement, education, essential private and public services, and more. If your AI agent touches any of those use cases on behalf of an EU user, Article 14 applies.
The paragraphs that matter for agent systems:
- Art. 14(1): the system shall be designed such that it can be effectively overseen by natural persons during use.
- Art. 14(2): oversight shall aim to prevent or minimise risks to health, safety, and fundamental rights.
- Art. 14(4): the system shall be provided such that natural persons assigned to oversight are enabled — as appropriate and proportionate — to:
- (a) properly understand the relevant capacities and limitations of the system and duly monitor its operation;
- (b) remain aware of automation bias;
- (c) correctly interpret the system’s output;
- (d) decide, in any particular situation, not to use the system, or to disregard, override, or reverse the output;
- (e) intervene in the operation of the system or interrupt it through a ‘stop’ button or similar procedure.
Every sub-paragraph in 14(4) is the test an auditor will apply. “Effectively” and “as appropriate and proportionate” are doing the enforceable work. You need evidence that oversight is real, not aspirational.
Why prompt-based oversight fails Article 14
A common pattern in agent frameworks today is the model-mediated approval: the agent says “I’m about to delete this repo — are you sure?” and a human clicks yes or no. Two problems, both disqualifying under Article 14:
- The overseer is the overseen. The model chose what to ask about. If the model decides a destructive action isn’t worth confirming, no human ever sees it. Oversight that depends on the model’s judgment about what to flag is not human oversight — it’s model self-moderation.
- There is no deterministic record. A prompt-and-click flow produces a conversational log, not an audit artifact. An auditor asking “show me every action that required oversight in Q3” needs a queryable, version-controlled record of policy decisions. Prompts in a transcript are neither.
Article 14(1) uses the word effectively. Effective means testable, not subjective. You need a control that is (a) external to the model, (b) deterministic, (c) idempotent, and (d) produces an audit record that survives the session.
That is the specific shape of ADCS plus three-axis governance.
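The four properties above can be made concrete with a minimal sketch. This is illustrative only — the function and field names are assumptions, not ACP’s API — but it shows the essential shape: the decision logic lives outside the model, the same input always yields the same output, and every call yields a durable record.

```python
# Hypothetical sketch of a model-external oversight gate. The model
# proposes a tool call; this gate decides from a version-controlled
# policy the model cannot see or edit. Names are illustrative.

POLICY_VERSION = "2026-07-01"            # assumed version label
DENY_TOOLS = {"repo.delete", "db.drop"}  # example deny rules

def gate(tool: str, actor: str) -> dict:
    """Deterministic, idempotent decision plus an audit record."""
    decision = "deny" if tool in DENY_TOOLS else "allow"
    return {
        "tool": tool,
        "actor": actor,
        "decision": decision,
        "policyVersion": POLICY_VERSION,  # survives the session
    }

# Same input, same output -- regardless of what the model "thinks":
assert gate("repo.delete", "agent-42")["decision"] == "deny"
assert gate("repo.read", "agent-42")["decision"] == "allow"
```

Because the record carries the policy version, an auditor can replay any historical decision against the exact rules that were in force.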
The mapping — Article 14(4) to agent governance artifacts
Below, every sub-paragraph is mapped to the artifact a deployer using ADCS + three-axis governance can point to when an auditor asks.
(a) Understand the capacities and limitations; monitor operation
What Article 14(4)(a) requires: the overseer must be able to see, in real time and in retrospect, what the system can do and what it is doing.
What ADCS + three-axis produces:
- The agent profile registry — every `agentProfileId` is declared up front with its allowed scopes, tools, and tier. An overseer sees the full catalog of what agents exist in the system and what each is authorized to do.
- The activity log — every governed tool call is recorded with the full delegation chain at the moment of invocation. `(originSub, agentProfileId, agentRunId, tool, decision, reason, latency)` is the per-call record.
- Three-axis effective policy — the overseer can inspect the current policy on any axis at any time. “What can agent X do when invoked by user Y on tool Z?” is a single query.
Artifact for the auditor: export of the agent profile registry, the workspace policy document, and the activity log for the audit window.
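The “single query” property can be sketched as follows. The policy shape and resolution rule here are assumptions for illustration (deny on any axis wins), not the ACP schema:

```python
# Illustrative three-axis policy: tool, agent, and user each carry
# their own rules; the effective decision is the most restrictive
# across all three. Field names and values are assumed, not ACP's.

policy = {
    "tool":  {"repo.delete": "deny", "repo.read": "allow"},
    "agent": {"researcher": "allow", "deployer": "deny"},
    "user":  {"alice@example.eu": "allow"},
}

def effective(tool: str, agent: str, user: str) -> str:
    """What can this agent do, for this user, on this tool?"""
    axes = [
        policy["tool"].get(tool, "allow"),
        policy["agent"].get(agent, "allow"),
        policy["user"].get(user, "allow"),
    ]
    return "deny" if "deny" in axes else "allow"

assert effective("repo.read", "researcher", "alice@example.eu") == "allow"
assert effective("repo.read", "deployer", "alice@example.eu") == "deny"
```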
(b) Automation bias awareness
This is the sub-paragraph ADCS does not directly satisfy, and it’s worth being explicit.
Article 14(4)(b) is about the overseer’s own cognition — preventing them from over-trusting model output. This is primarily a UX, training, and process concern. It’s not a data-structure problem.
What the governance layer can contribute: confidence indicators, explicit flags when a model output has been accepted without intermediate review, and dashboards that show oversight engagement rates per reviewer. ACP’s activity log surfaces “auto-approved in under N ms” decisions distinctly from human-reviewed ones, which lets you detect when an overseer has defaulted to rubber-stamping. But the control itself is organizational.
Artifact for the auditor: training records, UX design artifacts, and — from ACP — oversight engagement metrics per reviewer.
(c) Correctly interpret the system’s output
What Article 14(4)(c) requires: the output must be presented in a way an overseer can interpret without additional inference work.
What ADCS + three-axis produces:
- Deterministic decision reasons. Every `deny` or `rate_limited` decision includes a machine-readable `reason` pointing to the specific rule that fired on the specific axis. No post-hoc guessing about why the system blocked something.
- The full chain in every audit entry. When an overseer reviews a flagged action, they see the originating human, every intermediate agent, the effective scopes at each hop, and the budget state. No “black box between the prompt and the side effect.”
Artifact for the auditor: sample audit entries showing decision reason codes, the chain structure, and the policy rules referenced.
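A single audit entry of the shape described above might look like this. The field names follow the per-call record listed earlier in this post; the exact wire format and reason-code syntax are assumptions, not the ADCS specification:

```python
# One hypothetical audit entry: originating human, full chain,
# decision, and a reason code pointing at the rule that fired.
entry = {
    "originSub": "alice@example.eu",   # the originating human
    "agentProfileId": "researcher",
    "agentRunId": "run-8f2c",
    "chain": ["alice@example.eu", "planner", "researcher"],  # every hop
    "tool": "repo.delete",
    "decision": "deny",
    "reason": "tool_axis:deny_rule:repo.delete",  # rule + axis, machine-readable
}

# An overseer can interpret this without inference work: who started
# the chain, which rule fired, on which axis, against which tool.
assert entry["chain"][0] == entry["originSub"]
```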
(d) Decide not to use / override / reverse
Article 14(4)(d) is the deny path. An overseer must be able to stop a specific action, a specific agent, or a specific user — after the fact or preemptively.
What ADCS + three-axis produces:
- Deny rules on any axis. A deny on the tool axis blocks a tool globally. A deny on the agent axis stops a specific agent profile. A deny on the user axis revokes a user’s access. All changes are version-controlled and take effect on the next call.
- Audit mode vs enforce mode. Before switching a rule to enforce, you run it in audit mode: the decision is computed, logged as “would have denied,” but the call proceeds. You see exactly what you’re about to block before committing.
- Reversal of state. ADCS chains carry remaining budget at every hop. Clipping a parent’s budget to zero deterministically halts every descendant agent whose remaining budget now fails the min() check.
Artifact for the auditor: the policy change log (who changed what rule when, and why), plus audit-mode evidence that overseers test changes before enforcing.
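The budget-clipping mechanism in the “reversal of state” bullet reduces to a min() over the chain, which is why the halt is deterministic. A minimal sketch, with the chain represented as a list of per-hop grants (an illustrative simplification of an ADCS chain):

```python
# Sketch of min() budget propagation: each hop's effective budget is
# the minimum of every grant above it, so zeroing one ancestor zeroes
# the entire subtree below it. Representation is assumed, not ADCS's.

def effective_budget(chain_grants: list) -> float:
    """Remaining budget at the leaf = min over every hop in the chain."""
    budget = float("inf")
    for grant in chain_grants:
        budget = min(budget, grant)
    return budget

# Normal chain: human grants 100, planner 50, worker 80 -> leaf gets 50.
assert effective_budget([100, 50, 80]) == 50

# Overseer clips the planner to zero -> every descendant halts.
assert effective_budget([100, 0, 80]) == 0
```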
(e) Intervene / stop-button
Article 14(4)(e) is the fastest-path version of (d). There must be a procedure to halt the system now.
What ADCS + three-axis produces:
- Workspace enforce mode toggle. One flag switches the tenant between audit mode and enforce mode. Audit logs continue; enforcement kicks in on the next call.
- Agent-type kill switch. Setting `{ permission: "deny" }` on an agentType policy halts every invocation of that profile — across every user and every tool — with one write.
- User kill switch. Setting `{ permission: "deny" }` on a userPolicy halts every invocation by that user, regardless of agent.
- Delegation freeze. ADCS’s scope-intersection rule means clipping a parent agent’s scopes to the empty set halts every descendant in the chain. One write, full subtree stop.
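The “one write” semantics of the agent-type and user kill switches can be sketched like this. The policy keys mirror the names used above, but the structure is an illustrative assumption, not the ACP schema:

```python
# Hypothetical kill-switch semantics: a single deny write on either
# axis blocks every subsequent invocation. Keys are illustrative.

policies = {
    "agentType":  {"deployer": {"permission": "allow"}},
    "userPolicy": {"alice@example.eu": {"permission": "allow"}},
}

def allowed(agent: str, user: str) -> bool:
    a = policies["agentType"].get(agent, {}).get("permission", "allow")
    u = policies["userPolicy"].get(user, {}).get("permission", "allow")
    return a != "deny" and u != "deny"

assert allowed("deployer", "alice@example.eu")

# The kill switch: one write halts the profile across all users/tools.
policies["agentType"]["deployer"] = {"permission": "deny"}
assert not allowed("deployer", "alice@example.eu")
```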
Artifact for the auditor: documented kill-switch procedure, tested with audit-log evidence of successful halts during drills.
What this looks like end-to-end
The system an Article 14 auditor wants to see is one where every action by an AI agent produces a machine-readable record of (origin, authority, scope, decision, reason), where overseers can query and modify that record deterministically, and where the control surface is demonstrably external to the model itself.
In practice, for a deployer using ACP + ADCS, that’s five artifacts:
- Agent profile registry — catalog of what agents exist and what they’re allowed to do.
- Workspace policy document — the effective rules on each of the three axes.
- Activity log with full chains — every tool call, per ADCS §9, with origin, chain, decision, reason.
- Policy change log — every admin write, who did it, when, why.
- Oversight engagement metrics — reviewer response times, auto-approval rates, drift alerts.
Those five exports, plus the process documentation around training and drill testing, answer Article 14(4) concretely. An auditor doesn’t have to take the deployer’s word for anything — the system’s behavior is self-describing.
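Assembled into a handoff bundle, the five exports might look like the following. The top-level keys track the list above; the shapes and the idea of a single JSON bundle are assumptions for illustration, not an ACP export format:

```python
# Hypothetical compliance bundle: the five artifacts as one JSON
# document an audit team can consume without engineering help.
import json

bundle = {
    "agentProfileRegistry": [        # what agents exist, what they may do
        {"agentProfileId": "researcher", "tools": ["web.search"]},
    ],
    "workspacePolicy": {"tool": {}, "agent": {}, "user": {}},
    "activityLog": [],               # chain entries for the audit window
    "policyChangeLog": [],           # admin writes: who, when, why
    "oversightMetrics": {"autoApprovalRate": 0.12},
}

export = json.dumps(bundle, indent=2)
assert set(json.loads(export)) == {
    "agentProfileRegistry", "workspacePolicy", "activityLog",
    "policyChangeLog", "oversightMetrics",
}
```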
What this does not do
ADCS and three-axis governance are the data half of Article 14. They are not:
- A fundamental rights impact assessment. Article 27 / FRIA obligations are a separate analysis. ADCS provides some of the input evidence.
- A conformity assessment. Providers of high-risk systems still need to run the Article 43 conformity procedure. ADCS logs are inputs; they don’t replace the procedure.
- A risk management system in the sense of Article 9. Risk management is a continuous organizational process; ADCS emits one of the evidence streams that process consumes.
- A substitute for organizational oversight training. The overseer is a natural person under Article 14 — they still need to be trained, and the deployer still has to document that training.
The question to ask is: which of Article 14’s oversight requirements are deterministic data problems, and which are organizational or cognitive problems? ADCS solves the first set cleanly. The second set is still the deployer’s job.
What to do before August 2
If you deploy AI agents in the EU and Article 14 will apply:
- Inventory your agents. What agent profiles exist? Who invokes them? What tools do they call? If you can’t answer this today, you will fail the (a) test.
- Emit structured audit entries. Every governed tool call should produce a record with origin identity, chain, decision, reason. If your logs are unstructured conversational transcripts, Article 14(4)(c) is going to be painful.
- Run in audit mode first. For every high-risk tool, set up a rule, run it in audit mode for a week, review the “would have denied” cases, then enforce. This is how you pass the (d) proportionality test.
- Document the stop-button procedure. Write down exactly how to halt the system. Test it during a drill. Attach the drill evidence to the compliance file.
- Export templates. Get the five artifacts above into a form you can hand to a legal or audit team without engineering time on the critical path.
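The audit-mode-first step above follows a simple pattern: compute the decision either way, but only enforce it after reviewing what it would have blocked. A minimal sketch (mode names and the rule are illustrative assumptions):

```python
# Sketch of "run in audit mode first": the same rule is evaluated in
# both modes; audit mode logs the would-be denial and lets the call
# proceed, enforce mode blocks it. Names are illustrative.

log = []

def govern(tool: str, mode: str) -> bool:
    would_deny = tool == "repo.delete"   # example high-risk rule
    if would_deny:
        if mode == "audit":
            log.append({"tool": tool, "decision": "would_have_denied"})
            return True                  # call still proceeds
        log.append({"tool": tool, "decision": "deny"})
        return False                     # enforce mode blocks it
    return True

# Week one, audit mode: nothing blocked, evidence accumulates.
assert govern("repo.delete", "audit") is True
assert log[-1]["decision"] == "would_have_denied"

# After reviewing the log, flip to enforce: same rule now blocks.
assert govern("repo.delete", "enforce") is False
```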
ADCS and three-axis governance are published and in production. Every section of this post is derived from what a properly configured ACP deployment emits by default, so the mapping above can be packaged directly as a one-page compliance artifact.
The August deadline is real, and effectively overseen is the phrase that will get tested. The difference between a vendor that survives the first Article 14 audit cycle and one that doesn’t will be whether their oversight evidence is machine-generated and queryable, or a screenshot of a chat transcript.
Further reading
- ADCS spec — Agent Delegation Chain Specification
- Three-axis governance — tool / agent / user ABAC
- Agent-to-agent governance overview
- SOC 2 audit trails with ACP
- Governed CrewAI in 3 minutes — a worked example of the chain produced by a multi-agent crew
ACP Cloud free tier — 5,000 calls/month, no card, audit logs and policy editor included. Sign up to produce your first Article 14 audit export.