EU AI Act Article 14 — the implementation playbook
Article 14 of the EU AI Act is now enforceable for high-risk systems. Most vendors will fail their first audit because the text is short and the implementation is not. This is the line-by-line playbook: what the article actually demands, where current stacks fail, the 12-week plan, and the four cryptographic anchors of an audit-defensible architecture.
What Article 14 actually requires
Article 14, paragraph 1, says high-risk AI systems must be "designed and developed in such a way, including with appropriate human-machine interface tools, that they can be effectively overseen by natural persons during the period in which they are in use." Human oversight, the article continues, must aim to prevent or minimise the risks to health, safety or fundamental rights arising from intended use or reasonably foreseeable misuse.
Those two sentences are the entire compliance bar, and they carry five engineering implications most vendor security teams have not yet operationalised:
- The system itself must expose interfaces for oversight — not just deliver outputs and trust someone is watching. Oversight is a system property, not a process artifact.
- The overseer must be a natural person — a human, not an oversight agent, not an LLM-with-prompt, not a "human-on-paper" who never actually sees the output.
- Oversight must be effective — observable proof, not policy on a wiki. The auditor will ask for the chain of evidence, not the corporate handbook.
- It must be continuous during use — not only at design or deployment. A daily report does not satisfy continuous oversight of a real-time system.
- It must address foreseeable misuse — adversarial scenarios, not just happy paths. The risk matrix has to include red-team cases.
The seven concrete capabilities Article 14 demands
From the operative paragraphs of Article 14, plus the recital guidance, an oversight-capable system must enable a human to:
- Understand the system's capabilities and limitations in real time, including the conditions under which the model is more or less reliable.
- Detect anomalies, dysfunction, and unexpected performance — including drift, hallucination patterns, and adversarial inputs.
- Avoid over-reliance ("automation bias") through interface design that surfaces uncertainty rather than hiding it.
- Interpret the system's output correctly, including provenance, confidence, and the limits of any explanation surfaced.
- Decide not to use the output in a particular case — a meaningful right of refusal, not a theoretical one.
- Intervene or interrupt the system through a stop function with measurable latency.
- For remote biometric identification systems listed in Annex III, point 1(a), ensure that no action or decision is taken on the basis of the system's identification "unless that identification has been separately verified and confirmed by at least two natural persons with the necessary competence, training and authority."
Capability seven, the two-natural-person rule, is the headline most teams miss. It applies to the remote biometric identification systems listed in Annex III, point 1(a), with a narrow carve-out where Union or national law deems separate verification disproportionate in law-enforcement, migration, border-control or asylum contexts. If you operate one of these systems, your audit needs to demonstrate that two specific verified humans reviewed and confirmed each system identification before action was taken. No screenshots. No HR records. Cryptographic attestation, replayable on demand, bound to two distinct verified natural persons whose competence and authority are themselves documented.
Why current stacks fail
Three gaps appear in nearly every pre-Article-14 implementation we have reviewed across financial services, healthcare, recruitment, and public-sector deployments. They are not edge cases; they are structural.
Gap 1 — the human is named, not verified. Audit logs say "approved by [email protected]." That is a username, attached to a session, attached to a password. Article 14 demands a natural person whose presence at the moment of action is provable. After enforcement begins, an auditor can ask a question current systems cannot answer: "What is the cryptographic proof Jane was the human at her keyboard at 03:42 UTC, not a session token she left logged in, not a password the help desk reset on her behalf, not an automation running under her name?" Most vendors have no answer. Some have answers that survive a casual review and fail a determined one.
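A minimal sketch of what answering that question takes, assuming each approval is recorded with a detached Ed25519 signature over the action payload and verified against the approver's registered public key. The record shape and field names below are illustrative assumptions, not the Manav API:

```python
# Illustrative check: an "approved by" record counts only if it carries a
# signature from the named person's registered key over this exact action,
# made close enough in time to place the person at the action.
from dataclasses import dataclass
from datetime import datetime, timedelta

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

FRESHNESS = timedelta(minutes=5)  # assumed window between signing and acting


@dataclass
class ApprovalRecord:
    action_id: str
    approver_handle: str        # verified human handle, not a username
    signed_at: datetime
    signature: bytes | None     # detached signature over the canonical payload
    session_token: str | None   # what legacy logs record; never sufficient alone


def canonical_payload(record: ApprovalRecord) -> bytes:
    return f"{record.action_id}|{record.approver_handle}|{record.signed_at.isoformat()}".encode()


def approval_is_person_bound(record: ApprovalRecord,
                             registered_key: Ed25519PublicKey,
                             action_time: datetime) -> bool:
    """True only if a natural-person-bound signature covers this exact action."""
    if record.signature is None:
        return False  # a session token proves a login happened, not who was present
    if abs(action_time - record.signed_at) > FRESHNESS:
        return False  # stale signature: cannot place the person at the moment of action
    try:
        registered_key.verify(record.signature, canonical_payload(record))
    except InvalidSignature:
        return False
    return True
```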
Gap 2 — oversight is asynchronous, not effective. A daily report or weekly review does not satisfy "during the period in which they are in use." Effective oversight requires real-time intervention paths and proof those paths were exercisable at the moment of action. A system that can only be overseen after the fact has, in the article's terms, no human oversight at all — just retrospective review, which is what every existing audit log already provides and which the regulators have explicitly said is insufficient.
Gap 3 — the stop function is theoretical. Engineering teams describe "kill switches" that have never been tested at production scale. Article 14 expects evidence of latency from intervention to halt, and that evidence has to survive replay. The implementation gap here is usually large: the function exists in code, but there is no telemetry on its real-world activation, and the first time it is exercised under audit is the first time anyone discovers what it actually takes to bring the system to a halt.
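One way to close the gap is to make the stop path emit its own evidence every time it runs. A hedged sketch, assuming a pipeline object that exposes halt and drain hooks; those hook names, like everything else here, are illustrative:

```python
# Illustrative stop-function wrapper: measures intervention-to-halt latency
# and appends it to an audit log, so the figure exists before an auditor
# asks for it.
import json
import time
from datetime import datetime, timezone


def execute_stop(pipeline, requested_by: str,
                 audit_log_path: str = "stop_events.jsonl") -> float:
    """Halt the pipeline and return the measured intervention-to-halt latency in seconds."""
    started = time.monotonic()
    pipeline.halt()                   # assumed hook: stop accepting new work
    pipeline.wait_until_drained()     # assumed hook: block until in-flight actions settle
    latency_s = time.monotonic() - started

    event = {
        "event": "stop_function_exercised",
        "requested_by": requested_by,  # verified human handle of the overseer
        "requested_at_utc": datetime.now(timezone.utc).isoformat(),
        "latency_seconds": round(latency_s, 3),
    }
    with open(audit_log_path, "a") as log:
        log.write(json.dumps(event) + "\n")
    return latency_s
```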
The 12-week implementation plan
Twelve weeks is aggressive but achievable for an organisation that has not yet started. Stretch beyond twelve weeks and you are carrying live regulatory exposure while the implementation is still in flight; the compounding risk of acting on a system you have not yet certified is what generates the headline cases.
Weeks 1–2 — Inventory. List every AI system in scope for the EU market. Tag those that fall under Annex III (high-risk): biometric identification, critical infrastructure, education and vocational training, employment and worker management, access to essential services, law enforcement, migration and border control, justice and democratic processes. For each, identify the deployer, the provider, and the current oversight design. Most teams discover 2–3× more systems than they expected, often because procurement bought a "feature" that is, technically, an in-scope AI system.
Weeks 3–4 — Map oversight roles. For each high-risk system, name the natural persons responsible for oversight. Verify their identity (HATI Layer 1). Document their competence and training; the article requires not just "a human" but a competent and trained one. For Annex III systems requiring the two-person rule, document both persons, their independence from each other, and their authority to refuse.
Weeks 5–6 — Wire intervention paths. Build the human-machine interfaces required by capabilities 1–6 above. The minimum: real-time output review, anomaly and drift alerts, an auditable stop function with measured latency, and a no-action override that propagates downstream. Test each path end-to-end. The test itself becomes part of the evidence pack.
Weeks 7–8 — Wire attestation. Every oversight action — review, intervention, stop, two-person confirm — must produce a cryptographic record bound to the natural person who took it. This is HATI Layer 3 (work attestation). The Manav reference implementation handles this in roughly 12 lines for MCP-native systems and a similar surface for non-MCP frameworks. The output is a verifiable credential whose chain terminates at a Layer 1-verified human handle.
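What that attestation step looks like in spirit, as a sketch rather than the Manav API: serialise the oversight action, sign it with the overseer's key, and emit a record whose subject is the verified handle. Every name below is an assumption for illustration:

```python
# Illustrative work-attestation record: one signed claim per oversight action,
# bound to the overseer's handle rather than to a session.
import base64
import json
from datetime import datetime, timezone

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def attest_oversight_action(signing_key: Ed25519PrivateKey,
                            overseer_handle: str,
                            action: str,
                            system_id: str,
                            subject_id: str) -> dict:
    """Produce a signed, self-describing record of one oversight action."""
    claim = {
        "overseer": overseer_handle,   # handle that chains back to a Layer 1 verification
        "action": action,              # "review" | "stop" | "confirm" | "override"
        "system": system_id,
        "subject": subject_id,         # the output or identification being overseen
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = signing_key.sign(payload)
    return {"claim": claim, "signature": base64.b64encode(signature).decode()}
```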
Weeks 9–10 — Run a tabletop. Walk through three scenarios per high-risk system: nominal operation, anomalous output, and attempted misuse (the foreseeable-misuse case the article explicitly requires you to address). For each, prove the overseer detected, interpreted, and, where required, intervened. Capture the full chain of attestations as audit evidence. Resist the temptation to skip the misuse case: that is the case the auditor will ask about first.
Weeks 11–12 — Document and dry-run audit. Produce the technical documentation file required by Article 11 (the partner article most teams forget). Include the oversight architecture, the role assignments, the intervention logs, the tabletop results, and the threat model. Dry-run with a third-party auditor; their first read will surface the gaps your internal team has stopped seeing.
The audit-defensible architecture
An Article-14-compliant high-risk AI system has four cryptographic anchors. Skip any one and the audit fails on the first hard question.
- Identity anchor — every overseer is a verified natural person, biometrically and behaviourally bound to a portable handle that the auditor can independently re-verify (HATI Layer 1).
- Authority anchor — every agent action carries a delegation chain back to a human principal, with scope, magnitude, and time bounds expressed in the token (Layer 2). The chain has to reconstruct from the action backwards, not just forward from the policy.
- Action anchor — every oversight intervention (review, stop, two-person confirm, override) is signed by the natural person, time-stamped, and recorded in a tamper-evident log (Layer 3).
- Audit anchor — the log itself is independently verifiable: an auditor with no access to your infrastructure can validate the chain end-to-end using only the public verification primitives. This is the property that distinguishes a defensible audit from a "trust us" audit.
Without these four, you can claim Article 14 compliance. With them, you can prove it under cross-examination.
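The audit anchor is the easiest of the four to sketch because it needs nothing but public primitives. A minimal illustration of a hash-chained log an external auditor could verify offline; the genesis value and field names are assumptions:

```python
# Illustrative tamper-evidence check: every entry commits to the hash of its
# predecessor, so inserting, deleting, or editing an entry breaks the chain.
import hashlib
import json


def entry_hash(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


def verify_chain(entries: list[dict]) -> bool:
    """True if every entry correctly references the hash of its predecessor."""
    prev = "0" * 64                        # assumed genesis value agreed in advance
    for entry in entries:
        if entry.get("prev_hash") != prev:
            return False                   # chain broken somewhere before this point
        prev = entry_hash(entry)
    return True
```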
Where the penalties bite
Non-compliance with Article 14 sits in the AI Act's mid-tier penalty band: up to €15 million or 3% of global annual turnover, whichever is higher — applied per breach. The cap on prohibited-practice violations (Article 5) is higher (€35M / 7%); the cap on misleading information to authorities is lower (€7.5M / 1%). Article 14 violations cluster in the middle.
The first enforcement actions are widely expected within the next two quarters. The headline-makers will be operators of biometric identification systems, recruitment AI, credit-scoring models, and certain public-sector deployments — the categories where Annex III scope and consumer harm are both concentrated. Settlement orders and provisional measures will, in the European pattern, precede final decisions; the practical compliance window is shorter than the formal one.
The two-person rule, in practice
Because capability seven is the most under-implemented and the most expensive to retrofit, here is what an audit-defensible two-person flow looks like in production:
- System surfaces an identification or recommendation that requires Article 14 capability seven.
- System holds the action in a sealed pre-commit state.
- Two natural persons, independently verified at HATI Layer 1, are notified — the system enforces independence (no shared session, no shared device, no shared role inheritance).
- Each reviewer signs an attestation: I have reviewed this identification, applied my training and judgment, and confirm/refuse. The signatures are bound to the reviewer's manav.id handle, not to a session token.
- Only on two independent confirms does the system release the action. The chain of attestations is the audit artefact.
The crucial property this design enforces: the action is structurally impossible without the two attestations. This is what the auditor will check, and it is what cannot be retrofitted with policy alone.
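A compact sketch of that structural guarantee, with signature verification against each reviewer's registered key omitted for brevity (it would reuse the person-binding check from Gap 1 above). Class and field names are illustrative, not a reference implementation:

```python
# Illustrative two-person release gate: the pending action cannot be released
# until two attestations from distinct verified handles have been recorded.
from dataclasses import dataclass, field


@dataclass
class PendingIdentification:
    action_id: str
    confirmations: dict[str, bytes] = field(default_factory=dict)  # handle -> signature

    def add_confirmation(self, handle: str, signature: bytes) -> None:
        # Independence checks (distinct device, session, role inheritance)
        # would run here; at minimum the two handles must be distinct persons.
        if handle in self.confirmations:
            raise ValueError("duplicate confirmation from the same person")
        self.confirmations[handle] = signature

    def release(self) -> dict:
        if len(self.confirmations) < 2:
            raise PermissionError("structurally blocked: fewer than two confirmations")
        return {
            "action_id": self.action_id,
            "confirmed_by": sorted(self.confirmations),
        }
```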
What to do this week
If you are responsible for a high-risk AI system intended for the EU market, start these three actions this week and close them out within thirty days:
- Inventory and Annex III triage. Get the list of systems in scope on paper. Reconcile against procurement records and shadow-IT registers; the gap is where your highest exposure lives.
- Run the seven-capability checklist above against your top three systems. Document gaps with the specificity you would want a regulator to see — vague language reads as evasion.
- Decide the build/buy posture for HATI Layer 2 + Layer 3 attestation infrastructure. The economics have already broken in favour of buy for all but the largest organisations, and the integration cost grows with every system you defer.
The enforcement schedule does not negotiate with your procurement cycle. Start now, document continuously, and assume the first audit will be more thorough than the published guidance suggests.
Article 14 is a HATI specification written in legalese. The teams that read it that way will pass. The teams that read it as policy will not.