
I asked 10 AI agents to verify I was human


Ten verification agents. Three test subjects: me, a deepfake-driven impersonator of me, and a quiet, real-but-evasive subject who refused to make eye contact. Same script, same hardware, scoring done by independent annotators. The leaderboard is uncomfortable.

The setup

Each agent is a vision-and-conversation system trained or fine-tuned to determine whether the person in front of the camera is a unique human. We picked the ten with the strongest marketing claims. Each got 8 minutes per subject; same lighting, same camera, same microphone. Each subject answered the agent's questions live.

The leaderboard, in plain terms

Three of the ten correctly identified all three subjects (me real, the deepfake fake, the evasive subject real). Four flagged the evasive subject as fake; they leaned on liveness signals such as consistent eye contact, which real but neurodivergent humans naturally vary. One judged the deepfake real because the deepfake was good and the agent's training distribution did not include the open-source model that generated it. Two refused to render a verdict on any subject, citing low confidence; we treated that as a tie for last place.

What the winners shared

The three top performers used multi-signal verification: not just facial liveness, but also responsive interaction tests, hardware-attestation prompts ("show us your phone's secure-element certificate"), and behavioral patterns over the eight-minute window. The single-signal agents (face only, voice only, text only) were trivially defeated by the deepfake.
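
To make that design difference concrete, here is a minimal sketch of a multi-signal verdict in Python. The signal names, the veto rule, and the threshold are assumptions for illustration; this is not any of the tested agents' actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    name: str
    passed: bool        # did the check succeed
    confidence: float   # 0.0-1.0, as calibrated by the agent

def multi_signal_verdict(signals: list[Signal],
                         required: set[str],
                         threshold: float = 0.8) -> str:
    """Combine independent verification signals into one verdict.

    A missing or failed *required* signal (e.g. hardware attestation)
    vetoes the result outright; the remaining soft signals are averaged.
    """
    by_name = {s.name: s for s in signals}
    for name in required:
        if name not in by_name or not by_name[name].passed:
            return "reject"
    soft = [s for s in signals if s.name not in required]
    if not soft:
        return "verify"
    score = sum(s.confidence for s in soft if s.passed) / len(soft)
    return "verify" if score >= threshold else "abstain"

# Example: attestation passes, liveness is weak (the evasive but real subject).
verdict = multi_signal_verdict(
    [Signal("hardware_attestation", True, 0.99),
     Signal("facial_liveness", True, 0.55),
     Signal("interaction_test", True, 0.80)],
    required={"hardware_attestation"},
)
```

A single-signal design is the degenerate case of this sketch: one soft signal, no required set, nothing to veto a convincing fake.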

What the losers shared

Single-signal designs. Confidence thresholds tuned for marketing benchmarks rather than real-world distributions. No hardware attestation in the loop. No selective-disclosure or chain-of-trust verification: the agents knew nothing about me before I joined the call, so they had no reference signal to compare against.

Why hardware attestation kept appearing

The single most decisive signal in the test was the agent's ability to ask my device to produce a hardware-attested signature. The subjects who could produce one were trivially verified. The deepfake's controller could not (the device was not enrolled with the secure element). The evasive subject could (he just did not enjoy the conversation). The hardware attestation collapsed three minutes of vision-based reasoning into one cryptographic check.
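
The shape of that check is simple. Below is a minimal sketch of the challenge-response, assuming an Ed25519 device key whose public half the verifier already trusts; a real deployment would validate the secure element's certificate chain rather than hold a raw key.

```python
import os
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey,
)

def issue_challenge() -> bytes:
    """Verifier sends a fresh random nonce so an old signature cannot be replayed."""
    return os.urandom(32)

def device_sign(device_key: Ed25519PrivateKey, nonce: bytes) -> bytes:
    """Stand-in for the enrolled secure element: sign the verifier's nonce."""
    return device_key.sign(nonce)

def verify_attestation(device_pub: Ed25519PublicKey,
                       nonce: bytes, signature: bytes) -> bool:
    """One cryptographic check in place of minutes of vision-based guessing."""
    try:
        device_pub.verify(signature, nonce)
        return True
    except InvalidSignature:
        return False

# Demo with a locally generated key standing in for an enrolled device.
device_key = Ed25519PrivateKey.generate()
nonce = issue_challenge()
assert verify_attestation(device_key.public_key(), nonce,
                          device_sign(device_key, nonce))
```

The deepfake's controller fails this check not because the check is clever but because the controller's device was never enrolled; there is no key to sign with.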

Where Manav fits, and where it does not

Manav does not compete with these vision-based verification agents at the moment of first verification. Manav is the layer that comes after: the layer that says "this human verified at this device on this day, and these are the actions they have taken since." A vision agent that integrates Manav's hardware-attestation handshake gets the decisive signal. A vision agent that does not lands somewhere near the bottom of this leaderboard.
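
Purely as an illustration, here is a hypothetical shape such a handshake could hand back to a vision agent. The field names and the trust_prior helper are invented for this sketch; they are not Manav's documented API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical record a chain-of-trust layer might return; field names invented.
attestation_record = {
    "device_enrolled": True,
    "last_hardware_attestation": datetime(2025, 6, 1, tzinfo=timezone.utc),
    "actions_since": ["signed_in", "approved_payment"],
}

def trust_prior(record: dict, max_age_days: int = 30) -> bool:
    """Use the prior attestation as a reference signal for the live check."""
    if not record["device_enrolled"]:
        return False
    age = datetime.now(timezone.utc) - record["last_hardware_attestation"]
    return age <= timedelta(days=max_age_days)
```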

The uncomfortable conclusion

Identity verification by AI agent alone is unreliable. The category needs hardware roots, behavioral baselines, and a chain of past attestations to perform well. Vision is one signal among several. Anyone selling vision-only verification at scale is selling last decade's answer.

Common objections

Two questions come up repeatedly. Couldn't this be prevented with better prompts? No: the failures were authority gaps, not prompt failures. Doesn't this just slow agents down? Only for the highest-stakes actions, by design. Velocity for safe work, friction for unsafe work, written into the delegation.

Frequently asked questions

Could the failure described have been prevented? At the delegation layer, yes. A scoped, magnitude-capped, witness-bound delegation would have refused the action at the relying party before the human even saw the request. The model behaved as instructed; the authority was the gap.
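
For concreteness, here is a minimal sketch of what a scoped, magnitude-capped, witness-bound delegation check could look like at the relying party. The field names and limits are assumptions for illustration, not a specification.

```python
from dataclasses import dataclass

@dataclass
class Delegation:
    """A grant of authority from a human principal to an agent."""
    scope: set[str]                # actions the agent may take
    magnitude_cap: float           # maximum dollar value per action
    witness_required_above: float  # actions above this need a human witness

@dataclass
class Action:
    name: str
    amount: float
    witnessed: bool = False

def authorize(delegation: Delegation, action: Action) -> bool:
    """The relying party refuses out-of-scope or over-cap actions up front."""
    if action.name not in delegation.scope:
        return False
    if action.amount > delegation.magnitude_cap:
        return False
    if action.amount > delegation.witness_required_above and not action.witnessed:
        return False
    return True

# A capable agent with narrow authority: the wire transfer is refused.
grant = Delegation(scope={"issue_refund"}, magnitude_cap=500.0,
                   witness_required_above=100.0)
print(authorize(grant, Action("wire_transfer", 25_000.0)))  # False
```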

How common is this pattern in practice? More common than the press has caught. The cases that surface are the ones that produced headlines or lawsuits; the ones that did not surface are quietly absorbed as 'cost of running agents in production.' We expect the visible ratio to grow as audit trails make the invisible cases discoverable.

What's the immediate lesson? Authority is the bottleneck. Capability is the easy part — the model is good. Ship the delegation layer before the next agent goes into a system that touches dollars, data, or decisions.

Where to start

For the analytic frame behind the story, see "How to Prove Human 2026." For the practical playbook the principals would have wanted in advance, see the "Deepfake Defense Matrix."

Why this case was the inflection

The reversal, agents verifying humans rather than humans verifying agents, was the case that flipped the frame for most of our enterprise customers. As long as humans were the verifying party, identity infrastructure was the buyer's discretionary spend. The moment agents became the verifying party, identity infrastructure became the customer-facing latency budget. The procurement velocity changed inside one quarter. CIOs who had deferred the question for two years signed in two weeks because the new framing made the cost of delay legible to the line of business in a way the prior framing never had.

The inflection is not about the technology (the verification primitives are the same); it is about who pays the cost of failure. When humans pay, the cost is reputation. When agents pay, the cost is operational velocity. Velocity has a budget owner; reputation often does not. The lesson is that the procurement story moves faster when the cost of inaction sits with someone who controls a budget.

If your verification agent has not asked your device for a signature, your verification agent is guessing.