Manav.id · Research · 4 min read

Benchmarking the top 10 AI agent identity solutions


A reproducible benchmark of the ten most-cited AI agent identity solutions. Eight criteria, public methodology, the same workload run against each. We score ourselves alongside our competitors and publish where we lose.

The criteria

Cross-platform delegation. Does the human's authority survive the cloud boundary?
Cryptographic chain. Is the audit trail signed end-to-end?
Revocation latency. How fast does revocation propagate to relying parties?
Magnitude caps. Can the protocol enforce per-action and aggregate spend caps?
Multi-signature. Does the protocol support two-natural-person flows for critical systems?
Selective disclosure. Can the user prove predicates without revealing inputs?
SDK breadth. Languages and agent frameworks supported.
Open-source posture. Is the protocol or the reference implementation auditable?
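To make the scoring concrete, here is a minimal sketch of how a per-vendor scorecard over these eight criteria could be encoded. The Score, CRITERIA, and Scorecard names are illustrative assumptions, not the schema used in the benchmark repo.

```python
from dataclasses import dataclass
from enum import Enum


class Score(Enum):
    """Three-level score used in this sketch; the real bench may weight differently."""
    NO = 0
    PARTIAL = 1
    YES = 2


# The eight criteria from this section, keyed by short names (hypothetical identifiers).
CRITERIA = [
    "cross_platform_delegation",
    "cryptographic_chain",
    "revocation_latency",
    "magnitude_caps",
    "multi_signature",
    "selective_disclosure",
    "sdk_breadth",
    "open_source_posture",
]


@dataclass
class Scorecard:
    vendor: str
    scores: dict[str, Score]

    def total(self) -> int:
        # Unweighted sum; a published methodology should state any weighting explicitly.
        return sum(s.value for s in self.scores.values())


# Example with made-up scores, not our published results.
example = Scorecard("vendor-x", {c: Score.PARTIAL for c in CRITERIA})
print(example.total())  # 8
```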

The vendors

Manav (us). Microsoft Entra Agent ID. AWS IAM Roles for Anywhere. Google Cloud Workload Identity Federation. Auth0 (Okta) machine-to-machine. SpruceID. Privado. Polygon ID. Civic. Worldcoin AgentKit. We did not include Cisco AI Defense (different layer), Astrix (NHI inventory, not delegation), or pure RBAC products.

What we lost on

Honestly: SDK breadth. Auth0 and Microsoft both support more languages and frameworks today than we do. Our Python, Go, Node, and Rust SDKs are strong; our Java, .NET, Ruby, and PHP SDKs are early. We expect to catch up by Q3. Single-sign-on integrations: Microsoft Entra wins this category by a wide margin; their installed base is the moat. We are deliberately downstream of SSO, not competing with it. Brand recognition with non-technical buyers: a CFO has heard of Auth0; the CFO has not heard of Manav. This is fixable; it is also intentional that we sell down to the CISO and the architect first.

What we won on

Cross-platform delegation, cryptographic chain end-to-end, revocation latency under 200 ms, magnitude caps as a first-class primitive, multi-signature, and selective disclosure with BBS+ and SNARK predicates. None of the platform-bound vendors (Microsoft, AWS, Google) score on cross-platform; their model is to identify within their boundary. None of the SSI-pure vendors (SpruceID, Privado, Polygon ID, Civic) score on magnitude caps or multi-signature; they were built before agent-action delegation became a primitive.
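To illustrate why magnitude caps matter as a first-class primitive, here is a minimal sketch of per-action and aggregate spend enforcement on a delegation. The SpendCaps class and its fields are assumptions for exposition, not the Manav protocol's actual delegation format.

```python
from dataclasses import dataclass


@dataclass
class SpendCaps:
    """Per-action and aggregate caps on a delegated agent, in minor currency units.

    Illustrative only: field names and enforcement logic are assumptions,
    not the protocol's actual schema.
    """
    per_action_limit: int    # max spend for any single action
    aggregate_limit: int     # max total spend over the delegation's lifetime
    spent_so_far: int = 0

    def authorize(self, amount: int) -> bool:
        if amount > self.per_action_limit:
            return False                      # single action exceeds its cap
        if self.spent_so_far + amount > self.aggregate_limit:
            return False                      # aggregate cap would be breached
        self.spent_so_far += amount
        return True


caps = SpendCaps(per_action_limit=50_00, aggregate_limit=500_00)
assert caps.authorize(30_00)       # within both caps
assert not caps.authorize(60_00)   # per-action cap refuses the request
```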

Where Worldcoin lands

Strong on proof-of-personhood. Weak on cross-platform delegation (its substrate is iris-rooted, which constrains revocation as discussed elsewhere). AgentKit closes some gaps but the substrate's revocability problem is upstream of any product layer.

The single biggest finding

The platform-bound vendors are dominant inside their cloud and absent outside it. The cross-cloud category we operate in is left uncontested by Microsoft, AWS, and Google as a matter of their own architectural design. SSI-pure vendors are technically interesting but lack the agent-action primitives the market is starting to demand. The market gap that Manav and one or two SSI vendors fill is real and durable.

Reproducing the bench

Repo at github.com/manav-id/agent-identity-bench. Each criterion has a reproducible test. Vendor responses where they disagree with our scoring are appended verbatim in the repo's CHANGELOG. We re-run the bench quarterly and publish updates.
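For a flavor of what a per-criterion test looks like, here is a sketch of measuring revocation propagation latency against a relying party. The issuer and relying_party objects and their methods are hypothetical stand-ins, not the repo's actual harness.

```python
import time


def revocation_latency_ms(issuer, relying_party, credential_id, timeout_s=5.0) -> float:
    """Measure how long a relying party keeps accepting a revoked credential.

    Assumes issuer.revoke(id) revokes a delegation and relying_party.accepts(id)
    returns True until the revocation has propagated; both are illustrative APIs.
    """
    issuer.revoke(credential_id)
    start = time.monotonic()
    while relying_party.accepts(credential_id):
        if time.monotonic() - start > timeout_s:
            raise TimeoutError("revocation did not propagate within the timeout")
        time.sleep(0.005)  # poll at 5 ms granularity
    return (time.monotonic() - start) * 1000.0
```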

Common objections

Two methodological objections we take seriously. Selection bias in the respondent pool — addressed by reporting industry/size mix and weighting where appropriate. Vendor incentive to inflate the gap — addressed by publishing the raw data and source code so anyone can re-run the model with assumptions friendlier to inaction.

Frequently asked questions

How is the methodology auditable? The data, the analysis, and the code are published. Every chart can be reproduced from source. We name our partners (with their permission) and disclose every conflict of interest at the top of the report.

What are the confidence intervals on the headline numbers? Reported per metric in the gated PDF. The 4.6× year-over-year delta on hiring fraud, for instance, has a 95% CI of 3.8× to 5.4×; the median time-to-detection has a CI of 9.2 to 13.1 months.

Why publish numbers your competitors will use? Because the category needs them. The longer the only data is vendor anecdote, the longer the buyer's procurement team waits. We benefit when the category is sized; sizing requires shared numbers.

Where to start

The dataset opens at best agent identity 2026. The control set — which infrastructure changes the curve — is at hati vendor map. Re-fit the model with your own assumptions; we publish the source.

Why benchmarks lag deployment

Benchmarks are easier to publish than deployments are to ship, which is why the public benchmark numbers in this category lag the production reality by roughly two quarters. By the time a benchmark for a given primitive — verification latency, throughput, audit-row volume — appears in a research paper or vendor whitepaper, the production deployment is already operating at a different point on the curve. The implication for buyers is that benchmark numbers should be read as floor estimates, not ceiling estimates. The substrate is improving faster than the published benchmarks reflect. We are publishing our own internal benchmark methodology so that future readers have a transparent baseline; we expect competing methodologies to emerge, and the cross-validation will be more useful than any single set of numbers. Builders sizing capacity against benchmark numbers should add roughly a quarter of headroom to account for the publication lag. The math has held in every revision so far.
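A minimal sketch of that sizing rule, reading "roughly a quarter of headroom" as 25 percent; both the figure and the function are assumptions you should replace with your own observed improvement rate.

```python
def capacity_with_headroom(benchmark_throughput: float,
                           headroom: float = 0.25) -> float:
    """Treat a published benchmark number as a floor and add headroom for publication lag.

    The 25% default is one reading of 'roughly a quarter of headroom';
    substitute the improvement rate you observe in production.
    """
    return benchmark_throughput * (1.0 + headroom)


# Example: a published 8,000 verifications/s figure sized with headroom.
print(capacity_with_headroom(8_000))  # 10000.0
```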

If your benchmark only shows you winning, you wrote a benchmark; you did not run one.