Four firsts: making AI agent actions verifiable on-chain, in 369 Wallet

Every consumer wallet is bolting an LLM into a chat UI right now. None of them answer the harder question — if the AI loses you money, how do you prove what it actually did? 369 Wallet shipped a full-stack ERC-8004 implementation that answers it cryptographically. Four world firsts, all live on Arc Testnet today, all open source.

369wallet team · May 27, 2026 · 6 min read


Every consumer wallet on the market is wiring an LLM into a chat box right now. Phantom, MetaMask Snaps, Coinbase Wallet, Rabby — there is no shortage of "ask the AI to swap" interfaces. The race is loud, and on the surface it is mostly about who can wrap the friendliest dialogue around the same set of underlying primitives.

The race has, so far, skipped the actually hard question:

If the AI moves money on the user's behalf and the user later disputes what happened, how do you prove what the AI actually did, whether to a counterparty, to a court, or for the user's own audit trail?

Today's answer is, more or less: trust the company's server logs. OpenAI keeps the response. Anthropic keeps the thumbs-up. The chain of evidence lives inside private databases at AI vendors and wallet vendors. Mutable. Deletable. Unverifiable by the user, and unusable as evidence by anyone external.

369 Wallet shipped a different answer. We built a full-stack implementation of ERC-8004 — the trust standard for AI agents — and wired it directly into the production mobile app on Arc Testnet. Along the way we hit four firsts in consumer-wallet territory. This piece walks through each one, with on-chain links so the claims are auditable rather than rhetorical.

1 — On-chain audit of every AI action in the wallet

When a user opens the AI chat in 369 Wallet and says "send 1 USDC to alice," three things happen in sequence:

The agent produces an EIP-712 typed signature recommending the transaction. This is not a chat string. It is a signed claim with structured fields.
The user authorizes the actual on-chain transaction from their device with a passcode and the local key.
Both events — the agent's recommendation and the user's authorization — get written to a contract called ValidationRegistry on Arc Testnet.

The result is a cryptographic chain of evidence: who, when, prompted by what agent recommendation, authorized what transaction. Anyone can read it. Nobody can quietly edit it. The wallet vendor's server is not part of the trust assumption.
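The three steps above can be sketched in a few lines of Python. This is an illustrative model only, not the 369 Wallet code: the Recommendation struct, its field names, the domain name and version, the placeholder chain ID, and the ValidationLog class are all assumptions made for the example. Only the four top-level keys of the payload (types, primaryType, domain, message) and the EIP712Domain fields come from EIP-712 itself. The digest uses SHA-256 as a stand-in, because real EIP-712 hashing relies on keccak256, which the Python standard library does not provide.

```python
import hashlib
import json

# ValidationRegistry address from the deployments list in this post. Using it
# as the EIP-712 verifyingContract is an assumption made for illustration.
VALIDATION_REGISTRY = "0x148336926e6F21A2EC63B47BA31dD0B08E538b91"
PLACEHOLDER_CHAIN_ID = 0  # placeholder, NOT Arc Testnet's real chain ID


def build_recommendation(chain_id, agent, user, to, amount, nonce):
    """Return an EIP-712 style typed-data payload for one agent recommendation."""
    return {
        "types": {
            "EIP712Domain": [
                {"name": "name", "type": "string"},
                {"name": "version", "type": "string"},
                {"name": "chainId", "type": "uint256"},
                {"name": "verifyingContract", "type": "address"},
            ],
            # Hypothetical struct: the real field list may differ.
            "Recommendation": [
                {"name": "agent", "type": "address"},
                {"name": "user", "type": "address"},
                {"name": "to", "type": "address"},
                {"name": "amount", "type": "uint256"},
                {"name": "nonce", "type": "uint256"},
            ],
        },
        "primaryType": "Recommendation",
        "domain": {
            "name": "369Wallet",  # hypothetical domain name
            "version": "1",       # hypothetical version
            "chainId": chain_id,
            "verifyingContract": VALIDATION_REGISTRY,
        },
        "message": {"agent": agent, "user": user, "to": to,
                    "amount": amount, "nonce": nonce},
    }


def stand_in_digest(payload):
    """SHA-256 over canonical JSON. A stand-in, NOT the EIP-712 keccak256 hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


class ValidationLog:
    """In-memory model of an append-only log: entries can be added, never edited."""

    def __init__(self):
        self._entries = []

    def record(self, kind, digest, submitter, links_to=None):
        if links_to is not None and not 0 <= links_to < len(self._entries):
            raise ValueError("links_to must reference an existing entry")
        self._entries.append({"kind": kind, "digest": digest,
                              "submitter": submitter, "links_to": links_to})
        return len(self._entries) - 1

    def entries(self):
        # Return copies so callers cannot mutate the stored history.
        return tuple(dict(e) for e in self._entries)


# Hypothetical addresses, used only to exercise the model.
AGENT, USER, ALICE = "0x" + "a1" * 20, "0x" + "b2" * 20, "0x" + "c3" * 20

payload = build_recommendation(PLACEHOLDER_CHAIN_ID, AGENT, USER, ALICE,
                               amount=1_000_000, nonce=1)
log = ValidationLog()
rec_id = log.record("recommendation", stand_in_digest(payload), AGENT)
auth_id = log.record("authorization", "tx-hash-placeholder", USER, links_to=rec_id)
```

The point of the links_to field is that the user's authorization refers back to the exact recommendation that prompted it, so a reader of the log can reconstruct the pair without trusting either party's private records.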

To our knowledge, none of the other production consumer wallets (MetaMask, Trust, Phantom, Coinbase Wallet, Rabby, Rainbow, Brave, OKX, Bitget, TokenPocket) ship this structure. AI action records live in their company logs, and users have no way to verify after the fact whether the agent actually recommended what the UI says it recommended.

Verify on Arc Testnet: ValidationRegistry

2 — Reputation that only the person who experienced the action can write

AI reputation systems today come in two flavors:

Centralized. Thumbs-up / thumbs-down stored on a vendor server. Mutable, unverifiable, and trivially gameable by the vendor.
Oracle- or stake-based. Third parties issue attestations (EAS, Karma3, etc.). Sybil resistance comes from making attacks expensive, not impossible.

369 Wallet's ReputationRegistry took a third path: only the address that actually experienced the validated action can rate it. The contract enforces this with a single line — msg.sender == ValidationRegistry.submitter. As a consequence:

A bot can't pre-rate. To rate, it has to first generate the matching validated action under that same address.
Each validation key can be scored exactly once (AlreadyScored revert).
The counter is append-only — no edits, no deletions.

No oracles. No staking. No external identity provider. Sybil attacks on ratings are structurally impossible rather than economically discouraged. It is, as far as we have been able to find, the first consumer-wallet AI reputation system built this way.
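The rating rule can be modeled in plain Python to show how the three properties fall out of one check. This is a sketch under assumptions, not a port of the Solidity contracts: the class layout, method names, and the NotSubmitter error are hypothetical. Only the submitter check, the AlreadyScored revert, and the append-only counter are taken from the description above.

```python
class NotSubmitter(Exception):
    """Caller is not the address that submitted the validated action."""


class AlreadyScored(Exception):
    """This validation key has already been rated."""


class ValidationRegistry:
    """Minimal stand-in: maps a validation key to (submitter, agent)."""

    def __init__(self):
        self._records = {}

    def submit(self, key, submitter, agent):
        if key in self._records:
            raise ValueError("validation key already exists")
        self._records[key] = (submitter, agent)

    def submitter(self, key):
        record = self._records.get(key)
        return record[0] if record else None

    def agent(self, key):
        record = self._records.get(key)
        return record[1] if record else None


class ReputationRegistry:
    """Only the submitter of a validated action may rate it, and only once."""

    def __init__(self, validations):
        self._validations = validations
        self._scored = set()   # validation keys that have been rated
        self._counts = {}      # agent -> [positive, negative]

    def rate(self, sender, key, positive):
        owner = self._validations.submitter(key)
        # Models the msg.sender == ValidationRegistry.submitter check.
        if owner is None or owner != sender:
            raise NotSubmitter(key)
        if key in self._scored:
            raise AlreadyScored(key)
        self._scored.add(key)
        counts = self._counts.setdefault(self._validations.agent(key), [0, 0])
        counts[0 if positive else 1] += 1  # counters only ever increase

    def score(self, agent):
        return tuple(self._counts.get(agent, [0, 0]))
```

There is deliberately no method to edit or delete a rating. In this model, an address that never submitted the validated action cannot reach the counter at all, which is the property the section describes.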

Verify on Arc Testnet: ReputationRegistry

3 — A three-tier ERC-8004 stack, in a shipping mobile app

ERC-8004 specifies trust infrastructure for AI agents in three layers:

IdentityRegistry — who an agent is.
ValidationRegistry — what the agent did.
ReputationRegistry — whether the agent should be trusted next time.

Most other ERC-8004 implementations to date are some combination of (a) contracts only, no client, (b) a demo web interface, or (c) not in production. 369 Wallet is the first we are aware of that meets all of the following conditions at once:

All three registry contracts deployed and verified on Arc Testnet.
Apache-2.0 licensed, end-to-end open source.
Foundry test suite green at 39/39, including 256-run fuzz tests.
Shipping in production on Google Play and in TestFlight review for iOS.
The production app talks to the registries directly. Not a demo path.

Any team wanting to integrate ERC-8004 into their own wallet now has a reference implementation: contracts, mobile client, signature pipeline, UX patterns, all together.

Open source: github.com/369wallet/369-agnet-8004

4 — Natural-language token-approval revoke across 7 EVM chains, in one passcode

The single most common way a user loses funds in Web3 is not getting their private key stolen — it is granting a malicious or compromised contract unlimited approval over a token. The dApp goes bad, the approval is still live, the contract drains the balance. The pattern is so consistent that hundreds of these incidents are catalogued per quarter.

The available defenses are limited. Revoke.cash is desktop-web only, manual, one approval at a time. MetaMask's in-app approvals view is one chain at a time. Trust Wallet sits at the same level.

369 Wallet's AI agent ships the simpler interface: "revoke my risky approvals." Behind that sentence:

Seven mainnets — Ethereum, BSC, Polygon, Arbitrum, Base, Optimism, Avalanche — get scanned in parallel.
Unlimited and unknown-spender approvals are identified and risk-ranked.
The user confirms once with a passcode; the revoke transactions go out in a batch.
Every revoke gets anchored to the same ValidationRegistry as everything else the agent does.

Mobile + natural language + multi-chain + AI-driven, all at once. To our knowledge, the first production wallet to ship that intersection.
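As a rough illustration of the scan, rank, and revoke steps, the sketch below fans out over the seven chains in parallel, scores each approval, and encodes the revoke as a standard ERC-20 approve(spender, 0) call. The sample data, the fetch_approvals stub, and the scoring weights are hypothetical, and a real scanner would query an RPC node or indexer per chain. The 0x095ea7b3 selector and the zero-allowance reset are standard ERC-20 behavior rather than anything specific to 369 Wallet.

```python
from concurrent.futures import ThreadPoolExecutor

CHAINS = ["Ethereum", "BSC", "Polygon", "Arbitrum", "Base", "Optimism", "Avalanche"]
MAX_UINT256 = 2**256 - 1
APPROVE_SELECTOR = "095ea7b3"  # 4-byte selector of ERC-20 approve(address,uint256)

# Hypothetical sample data standing in for per-chain RPC or indexer queries.
KNOWN_SPENDER = "0x" + "aa" * 20
UNKNOWN_SPENDER = "0x" + "bb" * 20
TOKEN = "0x" + "cc" * 20
SAMPLE = {
    "Ethereum": [{"token": TOKEN, "spender": UNKNOWN_SPENDER, "allowance": MAX_UINT256}],
    "Base": [{"token": TOKEN, "spender": KNOWN_SPENDER, "allowance": MAX_UINT256}],
    "Polygon": [{"token": TOKEN, "spender": KNOWN_SPENDER, "allowance": 5_000_000}],
}


def fetch_approvals(chain):
    """Stub: return the open approvals for one chain, tagged with the chain name."""
    return [dict(a, chain=chain) for a in SAMPLE.get(chain, [])]


def risk_score(approval, known_spenders):
    """Toy ranking: an unlimited allowance scores 2, an unknown spender adds 1."""
    score = 2 if approval["allowance"] == MAX_UINT256 else 0
    if approval["spender"].lower() not in known_spenders:
        score += 1
    return score


def revoke_calldata(spender):
    """Encode approve(spender, 0), which resets the allowance to zero."""
    addr = spender[2:].lower() if spender.startswith("0x") else spender.lower()
    if len(addr) != 40:
        raise ValueError("expected a 20-byte hex address")
    int(addr, 16)  # raises ValueError on non-hex characters
    # selector + address left-padded to 32 bytes + uint256 zero
    return "0x" + APPROVE_SELECTOR + addr.rjust(64, "0") + "0" * 64


def plan_revokes(known_spenders):
    """Scan all chains in parallel, keep risky approvals, highest risk first."""
    with ThreadPoolExecutor(max_workers=len(CHAINS)) as pool:
        per_chain = list(pool.map(fetch_approvals, CHAINS))
    risky = []
    for approvals in per_chain:
        for a in approvals:
            score = risk_score(a, known_spenders)
            if score > 0:
                risky.append({"chain": a["chain"], "to": a["token"],
                              "score": score,
                              "data": revoke_calldata(a["spender"])})
    return sorted(risky, key=lambda r: r["score"], reverse=True)


plan = plan_revokes({KNOWN_SPENDER})
```

Each entry in plan is a transaction to the token contract on its own chain, so the "batch" here is a list of per-chain transactions that the user approves once. How the app actually signs and submits them is not modeled.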

Why this matters past the wallet

Most of the AI-in-wallet conversation today is about capability — what can the agent do, how natural does the chat feel. The harder problem coming directly behind it is accountability. When agents act with real money, the question of "what did it actually do, and how do you prove it" stops being abstract.

It will not stop at wallets. Brokerages will delegate to agents. Banks will. Insurance carriers will. Eventually some government services will. The same question will surface, harder each time: can the actions of this AI be independently verified after the fact?

369 Wallet is not claiming to have solved that for every domain. What we are claiming is that the four pieces above — on-chain audit of every recommendation, sybil-immune reputation, a full ERC-8004 stack in a shipping mobile app, multi-chain natural-language revoke — together make up the first reference implementation of "AI accountability with the user, not the vendor" in the most demanding environment we could find: a consumer wallet on a phone, used by non-experts, across many chains.

It's open source. The shape ports. Other teams adapting it to their own domain — exchange, bank app, government system — no longer have to answer "how should this work?" They just have to answer "when?"

Verify it yourself

Open-source contracts:

github.com/369wallet/369-agnet-8004 — Apache-2.0, full repo

Arc Testnet deployments:

AgentIdentityRegistry — 0xAceB520444ddeDec663277FC866ab77E8085918e
ValidationRegistry — 0x148336926e6F21A2EC63B47BA31dD0B08E538b91
ReputationRegistry — 0x6f4065ca2c34a7201aa02d3b0c8b37be77d607b5

Try the app:

Google Play — production build
369wallet.xyz — site + iOS TestFlight signup

This is what we mean when we say a wallet for humans and agents. Both sides of that surface have to be auditable, by the user, on infrastructure neither side controls. Anything less is a chat box pretending to be a guarantee.

— The 369wallet team