Why ANML?
The agentic web is emerging fast. Multiple protocols and formats are being developed to enable autonomous agents to interact with services and with each other. This is our review of where ANML fits in that landscape.
Every protocol below solves real problems well. ANML doesn't replace any of them. It addresses a specific gap that none of them were designed to fill: giving agents the semantic comprehension layer they need to act intelligently within any protocol's mechanical contract.
The gap ANML fills
When an autonomous agent encounters a service today, it typically gets one of two things:
- An API contract (OpenAPI, GraphQL, JSON Schema) — tells the agent what endpoints exist and what data to send, but nothing about what the service means, how to behave, what's sensitive, or where the user is in a workflow.
- An HTML page — rich with visual information for humans, but requires expensive inference to extract intent, actions, and constraints that were never explicitly encoded.
Neither gives an agent what it actually needs: a structured, machine-readable document that conveys meaning, context, behavioral guidance, privacy rules, and workflow state — all in one place, without inference.
That's what ANML provides. It's a comprehension layer — a universal document format that any conforming agent can read to understand a service without bespoke integration.
Detailed comparison
| Capability | ANML | A2A | MCP | OpenAPI | HTML |
|---|---|---|---|---|---|
| Structured content for machines | ● | ○ | ◐ | ○ | ○ |
| Declared actions/endpoints | ● | ○ | ● | ● | ○ |
| Multi-step workflow state | ● | ◐ | ○ | ○ | ○ |
| Privacy/disclosure constraints | ● | ○ | ○ | ○ | ○ |
| Behavioral/persona guidance | ● | ○ | ◐ | ○ | ○ |
| Bidirectional knowledge exchange | ● | ◐ | ◐ | ○ | ○ |
| Usage rights (train/cache/display) | ● | ○ | ○ | ○ | ○ |
| Agent-to-agent task delegation | ○ | ● | ○ | ○ | ○ |
| Tool/function definitions | ○ | ○ | ● | ● | ○ |
| Transport protocol | ○ | ● | ● | ◐ | ○ |
| Capability negotiation | ○ | ● | ● | ○ | ○ |
| Visual rendering | ○ | ○ | ○ | ○ | ● |
| Media with semantic descriptions | ● | ○ | ○ | ○ | ◐ |
| No prior integration required | ● | ○ | ○ | ○ | ● |
| Trust/authorization delegation | ● | ○ | ○ | ○ | ○ |
● = yes ◐ = partial ○ = no
Protocol-by-protocol analysis
ANML vs. A2A (Agent-to-Agent Protocol)
What A2A does well
A2A is a transport and task-delegation protocol. It defines how agents discover each other (Agent Cards), negotiate capabilities, delegate tasks, stream progress updates, and return results. It handles the mechanics of multi-agent coordination: task lifecycle (submitted → working → completed), message passing, and capability advertisement.
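The lifecycle named above can be sketched as a small state machine. This is a minimal illustration that models only the three states mentioned in the text. The `ALLOWED_TRANSITIONS` table and the `advance` helper are hypothetical names, and the A2A specification itself remains the authority on the full set of task states.

```python
# Hypothetical transition table covering only the states named in the text.
ALLOWED_TRANSITIONS = {
    "submitted": {"working"},
    "working": {"completed"},
    "completed": set(),  # terminal in this sketch
}


def advance(current: str, nxt: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if nxt not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {nxt}")
    return nxt


state = "submitted"
state = advance(state, "working")
state = advance(state, "completed")
print(state)  # completed
```

A status like this tells an agent whether a task has finished. It says nothing about where the user is inside a checkout or booking flow.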
What A2A doesn't address
- What the content of a task means semantically
- How an agent should behave when representing a service to a user
- What data requires user consent before sharing
- Where the user is in a multi-step workflow (beyond task status)
- What knowledge the service wants to teach the agent
- Usage rights on content (can it be cached? trained on?)
How they work together
A2A handles the plumbing of agent coordination. ANML provides the semantic payload that rides inside or alongside A2A task exchanges. An A2A Agent Card says “I can do shopping” — an ANML duckument says “here's the checkout flow, here's what consent I need, here's how to talk to the user about it.” They operate at different layers and don't conflict.
ANML vs. MCP (Model Context Protocol)
What MCP does well
MCP provides a standardized way for LLM applications to connect to external tools and data sources. It defines tool schemas (function signatures with typed parameters), resource access patterns, and prompt templates. It's excellent for giving an LLM structured access to APIs, databases, and file systems within a controlled context.
What MCP doesn't address
- Multi-step workflow state and progression
- Privacy constraints and disclosure rules
- Service identity and trust delegation
- Content usage rights
- Rich knowledge exchange (inform/ask patterns with TTLs and confidentiality)
- Persona and behavioral guidance beyond prompt templates
How they work together
MCP is a local integration protocol — it connects an LLM to tools within a trusted environment. ANML is a document format for the open web — it describes services that agents discover and interact with over HTTP. An MCP server could serve ANML documents as resources, or an agent could use MCP tools to fetch and process ANML duckuments from the web. They solve different problems at different trust boundaries.
ANML vs. OpenAPI / GraphQL
What OpenAPI does well
OpenAPI (and GraphQL schemas) provide precise, machine-readable descriptions of API endpoints: paths, methods, request/response schemas, authentication requirements, and error codes. They're the gold standard for API documentation and client generation.
What OpenAPI doesn't address
- Semantic meaning of the service (what it is, not just what it does)
- Workflow state — where the user is in a multi-step process
- Privacy and disclosure rules
- Behavioral guidance for how to present information to users
- Knowledge the service wants to proactively share with agents
- Content usage rights
- Media descriptions for non-visual consumers
The key difference
OpenAPI requires prior integration. Each service defines a bespoke contract that agents must be individually programmed to use. ANML requires no prior integration: like HTML for browsers, any conforming agent can read any ANML duckument and understand the service without service-specific code. OpenAPI tells you how to call an API. ANML tells you why you'd want to, what it means, and how to behave while doing it.
ANML vs. UCP (Universal Commerce Protocol)
What UCP does well
UCP defines the wire-level contract for agentic commerce: checkout sessions, payment handlers, fulfillment, capability negotiation, and order management. It's a domain-specific protocol that standardizes how agents interact with merchants for buying and selling.
What UCP doesn't address
- Semantic framing — what the service is beyond its API contract
- Behavioral guidance for how agents should present options to users
- Privacy governance — which fields require explicit consent
- Proactive knowledge sharing (policies, deals, accessibility info)
- Non-commerce domains (UCP is commerce-specific; ANML is domain-neutral)
How they work together
UCP and ANML are natural complements. UCP defines the mechanical contract (endpoints, schemas, payment flows). ANML wraps that contract with semantic context: workflow visibility, disclosure rules, persona guidance, and knowledge that helps agents act intelligently within the UCP interaction. A merchant publishes both — UCP for the protocol, ANML for the comprehension.
ANML vs. HTML
What HTML does well
HTML is the foundation of the human web. It produces rich, accessible, visual interfaces that billions of people use daily. Combined with CSS and JavaScript, it enables complex interactive experiences.
Why HTML doesn't work for agents
- Intent is implicit — encoded in visual layout, button labels, and JavaScript behavior
- Extracting structured actions requires expensive inference over unstructured markup
- No machine-readable privacy or disclosure rules
- No explicit workflow state
- Media (images, video) requires vision inference, with no guarantee that the inferred meaning is accurate
- Navigation, ads, and layout markup consume token budget with zero value to agents
The relationship
ANML is to agents what HTML is to browsers. HTML provides visual interfaces for human consumption. ANML provides semantic interfaces for machine consumption. They coexist — a service publishes HTML for humans and ANML for agents. ANML doesn't replace HTML; it complements it for a different class of consumer.
What only ANML provides
No protocol or format above covers all of these capabilities, and several are not addressed by any of them:
Privacy constraints as first-class data
Machine-readable disclosure rules that agents must evaluate before sharing any user data. Not an afterthought — it's a required processing step.
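As a rough illustration of disclosure rules as a required processing step, the sketch below filters outgoing user data against per-field rules before anything is shared. The `DisclosureRule` shape, the field names, and `fields_cleared_for_sharing` are hypothetical and not taken from the ANML specification. Withholding fields that have no rule (default deny) is also an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisclosureRule:
    """Hypothetical per-field rule; not the ANML schema."""
    field: str
    requires_consent: bool


def fields_cleared_for_sharing(user_data, rules, consented):
    """Return only the fields the agent may send to the service.

    A field is withheld when it has no rule at all (assumed default deny)
    or when its rule requires consent the user has not given.
    """
    rule_by_field = {rule.field: rule for rule in rules}
    cleared = {}
    for name, value in user_data.items():
        rule = rule_by_field.get(name)
        if rule is None:
            continue  # no declared rule: do not share
        if rule.requires_consent and name not in consented:
            continue  # consent required but not granted
        cleared[name] = value
    return cleared


rules = [
    DisclosureRule("city", requires_consent=False),
    DisclosureRule("email", requires_consent=True),
    DisclosureRule("phone", requires_consent=True),
]
user_data = {
    "city": "Austin",
    "email": "a@example.com",
    "phone": "555-0100",
    "ssn": "000-00-0000",
}
shared = fields_cleared_for_sharing(user_data, rules, consented={"email"})
print(shared)  # {'city': 'Austin', 'email': 'a@example.com'}
```

The filter runs before any request is built, so a field cannot be sent unless a rule was evaluated for it first.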
Behavioral persona guidance
Services can advise agents on tone, language, vocabulary, and presentation style. The agent represents the service to the user with appropriate context.
Bidirectional knowledge exchange
Services inform agents (with TTLs and confidentiality levels) and ask agents for data (with stated purposes). Neither party is obligated to comply.
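One way to picture the inform side of this exchange is a list of facts, each carrying its own TTL and confidentiality level, which the agent filters before using. This is a sketch under stated assumptions: the `InformItem` fields, the `"public"` and `"user-only"` labels, and `usable_facts` are invented for illustration and are not the ANML schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformItem:
    """Hypothetical 'inform' entry: a fact a service shares with an agent."""
    key: str
    value: str
    ttl_seconds: int       # how long the agent may rely on the fact
    confidentiality: str   # assumed labels: "public" or "user-only"
    received_at: float     # epoch seconds when the agent received it


def usable_facts(items, now, audience):
    """Keep facts that are unexpired and may be shown to `audience`."""
    facts = {}
    for item in items:
        expired = now - item.received_at >= item.ttl_seconds
        hidden = item.confidentiality == "user-only" and audience != "user"
        if not expired and not hidden:
            facts[item.key] = item.value
    return facts


items = [
    InformItem("return_policy", "30 days", 3600, "public", received_at=1000.0),
    InformItem("flash_deal", "10% off", 60, "public", received_at=1000.0),
    InformItem("loyalty_tier", "gold", 3600, "user-only", received_at=1000.0),
]

# 100 seconds after receipt the 60-second deal has expired.
print(usable_facts(items, now=1100.0, audience="user"))
# {'return_policy': '30 days', 'loyalty_tier': 'gold'}
print(usable_facts(items, now=1100.0, audience="third-party"))
# {'return_policy': '30 days'}
```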
Content usage rights
A machine-readable hierarchy (none < display < cache < store < train) that governs what agents may do with content. Critical for the AI training era.
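The ordering above maps naturally onto an ordered enum. The following is a minimal sketch, assuming each level implies every level below it (which the `<` notation suggests but this page does not spell out). `UsageRight` and `is_permitted` are hypothetical names.

```python
from enum import IntEnum


class UsageRight(IntEnum):
    """Levels in the order given in the text: none < display < cache < store < train."""
    NONE = 0
    DISPLAY = 1
    CACHE = 2
    STORE = 3
    TRAIN = 4


def is_permitted(granted: UsageRight, requested: UsageRight) -> bool:
    """Assumes a grant covers the requested level and everything below it."""
    return requested <= granted


granted = UsageRight.CACHE
print(is_permitted(granted, UsageRight.DISPLAY))  # True
print(is_permitted(granted, UsageRight.STORE))    # False
print(is_permitted(granted, UsageRight.TRAIN))    # False
```

With an ordered type, an agent can gate caching, storage, and training pipelines on a single comparison.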
No prior integration
Like HTML, any conforming agent can read any ANML document without service-specific code. No SDK, no client generation, no per-service programming.
Media with semantic descriptions
Images, audio, and video include text descriptions and transcripts so agents understand media content without performing vision or audio inference.
Trust delegation
A DNS-bootstrapped mechanism for verifying that a serving domain is authorized to assert another site's identity — solving the CDN/third-party serving problem.
Domain neutrality
Not tied to commerce, healthcare, or any vertical. The same format works for any agent-to-service interaction over HTTP.
When ANML is not the right choice
Being honest about limitations:
- You need agent-to-agent task delegation: use A2A. ANML doesn't define how agents discover, negotiate with, or delegate work to each other.
- You need to connect an LLM to local tools: use MCP. ANML is for the open web, not for wiring up local databases and file systems to a model.
- You need precise API documentation for developers: use OpenAPI. ANML is for agent comprehension, not for generating client libraries or API reference docs.
- You need visual interfaces for humans: use HTML. ANML has no rendering semantics and is not designed for human consumption.
- You need real-time streaming communication: use WebSockets, SSE, or A2A's streaming task updates. ANML is a document format, not a streaming protocol.
The next generation of internet access
The graphical web is dying. Not tomorrow — but the trajectory is clear. The primary consumers of web content are shifting from humans with browsers to agents acting on behalf of humans. People don't want to browse ten hotel sites, compare prices, read cancellation policies, and fill out forms. They want to say “book me a hotel in Austin for next weekend under $200” and have it done.
This isn't speculation. Google just rebuilt their entire product line around agentic AI. 900 million people are already using Gemini. Agents are shopping, booking, researching, and transacting — and the volume is only going up. Within a few years, the majority of commercial web interactions won't involve a human looking at a screen.
When that happens, the graphical web becomes a legacy interface. HTML, CSS, and JavaScript, the entire visual stack, become overhead. Agents don't need hero images, navigation menus, cookie banners, or responsive layouts. They need structured meaning: what is this service, what can I do here, what are the rules, and how do I act on behalf of my user.
ANML is the native format for this post-graphical web. It's what services publish when their primary audience is agents, not eyeballs. It's the HTTP response that agents actually want: not a 2MB HTML page full of ads and tracking scripts, but a clean, structured duckument that says exactly what the service offers and how to use it.
The inflection point
At Google I/O 2026, Sundar Pichai declared the “agentic Gemini era” — unveiling Gemini Spark (a 24/7 personal agent), agentic commerce in Search, and an entire product lineup rebuilt around autonomous AI that acts on behalf of users. Google processes 3.2 quadrillion tokens monthly. Over 900 million people use the Gemini app. These agents are already browsing, shopping, and transacting across the web.
Anthropic's Claude can now take over your desktop — clicking, typing, navigating apps, completing multi-step tasks while you walk away. Their Computer Use and Dispatch features turn Claude into an autonomous operator that interacts with services the same way a human would: by looking at screens and inferring what to do.
OpenAI, Microsoft, Apple, and dozens of startups are building the same thing. Autonomous agents that act on the web on behalf of users. The question isn't whether this is happening. It's whether the web is ready for it.
Everyone is building their own rails
Google's agentic commerce stack is the most advanced system heading in ANML's direction — but it's built on Google's proprietary infrastructure. Merchants integrate with Google's systems. Users use Google's agents. Google controls the rails.
Anthropic's approach is different but equally closed: Claude infers everything from screen pixels. Every interaction costs inference, every layout change is a potential failure, and there's no way for services to communicate intent, constraints, or context to the agent directly.
OpenAI is taking yet another approach with their Apps SDK and UI guidelines — defining how third-party apps render inside ChatGPT. It's a walled garden with a nice developer experience, but it only works within OpenAI's ecosystem. Your app renders in ChatGPT. It doesn't work in Gemini, Claude, or any other agent.
The common thread: every major platform is solving the agent-to-service problem, but each is solving it only for its own agents, its own users, its own ecosystem. None of them solve cross-platform interoperability. None of them give services a way to communicate meaning to any agent from any vendor.
And critically, none of them solve the problem of agents extracting meaning from external content that was never in their training data. An agent encountering a service it has never seen before has no way to understand it without either expensive inference over HTML or a pre-built integration. ANML solves this: publish a duckument, and any conforming agent understands your service immediately.
| Need | Google Agentic | Claude Computer Use | OpenAI Apps SDK | ANML |
|---|---|---|---|---|
| Open standard | ○ | ○ | ○ | ● |
| Works with any agent | ○ | ◐ | ○ | ● |
| No inference required | ◐ | ○ | ● | ● |
| Service controls its narrative | ◐ | ○ | ◐ | ● |
| Privacy rules are explicit | ○ | ○ | ○ | ● |
| No vendor lock-in | ○ | ◐ | ○ | ● |
| Understands new/untrained content | ○ | ◐ | ○ | ● |
The business case
For services and merchants
Without ANML, your service is at the mercy of whatever an agent infers from your HTML. You can't control how your brand is represented. You can't communicate policies. You can't enforce consent requirements. You can't guide the agent's behavior when it represents you to a user.
With ANML, you publish a duckument that tells every agent — regardless of vendor — exactly what your service offers, how to interact with it, what requires consent, and how to represent you. One document, every agent, no per-platform integration.
For agent platforms
Without ANML, every service interaction requires expensive inference. Your agents burn tokens parsing HTML, guessing at workflows, and hoping the layout hasn't changed since last time. Reliability is low. Cost is high. User trust erodes with every failed interaction.
With ANML, your agents read structured documents that explicitly declare everything they need. Deterministic. Cheap. Reliable. And you don't need to build custom integrations for every service on the web.
For users
Without ANML, agents share your data without understanding consent requirements. They misrepresent services. They fail silently when workflows change. They can't tell you where you are in a process or what happens next.
With ANML, your agent knows what requires your permission before sharing. It understands the full workflow. It represents services accurately because the service told it how to. Privacy is enforced by design, not by hope.
The next protocol transition
The internet has been through this before. In the early 1990s, information was scattered across separate, mutually incompatible systems: Gopher, WAIS, FTP directories. HTML created a universal format that any browser could render, and the graphical web exploded. The transition from fragmented systems to one open, universal format unlocked the entire modern internet economy.
We're at the same transition point. The graphical web (HTML) served the browser era. The agentic web needs its own native format. Right now, agents are forced to consume content designed for human eyes — the equivalent of trying to run a modern web app on a Gopher client. It works if you squint, but it's not what the medium was built for.
Agent-to-agent and agent-to-service communication will become the dominant form of internet traffic. Not because humans stop using the internet, but because humans delegate to agents. When you ask your agent to “find me the best flight to Tokyo next month,” that agent will hit dozens of services, compare options, check constraints, and come back with an answer. Every one of those interactions is agent-to-service. None of them need a graphical interface.
ANML is the format those interactions deserve — structured, explicit, privacy-aware, and universal. Not a hack on top of HTML. Not a proprietary SDK locked to one platform. An open standard for the post-graphical internet.
The window is now
Standards get adopted in the early days or not at all. Once closed platforms establish their proprietary rails, the cost of switching to an open standard becomes prohibitive. The time to establish an open comprehension layer for the agentic web is before the closed alternatives become entrenched — not after.