Agentic Commerce: How to Let AI Agents Buy Things Safely
How to design governed AI agent-initiated commerce. Policy engines, HITL approval gates, HMAC receipts, idempotency, tenant scoping, and the full Agentic Checkout Protocol.
Why This Matters Now
AI agents are getting good at conversations, recommendations, and search. The next step is obvious: let them complete transactions. An agent that can check availability, compare options, and book tickets is more useful than one that just tells you the options and makes you click through a checkout form.
But the moment an agent can spend money, everything changes. A customer support agent that books the wrong show. A travel agent that reserves 50 seats instead of 5. A shopping assistant that places an order for a product the supplier no longer carries. These are not hypothetical scenarios. They are the predictable result of connecting an LLM to a checkout API without governance.
We built a protocol for this. We call it the Agentic Checkout Protocol (ACP). It governs how AI agents initiate, validate, and complete commerce transactions within a multi-tenant platform. This article explains the architecture from the ground up.
Our guides on agentic AI systems and multi-agent architecture cover the broader patterns. This article focuses specifically on the commerce transaction problem.
The Core Problem
Traditional checkout flows assume a human is making the decision. The merchant's backend calls the checkout API with an API key, the order is validated, and payment is captured. The human is accountable.
With agentic commerce, the caller is an AI agent. The agent operates within a conversation thread, has access to tools, and can take actions autonomously. This introduces risks that traditional checkout doesn't handle:
| Risk | Example | Traditional Checkout | Agentic Commerce |
|---|---|---|---|
| Unauthorized orders | Agent books from a supplier the merchant hasn't approved | API key maps to tenant, all suppliers visible | Need per-agent supplier restrictions |
| Spending overrun | Agent places a 10,000 EUR order without approval | No spending limits (merchant is the human) | Need value caps and approval gates |
| Hallucinated bookings | Agent fabricates product details that bypass validation | Not possible (human selects real products) | Agent could pass invented product IDs |
| Duplicate orders | Network retry causes double booking | Handled at API level (maybe) | Need idempotency at protocol level |
| No audit trail | Who authorized this order? Which agent? Which conversation? | Merchant is accountable | Need per-action audit with agent identity |
| Bulk fraud | Agent creates 100 orders in a loop | Rate limiting | Need per-order item limits and velocity checks |
The solution is not "don't let agents buy things." The solution is a protocol that governs every step.
The Agentic Checkout Protocol
ACP composes five systems into a single trust framework:
ACP FLOW

User: "Book The Lion King for 2 adults on Saturday"
                 │
                 ▼
┌─────────────────────────────────┐
│ 1. MCP TOOL LAYER               │
│ Agent calls createCheckout()    │
│ via Model Context Protocol      │
└────────────────┬────────────────┘
                 │
                 ▼
┌─────────────────────────────────┐
│ 2. POLICY ENGINE                │
│ Check: Is this agent allowed    │
│ to book from this supplier?     │
│ Is the value within limits?     │
│ Result: ALLOW or DENY           │
└────────────────┬────────────────┘
                 │
           ALLOW │  DENY ──▶ Error to agent
                 ▼
┌─────────────────────────────────┐
│ 3. HITL APPROVAL GATE           │
│ If order value > threshold:     │
│ Suspend workflow, notify ops    │
│ Wait for human approval         │
└────────────────┬────────────────┘
                 │
        APPROVED │
                 ▼
┌─────────────────────────────────┐
│ 4. SUPPLIER EXECUTION           │
│ Reserve inventory               │
│ Confirm booking via supplier    │
│ Generate HMAC receipt           │
└────────────────┬────────────────┘
                 │
                 ▼
┌─────────────────────────────────┐
│ 5. AUDIT LOG                    │
│ Write immutable record          │
│ actor_type: "agent"             │
│ thread_id, tenant_id, receipt   │
└─────────────────────────────────┘
Each component is independently valuable. Together, they form a complete governance framework for AI-initiated commerce.
Component 1: MCP Tool Layer
The agent doesn't call APIs directly. It uses MCP (Model Context Protocol) tools that are registered with the agent framework. Each tool is a controlled entry point with input validation, policy enforcement, and audit logging.
// MCP tool definition for creating a checkout
const createCheckoutTool = createTool({
  id: 'createCheckout',
  description: 'Create a checkout for a product booking',
  inputSchema: z.object({
    productId: z.string(),
    date: z.string().datetime(),
    persons: z.array(z.object({
      type: z.enum(['adult', 'child', 'senior']),
      count: z.number().int().positive(),
    })),
  }),
  execute: async ({ productId, date, persons }, ctx) => {
    // 1. Validate product exists in search index
    const product = await searchApi.getProduct(ctx.tenantId, productId);
    if (!product) throw new ToolError('Product not found');
    // 2. Check policy against the estimated order total
    //    (assumes product.price is a per-person price)
    const personCount = persons.reduce((sum, p) => sum + p.count, 0);
    const policy = await policyEngine.evaluate(
      ctx.tenantId, ctx.channelId, 'create_order',
      { productId, supplierId: product.supplierId, estimatedValue: product.price * personCount }
    );
    if (!policy.allowed) throw new ToolError(`Policy denied: ${policy.reason}`);
    // 3. Start checkout workflow, passing the evaluated policy along
    return checkoutWorkflow.start({ productId, date, persons, policy, ctx });
  },
});
The tool validates that the product actually exists (prevents hallucinated bookings), checks policy before any financial action, and routes to a governed workflow. The agent never has direct access to the supplier API.
This pattern comes from how we design AI workflow systems. The tool is the trust boundary. Everything inside the tool is governed. Everything outside is just conversation.
Component 2: Policy Engine
Before any financial action, the policy engine evaluates tenant-specific rules. Rules are stored per tenant and cached with a 5-minute TTL.
interface PolicyRule {
  action: string;                          // "create_order", "cancel_order", "search"
  effect: "allow" | "deny";
  conditions: {
    max_order_value?: number;              // Hard cap, reject above this
    allowed_suppliers?: string[];          // Restrict which suppliers
    require_human_approval_above?: number; // HITL threshold
    max_items_per_order?: number;          // Prevent bulk fraud
  };
}
Evaluation Rules
- Deny rules take precedence. If any deny rule matches, the action is rejected immediately.
- Allow rules with conditions. If an allow rule matches, its conditions are evaluated against the request.
- No matching rule = denied. Default-deny. An agent cannot do anything that isn't explicitly allowed.
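The three evaluation rules can be sketched as a small pure function. This is a minimal illustration, not the production engine: the `OrderRequest` and `PolicyDecision` shapes and the `evaluatePolicy` name are assumptions made for the example, and deny rules are matched on the action alone for simplicity.

```typescript
// Hypothetical types for illustration; PolicyRule mirrors the interface above.
interface Conditions {
  max_order_value?: number;
  allowed_suppliers?: string[];
  require_human_approval_above?: number;
  max_items_per_order?: number;
}

interface PolicyRule {
  action: string;
  effect: "allow" | "deny";
  conditions: Conditions;
}

// Assumed request shape: what the tool knows at evaluation time.
interface OrderRequest {
  supplierId: string;
  estimatedValue: number;
  itemCount: number;
}

// Assumed result shape.
interface PolicyDecision {
  allowed: boolean;
  requiresApproval: boolean;
  reason: string;
}

function evaluatePolicy(rules: PolicyRule[], action: string, req: OrderRequest): PolicyDecision {
  const matching = rules.filter((rule) => rule.action === action);

  // 1. Deny rules take precedence: any matching deny rejects immediately.
  if (matching.some((rule) => rule.effect === "deny")) {
    return { allowed: false, requiresApproval: false, reason: "explicit deny rule" };
  }

  // 2. Only allow rules remain; the first one whose conditions all hold wins.
  for (const rule of matching) {
    const c = rule.conditions;
    if (c.max_order_value !== undefined && req.estimatedValue > c.max_order_value) continue;
    if (c.allowed_suppliers !== undefined && !c.allowed_suppliers.includes(req.supplierId)) continue;
    if (c.max_items_per_order !== undefined && req.itemCount > c.max_items_per_order) continue;
    const requiresApproval =
      c.require_human_approval_above !== undefined &&
      req.estimatedValue > c.require_human_approval_above;
    return { allowed: true, requiresApproval, reason: "allow rule matched" };
  }

  // 3. Default-deny: nothing explicitly allowed this request.
  return { allowed: false, requiresApproval: false, reason: "no matching allow rule" };
}
```

Returning `requiresApproval` alongside `allowed` keeps the HITL threshold a policy outcome rather than a separate lookup in the workflow.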
Example Policy
{
  "tenant_id": "tenant_acme",
  "rules": [
    {
      "action": "create_order",
      "effect": "allow",
      "conditions": {
        "max_order_value": 5000,
        "allowed_suppliers": ["supplier_alpha", "supplier_beta"],
        "require_human_approval_above": 500,
        "max_items_per_order": 10
      }
    },
    {
      "action": "cancel_order",
      "effect": "allow",
      "conditions": {}
    },
    {
      "action": "search",
      "effect": "allow",
      "conditions": {}
    }
  ]
}
This tenant's agents can search freely, create orders up to 5,000 EUR (but orders above 500 EUR need human approval), book only from two approved suppliers, and create orders with at most 10 items. They can cancel orders without restrictions. Anything else is denied.
The policy engine is tenant-scoped. Different merchants get different rules. A large enterprise might set the HITL threshold at 2,000 EUR. A small operator might require approval for everything above 100 EUR. The platform operator sets default policies, and tenants can customize within their allowed range.
Component 3: HITL Approval Gates
When an order exceeds the require_human_approval_above threshold, the checkout workflow suspends and waits for human approval.
// Inside the checkout workflow
async function checkoutWorkflow(ctx, params) {
  // Assumption: the calling tool passes its evaluated policy result in params
  const policy = params.policy;
  // Step 1: Reserve inventory
  const reservation = await supplierAdapter.reserveInventory(
    ctx.tenantId, params.productId, params.date, params.persons
  );
  // Step 2: Check if human approval is needed
  if (reservation.totalPrice > policy.require_human_approval_above) {
    // Suspend workflow, notify ops dashboard
    const approval = await workflow.suspend({
      reason: 'Order exceeds approval threshold',
      totalPrice: reservation.totalPrice,
      threshold: policy.require_human_approval_above,
      productName: reservation.productName,
      reservationExpiresAt: reservation.expiresAt,
    });
    if (!approval.approved) {
      // Release the held inventory before reporting the rejection
      await supplierAdapter.cancelReservation(reservation.id);
      return { status: 'REJECTED', reason: approval.rejectionReason };
    }
  }
  // Step 3: Confirm booking
  const booking = await supplierAdapter.confirmBooking(reservation);
  // Step 4: Generate receipt (signing fetches the tenant secret, so await it)
  const receipt = await generateHmacReceipt(booking, ctx.tenantId);
  // Step 5: Audit log
  await auditLog.write({
    action: 'order_completed',
    actor_type: 'agent',
    thread_id: ctx.threadId,
    tenant_id: ctx.tenantId,
    order_id: booking.orderId,
    receipt_hmac: receipt.hmac,
  });
  return { status: 'COMPLETED', orderId: booking.orderId, receiptHmac: receipt.hmac };
}
The workflow uses suspend() and resume() from the agent framework. When suspended, the ops dashboard shows the pending order with all details. An operator can approve or reject with a reason. The workflow resumes automatically when the decision is made.
The reservation has an expiration time. If the human doesn't approve before the reservation expires, the system automatically cancels the reservation and informs the agent. The agent can then tell the user that the booking window has closed.
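The interaction between the approval decision and the reservation expiry can be sketched as a pure decision function. The names here (`ApprovalDecision`, `ApprovalOutcome`, `resolveApproval`) are hypothetical and not part of the agent framework; the sketch only shows one way to make sure a late approval never confirms an expired reservation.

```typescript
// Hypothetical shapes for illustration only.
interface ApprovalDecision {
  approved: boolean;
  rejectionReason?: string;
}

type ApprovalOutcome =
  | { status: "PENDING" }                    // still waiting, reservation still held
  | { status: "CONFIRM" }                    // approved in time, safe to confirm booking
  | { status: "REJECTED"; reason: string }   // operator said no
  | { status: "EXPIRED" };                   // reservation lapsed, cancel and inform agent

function resolveApproval(
  decision: ApprovalDecision | null,
  reservationExpiresAt: Date,
  now: Date,
): ApprovalOutcome {
  // An explicit rejection is final regardless of timing.
  if (decision !== null && !decision.approved) {
    return { status: "REJECTED", reason: decision.rejectionReason ?? "rejected by operator" };
  }
  // Past the expiry, neither a pending nor an approved order may proceed.
  if (now.getTime() >= reservationExpiresAt.getTime()) {
    return { status: "EXPIRED" };
  }
  return decision === null ? { status: "PENDING" } : { status: "CONFIRM" };
}
```

Checking the expiry at resume time, rather than only when the operator clicks approve, covers the case where the workflow itself resumes late.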
For more on human oversight in AI systems, see our guide on human-in-the-loop AI.
Component 4: HMAC Receipts
Every completed order generates a tamper-evident receipt using HMAC-SHA256. This provides cryptographic proof of what happened.
Receipt Specification
| Parameter | Value |
|---|---|
| Algorithm | HMAC-SHA256 |
| Key storage | Per-tenant secret (rotated annually) |
| Output format | Hex-encoded string |
| Canonicalization | JSON with object keys sorted recursively at every nesting level |
Receipt Payload
interface ReceiptPayload {
  order_id: string;
  tenant_id: string;
  products: Array<{
    id: string;
    date: string;
    total_price: number;
  }>;
  total_price: number;
  currency: string;
  booker_email: string;
  created_at: string; // ISO 8601
}
Signing
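import crypto from 'node:crypto';

// Serialize with keys sorted at every nesting level. An array replacer
// (JSON.stringify(payload, keys)) also filters nested objects, which would
// drop products[].id and products[].date from the signed content.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(v => canonicalJson(v)).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj).sort()
      .map(key => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`);
    return `{${body.join(',')}}`;
  }
  return JSON.stringify(value);
}

// async because the tenant secret is fetched from the secrets manager
async function generateHmacReceipt(order: Order, tenantId: string): Promise<Receipt> {
  const payload: ReceiptPayload = {
    order_id: order.id,
    tenant_id: tenantId,
    products: order.items.map(item => ({
      id: item.productId,
      date: item.date,
      total_price: item.totalPrice,
    })),
    total_price: order.totalPrice,
    currency: order.currency,
    booker_email: order.bookerEmail,
    created_at: order.createdAt.toISOString(),
  };
  // Canonical JSON: sorted keys for deterministic output
  const canonical = canonicalJson(payload);
  const secret = await secretsManager.getTenantSecret(tenantId);
  const hmac = crypto.createHmac('sha256', secret).update(canonical).digest('hex');
  return { payload, hmac, signedAt: new Date().toISOString() };
}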
Verification
GET /v1/order/{orderId}/verify
Response: { valid: true, signed_at: "2026-06-15T14:30:00Z" }
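The verification endpoint recomputes the HMAC from the stored order data using the tenant's secret and compares it to the stored receipt. A mismatch means the order data was changed after signing. The receipt payload itself covers the order contents, the tenant, and the creation time; the link to a specific agent and conversation comes from the audit entry, which stores the receipt HMAC next to actor_type and thread_id. Together they give merchants, auditors, and regulators verifiable evidence of which agent, acting within which tenant scope, created which order, and when.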
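The recompute-and-compare step might look like the sketch below. It is an illustration under assumptions: `canonicalize`, `sign`, and `verifyReceipt` are hypothetical helper names, the secret is passed in directly rather than fetched from a secrets manager, and the comparison uses Node's `crypto.timingSafeEqual` so the check does not leak how many leading characters matched.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Deterministic serialization: object keys sorted at every nesting level.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map((v) => canonicalize(v)).join(",") + "]";
  const keys = Object.keys(value).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalize(value[k])).join(",") + "}";
}

// Hex-encoded HMAC-SHA256 over the canonical form.
function sign(payload: Json, secret: string): string {
  return createHmac("sha256", secret).update(canonicalize(payload)).digest("hex");
}

// Recompute the HMAC from stored data and compare in constant time.
function verifyReceipt(payload: Json, storedHmac: string, secret: string): boolean {
  const expected = Buffer.from(sign(payload, secret), "hex");
  const actual = Buffer.from(storedHmac, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}
```

Because keys are sorted recursively, two payloads with the same content but different property order produce the same HMAC, while any change to a nested product field invalidates the receipt.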
Component 5: Immutable Audit Trail
Every ACP action generates an audit entry. The audit system uses a two-tier architecture:
Tier 1: Operational log (database, queryable, 90-day retention)
{
  action: 'order_completed',
  actor_type: 'agent',          // "agent" | "human" | "system"
  thread_id: 'thread_abc123',   // which conversation
  tenant_id: 'tenant_acme',
  channel_id: 'ch_web_en',
  order_id: 'ord_xyz789',
  total_price: 450,
  currency: 'EUR',
  supplier_id: 'supplier_alpha',
  receipt_hmac: 'a1b2c3...',
  policy_evaluated: true,
  hitl_required: false,
  created_at: '2026-06-15T14:30:00Z',
}
Tier 2: Immutable archive (object storage with write-once/read-many, 7-year retention)
The operational log streams to object storage with object locks (compliance mode). Once written, entries cannot be modified or deleted for the retention period. This satisfies regulatory requirements for financial transaction records.
Every audit entry includes actor_type and thread_id. This makes it possible to trace exactly which AI agent, in which conversation, initiated which financial action. Combined with the HMAC receipt, the audit trail provides end-to-end proof of the decision chain.
For how we build audit and observability systems more broadly, see our guides on AI governance and AI observability.
ACP vs Traditional Checkout
| Capability | Traditional API Checkout | ACP Adds |
|---|---|---|
| Caller | Merchant backend (API key / JWT) | AI agent via MCP tool |
| Policy checks | Implicit (API key maps to tenant) | Explicit per-action evaluation with deny rules |
| Spending controls | None (merchant is the human) | max_order_value, require_human_approval_above |
| Supplier restrictions | Channel visibility only | Per-agent allowed_suppliers policy rule |
| Human approval | N/A (merchant is the human) | HITL gate via workflow suspend/resume |
| Audit trail | Standard API logs | Per-action audit with actor_type: "agent", thread_id |
| Fraud prevention | Rate limiting, schema validation | Per-order item limits, supplier allow-lists, value caps |
| Tamper evidence | None | HMAC-SHA256 receipts with per-tenant signing keys |
Idempotency
Agent-initiated transactions need idempotency at the protocol level. Network retries, agent re-attempts, and workflow replays must not create duplicate orders.
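// Idempotency key for agent-initiated checkouts
const idempotencyKey = `${tenantId}:${threadId}:${productId}:${date}`;
// Acquire first: a conditional write that succeeds for exactly one caller.
// (Assumes acquire() resolves to false when the key already exists.)
const acquired = await idempotencyStore.acquire(idempotencyKey);
if (!acquired) {
  // Another attempt owns this key: replay its result once it has finished
  const existing = await idempotencyStore.get(idempotencyKey);
  if (existing?.result) return existing.result;
  throw new ToolError('Checkout already in progress for this booking');
}
// Process and store
const result = await processCheckout(params);
await idempotencyStore.complete(idempotencyKey, result);
return result;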
The idempotency key combines tenant, conversation thread, product, and date. The same agent in the same conversation booking the same product on the same date always returns the same result. A different conversation thread gets a fresh key.
The store uses conditional writes to prevent race conditions. If two workers try to acquire the same key simultaneously, only one succeeds.
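To make the conditional-write behaviour concrete, here is a minimal in-memory sketch. `InMemoryIdempotencyStore` and `runOnce` are hypothetical names, and the store is synchronous for brevity; a real store (DynamoDB conditional writes, Redis SET NX, and so on) would be asynchronous but follow the same acquire, complete, replay shape.

```typescript
type Entry<T> = { status: "in_progress" } | { status: "completed"; result: T };

class InMemoryIdempotencyStore<T> {
  private entries = new Map<string, Entry<T>>();

  // Conditional write: succeeds only if the key does not exist yet.
  acquire(key: string): boolean {
    if (this.entries.has(key)) return false;
    this.entries.set(key, { status: "in_progress" });
    return true;
  }

  complete(key: string, result: T): void {
    this.entries.set(key, { status: "completed", result });
  }

  // Free the key so a failed attempt can be retried.
  release(key: string): void {
    this.entries.delete(key);
  }

  get(key: string): Entry<T> | undefined {
    return this.entries.get(key);
  }
}

type RunResult<T> =
  | { outcome: "executed"; result: T }
  | { outcome: "replayed"; result: T }
  | { outcome: "in_progress" };

function runOnce<T>(store: InMemoryIdempotencyStore<T>, key: string, work: () => T): RunResult<T> {
  if (store.acquire(key)) {
    try {
      const result = work();
      store.complete(key, result);
      return { outcome: "executed", result };
    } catch (err) {
      store.release(key); // do not leave a stuck in_progress marker behind
      throw err;
    }
  }
  const existing = store.get(key);
  if (existing !== undefined && existing.status === "completed") {
    return { outcome: "replayed", result: existing.result };
  }
  return { outcome: "in_progress" };
}
```

Acquiring before doing any work, instead of reading first and writing later, is what closes the window in which two workers both see "no existing entry" and both proceed.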
Tenant Scoping
Every ACP operation is scoped to a tenant and channel. The identity hierarchy ensures data isolation at every layer:
Tenant (merchant organization)
└── Channel (storefront or sales channel)
    └── Customer (end user)
        └── Session (browser/device session)
            └── Agent Thread (single conversation)
The agent thread inherits its tenant and channel context from the authenticated session. Every tool call, every policy evaluation, every audit entry carries these identifiers. Cross-tenant data access is blocked by construction: every query includes the tenant scope as a mandatory filter, so there is no code path that reads data without one.
This is the same multi-tenant isolation pattern we describe in our system architecture guide. The difference in ACP is that the agent adds another level of scoping: the thread. An agent's memory and context are scoped to tenant_id + session_id + thread_id. Agent A in one tenant cannot see agent B's conversation in another tenant.
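One way to make the tenant filter mandatory rather than optional is to bind the scope when the data-access object is created, so callers cannot issue an unscoped query. The sketch below is illustrative only: `ScopedOrderRepository`, `TenantScope`, and `OrderRecord` are hypothetical names, and an in-memory array stands in for the real database.

```typescript
// Hypothetical shapes for illustration.
interface TenantScope {
  tenantId: string;
  channelId: string;
}

interface OrderRecord {
  orderId: string;
  tenantId: string;
  channelId: string;
  threadId: string;
}

class ScopedOrderRepository {
  private readonly scope: TenantScope;
  private readonly rows: readonly OrderRecord[];

  // The scope comes from the authenticated session, never from agent input.
  constructor(scope: TenantScope, rows: readonly OrderRecord[]) {
    this.scope = scope;
    this.rows = rows;
  }

  // Every query method goes through this filter; there is no unscoped accessor.
  private inScope(row: OrderRecord): boolean {
    return row.tenantId === this.scope.tenantId && row.channelId === this.scope.channelId;
  }

  findByThread(threadId: string): OrderRecord[] {
    return this.rows.filter((row) => this.inScope(row) && row.threadId === threadId);
  }

  findById(orderId: string): OrderRecord | undefined {
    return this.rows.find((row) => this.inScope(row) && row.orderId === orderId);
  }
}
```

With this shape, an agent that guesses another tenant's order ID gets the same answer as for a nonexistent order.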
What Can Go Wrong
Even with ACP, there are failure modes to handle:
| Failure | What Happens | ACP Response |
|---|---|---|
| Supplier API down | Booking can't be confirmed | Retry with backoff, inform agent of failure |
| Reservation expired | HITL approval took too long | Cancel reservation, agent informs user |
| Price changed between search and checkout | Agent quoted wrong price | Re-validate price at checkout time, reject if delta > threshold |
| Agent hallucinates product ID | Product doesn't exist in search index | Tool validates product existence before policy check |
| Concurrent bookings exhaust inventory | Two agents book last seat simultaneously | Conditional write on reservation, loser gets inventory error |
| HMAC secret rotation during booking | Old secret signs, new secret verifies | Keep previous secret for 24h after rotation for verification |
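The price re-validation row could be implemented as a simple delta check at checkout time. This is a sketch under assumptions: the table does not say whether the threshold is absolute or relative, so the example uses a relative tolerance, and `isPriceWithinTolerance` is a hypothetical name.

```typescript
// Returns true when the current price is close enough to what the agent quoted.
// maxRelativeDelta is a fraction of the quoted price, e.g. 0.05 for 5%.
function isPriceWithinTolerance(
  quotedPrice: number,
  currentPrice: number,
  maxRelativeDelta: number,
): boolean {
  // A non-positive quote cannot be validated; treat it as a mismatch.
  if (quotedPrice <= 0) return false;
  const relativeDelta = Math.abs(currentPrice - quotedPrice) / quotedPrice;
  return relativeDelta <= maxRelativeDelta;
}
```

When the check fails, the tool would reject the checkout and return the current price so the agent can re-quote the user instead of silently charging a different amount.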
Implementation Considerations
When to Introduce ACP
Not every AI commerce integration needs the full protocol. Consider the level of autonomy:
| Level | Description | Governance Needed |
|---|---|---|
| Search only | Agent searches products, shows results | Policy on search (supplier visibility) |
| Recommendations | Agent suggests products based on preferences | Same as search |
| Cart building | Agent adds items to cart, human completes checkout | Minimal, human is the final gate |
| Assisted checkout | Agent initiates checkout, human confirms payment | HITL on every order |
| Autonomous checkout | Agent books and pays without human interaction | Full ACP |
Start with search only. Add cart building when trust is established. Move to assisted checkout with HITL on every order. Graduate to autonomous checkout with policy-based HITL thresholds only after you have data on the agent's accuracy and the error rate is acceptable.
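The autonomy ladder can be expressed as a single gate function that decides what happens to a checkout attempt at each level. This is a hedged sketch: the `AutonomyLevel` values are derived from the table above, while `checkoutGate` and the `Gate` outcomes are hypothetical names chosen for the example.

```typescript
type AutonomyLevel =
  | "search_only"
  | "recommendations"
  | "cart_building"
  | "assisted_checkout"
  | "autonomous_checkout";

type Gate = "not_permitted" | "human_approval" | "automatic";

function checkoutGate(level: AutonomyLevel, orderValue: number, approvalThreshold: number): Gate {
  switch (level) {
    case "assisted_checkout":
      // HITL on every order, regardless of value.
      return "human_approval";
    case "autonomous_checkout":
      // Policy-based HITL: only orders above the threshold wait for a human.
      return orderValue > approvalThreshold ? "human_approval" : "automatic";
    default:
      // Search, recommendations, and cart building never create orders themselves.
      return "not_permitted";
  }
}
```

Moving a tenant up the ladder then becomes a configuration change to its level and threshold, not a code change in the tool.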
Technology Choices
ACP is protocol-level, not implementation-specific. The components can be built with different technologies:
| Component | Our Implementation | Alternatives |
|---|---|---|
| MCP tools | Mastra createTool() | LangChain tools, custom tool server |
| Policy engine | DynamoDB with 5-min cache | PostgreSQL, Redis, OPA (Open Policy Agent) |
| HITL gates | Mastra workflow.suspend() | Custom queue + webhook, Temporal, Inngest |
| HMAC signing | Node.js crypto.createHmac() | Any language with HMAC-SHA256 support |
| Audit log | DynamoDB Streams to S3 (Object Lock) | PostgreSQL + WAL shipping, event store |
| Idempotency | DynamoDB conditional writes | PostgreSQL advisory locks, Redis SET NX |
The protocol design matters more than the specific tools. If you implement the five components (tool governance, policy evaluation, human approval gates, tamper-evident receipts, and immutable audit), the governance guarantees hold regardless of the underlying technology.
Our pages on AI systems architecture and on how we approach consulting engagements provide more context.
Common Pitfalls
- Letting agents call supplier APIs directly. The MCP tool layer is the trust boundary. Without it, you have no policy enforcement, no audit trail, and no idempotency.
- Default-allow policy. ACP uses default-deny. If no explicit allow rule exists for an action, it's rejected. Default-allow with deny overrides is weaker because you have to anticipate every bad action in advance.
- Skipping HMAC receipts. Without tamper-evident receipts, you can't prove to a merchant or regulator that an order wasn't modified after the fact.
- Same idempotency key format for all operations. The key must include context (thread, product, date). A global key like order:123 doesn't prevent a different agent in a different conversation from booking the same product.
- No reservation expiration. If HITL approval takes forever, the reservation holds inventory indefinitely. Always set an expiration and handle the timeout case.
- Trusting agent-provided product data. The tool must validate the product exists in the search index. The agent might hallucinate a product ID, a price, or an availability date.
- Not scoping agent memory per tenant. An agent serving tenant A must not have access to tenant B's conversation history, product catalog, or policy rules.
- Treating agentic checkout as a feature, not a protocol. ACP is not a feature you bolt on. It's a trust framework that must be designed into the system from the start.
Key Takeaways
- Agents cannot "just book things." Every agent-initiated financial transaction needs policy evaluation, optional human approval, tamper-evident receipts, and an immutable audit trail. This is not optional for production commerce.
- The MCP tool is the trust boundary. The agent never has direct API access. Every action goes through a governed tool that validates, checks policy, and logs.
- Default-deny policy is non-negotiable. If no explicit allow rule exists, the action is rejected. Deny rules always override allow rules.
- HITL gates are configurable, not binary. Small orders go through automatically. Large orders wait for human approval. The threshold is per-tenant, per-action.
- HMAC receipts prove what happened. Per-tenant signing keys, canonical JSON serialization, hex-encoded HMAC-SHA256. Verifiable by any party with the secret.
- Idempotency prevents duplicate orders. The key must include conversation context (thread ID), not just the order data.
- Start with search, graduate to checkout. Don't give agents autonomous buying power on day one. Build trust incrementally with data on accuracy and error rates.
Agentic commerce is inevitable. The question is whether you build the governance first or clean up the damage after. We built ACP because we needed it for a real multi-tenant commerce platform with multiple suppliers and AI-powered booking agents. The protocol works. The architecture is production-proven.
If you're building AI-powered commerce and need help designing the governance layer, talk to our team or request a quote. You can also explore our AI services and our ecommerce architecture practice for more context.