Technical Guide

Agentic Commerce: How to Let AI Agents Buy Things Safely

How to design governed, agent-initiated commerce: policy engines, HITL approval gates, HMAC receipts, idempotency, tenant scoping, and the full Agentic Checkout Protocol.

March 10, 2026 · 20 min read · Oronts Engineering Team

Why This Matters Now

AI agents are getting good at conversations, recommendations, and search. The next step is obvious: let them complete transactions. An agent that can check availability, compare options, and book tickets is more useful than one that just tells you the options and makes you click through a checkout form.

But the moment an agent can spend money, everything changes. A customer support agent that books the wrong show. A travel agent that reserves 50 seats instead of 5. A shopping assistant that places an order for a product the supplier no longer carries. These are not hypothetical scenarios. They are the predictable result of connecting an LLM to a checkout API without governance.

We built a protocol for this. We call it the Agentic Checkout Protocol (ACP). It governs how AI agents initiate, validate, and complete commerce transactions within a multi-tenant platform. This article explains the architecture from the ground up.

Our guides on agentic AI systems and multi-agent architecture cover the broader patterns. This article focuses specifically on the commerce transaction problem.

The Core Problem

Traditional checkout flows assume a human is making the decision. The merchant's backend calls the checkout API with an API key, the order is validated, and payment is captured. The human is accountable.

With agentic commerce, the caller is an AI agent. The agent operates within a conversation thread, has access to tools, and can take actions autonomously. This introduces risks that traditional checkout doesn't handle:

Risk	Example	Traditional Checkout	Agentic Commerce
Unauthorized orders	Agent books from a supplier the merchant hasn't approved	API key maps to tenant, all suppliers visible	Need per-agent supplier restrictions
Spending overrun	Agent places a 10,000 EUR order without approval	No spending limits (merchant is the human)	Need value caps and approval gates
Hallucinated bookings	Agent fabricates product details that bypass validation	Not possible (human selects real products)	Agent could pass invented product IDs
Duplicate orders	Network retry causes double booking	Handled at API level (maybe)	Need idempotency at protocol level
No audit trail	Who authorized this order? Which agent? Which conversation?	Merchant is accountable	Need per-action audit with agent identity
Bulk fraud	Agent creates 100 orders in a loop	Rate limiting	Need per-order item limits and velocity checks

The solution is not "don't let agents buy things." The solution is a protocol that governs every step.

The Agentic Checkout Protocol

ACP composes five systems into a single trust framework:

┌─────────────────────────────────────────────────────────┐
│                   ACP FLOW                               │
│                                                          │
│  User: "Book The Lion King for 2 adults on Saturday"     │
│                         │                                │
│                         ▼                                │
│  ┌─────────────────────────────────┐                    │
│  │  1. MCP TOOL LAYER              │                    │
│  │  Agent calls createCheckout()   │                    │
│  │  via Model Context Protocol     │                    │
│  └──────────────┬──────────────────┘                    │
│                 │                                        │
│                 ▼                                        │
│  ┌─────────────────────────────────┐                    │
│  │  2. POLICY ENGINE               │                    │
│  │  Check: Is this agent allowed   │                    │
│  │  to book from this supplier?    │                    │
│  │  Is the value within limits?    │                    │
│  │  Result: ALLOW or DENY          │                    │
│  └──────────────┬──────────────────┘                    │
│                 │                                        │
│            ALLOW│         DENY ──▶ Error to agent       │
│                 ▼                                        │
│  ┌─────────────────────────────────┐                    │
│  │  3. HITL APPROVAL GATE          │                    │
│  │  If order value > threshold:    │                    │
│  │  Suspend workflow, notify ops   │                    │
│  │  Wait for human approval        │                    │
│  └──────────────┬──────────────────┘                    │
│                 │                                        │
│              APPROVED                                    │
│                 ▼                                        │
│  ┌─────────────────────────────────┐                    │
│  │  4. SUPPLIER EXECUTION          │                    │
│  │  Reserve inventory              │                    │
│  │  Confirm booking via supplier   │                    │
│  │  Generate HMAC receipt          │                    │
│  └──────────────┬──────────────────┘                    │
│                 │                                        │
│                 ▼                                        │
│  ┌─────────────────────────────────┐                    │
│  │  5. AUDIT LOG                   │                    │
│  │  Write immutable record         │                    │
│  │  actor_type: "agent"            │                    │
│  │  thread_id, tenant_id, receipt  │                    │
│  └─────────────────────────────────┘                    │
│                                                          │
└─────────────────────────────────────────────────────────┘

Each component is independently valuable. Together, they form a complete governance framework for AI-initiated commerce.

Component 1: MCP Tool Layer

The agent doesn't call APIs directly. It uses MCP (Model Context Protocol) tools that are registered with the agent framework. Each tool is a controlled entry point with input validation, policy enforcement, and audit logging.

// MCP tool definition for creating a checkout
const createCheckoutTool = createTool({
    id: 'createCheckout',
    description: 'Create a checkout for a product booking',
    inputSchema: z.object({
        productId: z.string(),
        date: z.string().datetime(),
        persons: z.array(z.object({
            type: z.enum(['adult', 'child', 'senior']),
            count: z.number().int().positive(),
        })),
    }),
    execute: async ({ productId, date, persons }, ctx) => {
        // 1. Validate product exists in search index
        const product = await searchApi.getProduct(ctx.tenantId, productId);
        if (!product) throw new ToolError('Product not found');

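        // 2. Check policy against the whole order, not a single unit
        // (assumes product.price is a per-person price)
        const itemCount = persons.reduce((sum, p) => sum + p.count, 0);
        const policy = await policyEngine.evaluate(
            ctx.tenantId, ctx.channelId, 'create_order',
            { productId, supplierId: product.supplierId, estimatedValue: product.price * itemCount, itemCount }
        );
        if (!policy.allowed) throw new ToolError(`Policy denied: ${policy.reason}`);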

        // 3. Start checkout workflow
        return checkoutWorkflow.start({ productId, date, persons, ctx });
    },
});

The tool validates that the product actually exists (prevents hallucinated bookings), checks policy before any financial action, and routes to a governed workflow. The agent never has direct access to the supplier API.

This pattern comes from how we design AI workflow systems. The tool is the trust boundary. Everything inside the tool is governed. Everything outside is just conversation.

Component 2: Policy Engine

Before any financial action, the policy engine evaluates tenant-specific rules. Rules are stored per tenant and cached with a 5-minute TTL.

interface PolicyRule {
    action: string;                    // "create_order", "cancel_order", "search"
    effect: "allow" | "deny";
    conditions: {
        max_order_value?: number;          // Hard cap, reject above this
        allowed_suppliers?: string[];      // Restrict which suppliers
        require_human_approval_above?: number;  // HITL threshold
        max_items_per_order?: number;      // Prevent bulk fraud
    };
}

Evaluation Rules

Deny rules take precedence. If any deny rule matches, the action is rejected immediately.
Allow rules with conditions. If an allow rule matches, its conditions are evaluated against the request.
No matching rule = denied. Default-deny. An agent cannot do anything that isn't explicitly allowed.

Example Policy

{
    "tenant_id": "tenant_acme",
    "rules": [
        {
            "action": "create_order",
            "effect": "allow",
            "conditions": {
                "max_order_value": 5000,
                "allowed_suppliers": ["supplier_alpha", "supplier_beta"],
                "require_human_approval_above": 500,
                "max_items_per_order": 10
            }
        },
        {
            "action": "cancel_order",
            "effect": "allow",
            "conditions": {}
        },
        {
            "action": "search",
            "effect": "allow",
            "conditions": {}
        }
    ]
}

This tenant's agents can search freely, create orders up to 5,000 EUR (but orders above 500 EUR need human approval), book only from two approved suppliers, and create orders with at most 10 items. They can cancel orders without restrictions. Anything else is denied.
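
The evaluation rules and the example policy can be sketched as a small pure function. This is a minimal illustration, not the production engine: the `OrderRequest` and `Decision` shapes and the `evaluatePolicy` name are assumptions, and a deny rule is treated as unconditional once its action matches.

```typescript
// Sketch of deny-precedence, conditional allow, and default-deny.
// OrderRequest, Decision and evaluatePolicy are hypothetical names.
interface PolicyRule {
  action: string;
  effect: "allow" | "deny";
  conditions: {
    max_order_value?: number;
    allowed_suppliers?: string[];
    require_human_approval_above?: number;
    max_items_per_order?: number;
  };
}

interface OrderRequest {
  supplierId: string;
  orderValue: number;
  itemCount: number;
}

interface Decision {
  allowed: boolean;
  requiresApproval: boolean;
  reason: string;
}

function evaluatePolicy(rules: PolicyRule[], action: string, req: OrderRequest): Decision {
  const deny = (reason: string): Decision => ({ allowed: false, requiresApproval: false, reason });
  const matching = rules.filter((r) => r.action === action);

  // 1. Deny rules take precedence over everything else.
  if (matching.some((r) => r.effect === "deny")) return deny("explicit deny rule");

  // 2. No matching allow rule means default-deny.
  const allow = matching.find((r) => r.effect === "allow");
  if (allow === undefined) return deny("no matching allow rule");

  // 3. Evaluate the allow rule's conditions against the request.
  const c = allow.conditions;
  if (c.allowed_suppliers !== undefined && !c.allowed_suppliers.includes(req.supplierId)) {
    return deny("supplier not allowed");
  }
  if (c.max_order_value !== undefined && req.orderValue > c.max_order_value) {
    return deny("order value above hard cap");
  }
  if (c.max_items_per_order !== undefined && req.itemCount > c.max_items_per_order) {
    return deny("too many items in order");
  }

  // 4. Allowed; flag whether the HITL gate must run first.
  const requiresApproval =
    c.require_human_approval_above !== undefined && req.orderValue > c.require_human_approval_above;
  return { allowed: true, requiresApproval, reason: "allowed" };
}
```

With the example policy, a 450 EUR order from supplier_alpha is allowed outright, an 800 EUR order is allowed but flagged for approval, and a 6,000 EUR order or any action without a rule is denied.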

The policy engine is tenant-scoped. Different merchants get different rules. A large enterprise might set the HITL threshold at 2,000 EUR. A small operator might require approval for everything above 100 EUR. The platform operator sets default policies, and tenants can customize within their allowed range.
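
One way to read "customize within their allowed range" is that a tenant's threshold is clamped to bounds set by the platform. The sketch below assumes exactly that; `ThresholdBounds` and `resolveApprovalThreshold` are hypothetical names, and a real system might reject out-of-range values instead of clamping them.

```typescript
// Hypothetical helper: resolve a tenant's HITL threshold within platform bounds.
interface ThresholdBounds {
  min: number;             // lowest threshold a tenant may choose
  max: number;             // highest (most permissive) threshold a tenant may choose
  platformDefault: number; // used when the tenant sets nothing
}

function resolveApprovalThreshold(bounds: ThresholdBounds, tenantValue?: number): number {
  if (tenantValue === undefined) return bounds.platformDefault;
  // Clamp so a tenant can never be more permissive than the platform allows.
  return Math.min(bounds.max, Math.max(bounds.min, tenantValue));
}
```

With bounds of 0 to 2,000 EUR and a default of 500 EUR, a small operator's 100 EUR setting is kept as is, while a requested 5,000 EUR threshold is reduced to 2,000 EUR.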

Component 3: HITL Approval Gates

When an order exceeds the require_human_approval_above threshold, the checkout workflow suspends and waits for human approval.

// Inside the checkout workflow
async function checkoutWorkflow(ctx, params) {
    // Load the conditions of the tenant's create_order rule.
    // Assumption: policyEngine.getConditions() is a hypothetical accessor;
    // this article only shows policyEngine.evaluate().
    const policy = await policyEngine.getConditions(ctx.tenantId, 'create_order');

    // Step 1: Reserve inventory
    const reservation = await supplierAdapter.reserveInventory(
        ctx.tenantId, params.productId, params.date, params.persons
    );

    // Enforce the hard cap against the supplier's real total, not an estimate
    if (policy.max_order_value !== undefined && reservation.totalPrice > policy.max_order_value) {
        await supplierAdapter.cancelReservation(reservation.id);
        return { status: 'REJECTED', reason: 'Order value exceeds max_order_value' };
    }

    // Step 2: Check if human approval is needed
    if (reservation.totalPrice > policy.require_human_approval_above) {
        // Suspend workflow, notify ops dashboard
        const approval = await workflow.suspend({
            reason: 'Order exceeds approval threshold',
            totalPrice: reservation.totalPrice,
            threshold: policy.require_human_approval_above,
            productName: reservation.productName,
            reservationExpiresAt: reservation.expiresAt,
        });

        if (!approval.approved) {
            await supplierAdapter.cancelReservation(reservation.id);
            return { status: 'REJECTED', reason: approval.rejectionReason };
        }
    }

    // Step 3: Confirm booking
    const booking = await supplierAdapter.confirmBooking(reservation);

    // Step 4: Generate receipt (awaited because signing fetches the tenant secret)
    const receipt = await generateHmacReceipt(booking, ctx.tenantId);

    // Step 5: Audit log
    await auditLog.write({
        action: 'order_completed',
        actor_type: 'agent',
        thread_id: ctx.threadId,
        tenant_id: ctx.tenantId,
        order_id: booking.orderId,
        receipt_hmac: receipt.hmac,
    });

    return { status: 'COMPLETED', orderId: booking.orderId, receiptHmac: receipt.hmac };
}

The workflow uses suspend() and resume() from the agent framework. When suspended, the ops dashboard shows the pending order with all details. An operator can approve or reject with a reason. The workflow resumes automatically when the decision is made.

The reservation has an expiration time. If the human doesn't approve before the reservation expires, the system automatically cancels the reservation and informs the agent. The agent can then tell the user that the booking window has closed.
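
The interaction between the approval decision and the reservation expiry can be made explicit with a small pure function. This is a sketch under assumptions: the `ApprovalDecision` and `GateOutcome` shapes and the `resolveApprovalGate` name are hypothetical, and the key point it illustrates is that an approval arriving after the reservation has expired must be treated as expired, not approved.

```typescript
// Hypothetical shapes for the decision coming back from the ops dashboard.
interface ApprovalDecision {
  approved: boolean;
  rejectionReason?: string;
  decidedAt: Date;
}

type GateOutcome =
  | { status: "PENDING" }
  | { status: "APPROVED" }
  | { status: "REJECTED"; reason: string }
  | { status: "EXPIRED" };

function resolveApprovalGate(
  decision: ApprovalDecision | null,
  reservationExpiresAt: Date,
  now: Date,
): GateOutcome {
  const expiry = reservationExpiresAt.getTime();

  // No decision yet: keep waiting until the reservation runs out.
  if (decision === null) {
    return now.getTime() >= expiry ? { status: "EXPIRED" } : { status: "PENDING" };
  }

  // A decision made at or after expiry cannot be honoured, even if it is an approval.
  if (decision.decidedAt.getTime() >= expiry) return { status: "EXPIRED" };

  return decision.approved
    ? { status: "APPROVED" }
    : { status: "REJECTED", reason: decision.rejectionReason ?? "no reason given" };
}
```

On `EXPIRED` and `REJECTED` the workflow would cancel the reservation; only `APPROVED` proceeds to confirming the booking.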

For more on human oversight in AI systems, see our guide on human-in-the-loop AI.

Component 4: HMAC Receipts

Every completed order generates a tamper-evident receipt using HMAC-SHA256. This provides cryptographic proof of what happened.

Receipt Specification

Parameter	Value
Algorithm	HMAC-SHA256
Key storage	Per-tenant secret (rotated annually)
Output format	Hex-encoded string
Canonicalization	JSON with object keys sorted at every nesting level

Receipt Payload

interface ReceiptPayload {
    order_id: string;
    tenant_id: string;
    products: Array<{
        id: string;
        date: string;
        total_price: number;
    }>;
    total_price: number;
    currency: string;
    booker_email: string;
    created_at: string;  // ISO 8601
}

Signing

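import crypto from 'node:crypto';

// Canonical JSON: keys sorted at every nesting level for deterministic output.
// A replacer array such as Object.keys(payload).sort() acts as a key allowlist at all
// depths, so it would silently drop nested keys like products[].id from the signed bytes.
function canonicalize(value: unknown): string {
    if (value === null || typeof value !== 'object') return JSON.stringify(value);
    if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`;
    const obj = value as Record<string, unknown>;
    return `{${Object.keys(obj).sort()
        .map(key => `${JSON.stringify(key)}:${canonicalize(obj[key])}`)
        .join(',')}}`;
}

// async because the tenant secret is fetched from the secrets manager
async function generateHmacReceipt(order: Order, tenantId: string): Promise<Receipt> {
    const payload: ReceiptPayload = {
        order_id: order.id,
        tenant_id: tenantId,
        products: order.items.map(item => ({
            id: item.productId,
            date: item.date,
            total_price: item.totalPrice,
        })),
        total_price: order.totalPrice,
        currency: order.currency,
        booker_email: order.bookerEmail,
        created_at: order.createdAt.toISOString(),
    };

    const canonical = canonicalize(payload);
    const secret = await secretsManager.getTenantSecret(tenantId);
    const hmac = crypto.createHmac('sha256', secret).update(canonical).digest('hex');

    return { payload, hmac, signedAt: new Date().toISOString() };
}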

Verification

GET /v1/order/{orderId}/verify
Response: { valid: true, signed_at: "2026-06-15T14:30:00Z" }

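The verification endpoint recomputes the HMAC from the stored order data using the tenant's secret and compares it to the stored receipt. A mismatch means the order data was changed after signing. The receipt payload itself carries no agent or thread identifier, so the link to a specific agent and conversation comes from the audit entry, which stores the same receipt_hmac alongside actor_type and thread_id. Together they give merchants, auditors, and regulators verifiable evidence that a specific agent, acting within a specific tenant scope, created a specific order at a specific time.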
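
The recompute-and-compare step can be sketched as follows. This is an illustration under assumptions, not the endpoint's actual code: `canonicalize`, `signReceipt`, and `verifyReceipt` are hypothetical names, secrets are passed in directly instead of being loaded from a secrets manager, and the list of secrets models a rotation window in which the previous key is still accepted. The comparison uses Node's `timingSafeEqual` so that verification time does not leak how many leading bytes matched.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with keys sorted at every nesting level so equal payloads yield equal bytes.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  const entries = Object.entries(value).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`).join(",")}}`;
}

// Hex-encoded HMAC-SHA256 over the canonical form.
function signReceipt(payload: Json, secret: string): string {
  return createHmac("sha256", secret).update(canonicalize(payload)).digest("hex");
}

// Accept the receipt if it matches under any currently valid secret
// (for example the current key and the previous one during rotation).
function verifyReceipt(payload: Json, secrets: string[], receiptHmac: string): boolean {
  const given = Buffer.from(receiptHmac, "hex");
  return secrets.some((secret) => {
    const expected = Buffer.from(signReceipt(payload, secret), "hex");
    // timingSafeEqual throws on length mismatch, so check lengths first.
    return expected.length === given.length && timingSafeEqual(expected, given);
  });
}
```

Because HMAC is symmetric, only parties holding the tenant secret can run this check, which is why verification is exposed as an endpoint instead of being left to the client.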

Component 5: Immutable Audit Trail

Every ACP action generates an audit entry. The audit system uses a two-tier architecture:

Tier 1: Operational log (database, queryable, 90-day retention)

{
    action: 'order_completed',
    actor_type: 'agent',           // "agent" | "human" | "system"
    thread_id: 'thread_abc123',    // which conversation
    tenant_id: 'tenant_acme',
    channel_id: 'ch_web_en',
    order_id: 'ord_xyz789',
    total_price: 450,
    currency: 'EUR',
    supplier_id: 'supplier_alpha',
    receipt_hmac: 'a1b2c3...',
    policy_evaluated: true,
    hitl_required: false,
    created_at: '2026-06-15T14:30:00Z',
}

Tier 2: Immutable archive (object storage with write-once/read-many, 7-year retention)

The operational log streams to object storage with object locks (compliance mode). Once written, entries cannot be modified or deleted for the retention period. This satisfies regulatory requirements for financial transaction records.

Every audit entry includes actor_type and thread_id. This makes it possible to trace exactly which AI agent, in which conversation, initiated which financial action. Combined with the HMAC receipt, the audit trail provides end-to-end proof of the decision chain.

For how we build audit and observability systems more broadly, see our guides on AI governance and AI observability.

ACP vs Traditional Checkout

Capability	Traditional API Checkout	ACP Adds
Caller	Merchant backend (API key / JWT)	AI agent via MCP tool
Policy checks	Implicit (API key maps to tenant)	Explicit per-action evaluation with deny rules
Spending controls	None (merchant is the human)	`max_order_value`, `require_human_approval_above`
Supplier restrictions	Channel visibility only	Per-agent `allowed_suppliers` policy rule
Human approval	N/A (merchant is the human)	HITL gate via workflow suspend/resume
Audit trail	Standard API logs	Per-action audit with `actor_type: "agent"`, `thread_id`
Fraud prevention	Rate limiting, schema validation	Per-order item limits, supplier allow-lists, value caps
Tamper evidence	None	HMAC-SHA256 receipts with per-tenant signing keys

Idempotency

Agent-initiated transactions need idempotency at the protocol level. Network retries, agent re-attempts, and workflow replays must not create duplicate orders.

// Idempotency key for agent-initiated checkouts
const idempotencyKey = `${tenantId}:${threadId}:${productId}:${date}`;

// Check before processing
const existing = await idempotencyStore.get(idempotencyKey);
if (existing) {
    return existing.result;  // Return cached result
}

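// Process and store. acquire() is a conditional write, so only one worker wins the key.
// Assumption: it resolves to false when another worker already holds the key.
const acquired = await idempotencyStore.acquire(idempotencyKey);
if (!acquired) {
    throw new ToolError('Checkout already in progress for this request');
}
const result = await processCheckout(params);
await idempotencyStore.complete(idempotencyKey, result);
return result;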

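The idempotency key combines tenant, conversation thread, product, and date. The same agent in the same conversation booking the same product on the same date always returns the same result. A different conversation thread gets a fresh key. Party size is not part of the key, so a deliberate second booking of the same product and date in the same thread (for example, with a different number of persons) also returns the first result; add the persons to the key if those should count as separate orders.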

The store uses conditional writes to prevent race conditions. If two workers try to acquire the same key simultaneously, only one succeeds.
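
The acquire-then-complete pattern can be sketched with an in-memory store. Everything here is illustrative: `InMemoryIdempotencyStore`, `runOnce`, and `idempotencyKey` are hypothetical names, the work is synchronous to keep the example short, and a real store would implement `acquire` as a conditional write in the database so that it stays atomic across workers.

```typescript
type Entry<T> = { state: "in_progress" } | { state: "completed"; result: T };
type RunResult<T> = { status: "completed"; result: T } | { status: "in_progress" };

class InMemoryIdempotencyStore<T> {
  private entries = new Map<string, Entry<T>>();

  // Conditional write: succeeds only if the key does not exist yet.
  acquire(key: string): boolean {
    if (this.entries.has(key)) return false;
    this.entries.set(key, { state: "in_progress" });
    return true;
  }

  complete(key: string, result: T): void {
    this.entries.set(key, { state: "completed", result });
  }

  // Drop an unfinished claim so that a later retry can run the work again.
  release(key: string): void {
    const entry = this.entries.get(key);
    if (entry !== undefined && entry.state === "in_progress") this.entries.delete(key);
  }

  get(key: string): Entry<T> | undefined {
    return this.entries.get(key);
  }
}

function idempotencyKey(tenantId: string, threadId: string, productId: string, date: string): string {
  return `${tenantId}:${threadId}:${productId}:${date}`;
}

function runOnce<T>(store: InMemoryIdempotencyStore<T>, key: string, work: () => T): RunResult<T> {
  if (!store.acquire(key)) {
    // Someone else owns the key: return their result if finished, else report in progress.
    const existing = store.get(key);
    if (existing !== undefined && existing.state === "completed") {
      return { status: "completed", result: existing.result };
    }
    return { status: "in_progress" };
  }
  try {
    const result = work();
    store.complete(key, result);
    return { status: "completed", result };
  } catch (err) {
    store.release(key); // failed attempts must not block retries
    throw err;
  }
}
```

Acquiring first and reading second avoids the gap between a separate "check" and "acquire" in which two workers could both see no entry and both proceed.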

Tenant Scoping

Every ACP operation is scoped to a tenant and channel. The identity hierarchy ensures data isolation at every layer:

Tenant (merchant organization)
  └── Channel (storefront or sales channel)
       └── Customer (end user)
            └── Session (browser/device session)
                 └── Agent Thread (single conversation)

The agent thread inherits its tenant and channel context from the authenticated session. Every tool call, every policy evaluation, every audit entry carries these identifiers. Cross-tenant data access is prevented by construction: every query includes the tenant scope as a mandatory filter, so no code path reads data without it.

This is the same multi-tenant isolation pattern we describe in our system architecture guide. The difference in ACP is that the agent adds another level of scoping: the thread. An agent's memory and context are scoped to tenant_id + session_id + thread_id. Agent A in one tenant cannot see agent B's conversation in another tenant.
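
A mandatory tenant filter is easiest to enforce when the data access object cannot be constructed without a scope. The sketch below assumes an in-memory list of rows; `ScopedOrderRepository`, `OrderRow`, and `memoryKey` are hypothetical names used only to illustrate the pattern.

```typescript
interface Scope {
  tenantId: string;
  channelId: string;
}

interface OrderRow {
  tenantId: string;
  channelId: string;
  orderId: string;
  totalPrice: number;
}

// Callers never pass a tenant filter themselves: the repository applies it to every read.
class ScopedOrderRepository {
  private readonly rows: readonly OrderRow[];
  private readonly scope: Scope;

  constructor(rows: readonly OrderRow[], scope: Scope) {
    if (scope.tenantId === "" || scope.channelId === "") {
      throw new Error("tenant and channel scope are required");
    }
    this.rows = rows;
    this.scope = scope;
  }

  find(predicate: (row: OrderRow) => boolean = () => true): OrderRow[] {
    return this.rows.filter(
      (row) =>
        row.tenantId === this.scope.tenantId &&
        row.channelId === this.scope.channelId &&
        predicate(row),
    );
  }
}

// Agent memory is keyed by tenant, session, and thread, so threads never share state.
function memoryKey(tenantId: string, sessionId: string, threadId: string): string {
  return `${tenantId}:${sessionId}:${threadId}`;
}
```

Because the scope is fixed at construction time from the authenticated session, a tool that receives this repository has no way to widen the query to another tenant, whatever the agent asks for.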

What Can Go Wrong

Even with ACP, there are failure modes to handle:

Failure	What Happens	ACP Response
Supplier API down	Booking can't be confirmed	Retry with backoff, inform agent of failure
Reservation expired	HITL approval took too long	Cancel reservation, agent informs user
Price changed between search and checkout	Agent quoted wrong price	Re-validate price at checkout time, reject if delta > threshold
Agent hallucinates product ID	Product doesn't exist in search index	Tool validates product existence before policy check
Concurrent bookings exhaust inventory	Two agents book last seat simultaneously	Conditional write on reservation, loser gets inventory error
HMAC secret rotation during booking	Old secret signs, new secret verifies	Keep previous secret for 24h after rotation for verification
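
The price re-validation row can be illustrated with a relative-delta check. The function name, the result shape, and the 2% threshold in the usage note are assumptions; the article does not state which threshold or which definition of delta is used.

```typescript
type PriceCheck = { ok: true; price: number } | { ok: false; reason: string };

// Compare the price the agent quoted with the price at checkout time.
// maxRelativeDelta is a fraction, e.g. 0.02 for 2% (a hypothetical value).
function revalidatePrice(quotedPrice: number, currentPrice: number, maxRelativeDelta: number): PriceCheck {
  if (quotedPrice <= 0 || currentPrice < 0) {
    return { ok: false, reason: "invalid price" };
  }
  const delta = Math.abs(currentPrice - quotedPrice) / quotedPrice;
  if (delta > maxRelativeDelta) {
    return { ok: false, reason: `price moved by ${(delta * 100).toFixed(1)}% since the quote` };
  }
  // Within tolerance: continue with the current price, never the quoted one.
  return { ok: true, price: currentPrice };
}
```

With a 2% tolerance, a quote of 450 EUR that is now 455 EUR passes, while a move to 500 EUR is rejected so the agent can re-quote before anything is booked.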

Implementation Considerations

When to Introduce ACP

Not every AI commerce integration needs the full protocol. Consider the level of autonomy:

Level	Description	Governance Needed
Search only	Agent searches products, shows results	Policy on search (supplier visibility)
Recommendations	Agent suggests products based on preferences	Same as search
Cart building	Agent adds items to cart, human completes checkout	Minimal, human is the final gate
Assisted checkout	Agent initiates checkout, human confirms payment	HITL on every order
Autonomous checkout	Agent books and pays without human interaction	Full ACP

Start with search only. Add cart building when trust is established. Move to assisted checkout with HITL on every order. Graduate to autonomous checkout with policy-based HITL thresholds only after you have data on the agent's accuracy and the error rate is acceptable.

Technology Choices

ACP is protocol-level, not implementation-specific. The components can be built with different technologies:

Component	Our Implementation	Alternatives
MCP tools	Mastra `createTool()`	LangChain tools, custom tool server
Policy engine	DynamoDB with 5-min cache	PostgreSQL, Redis, OPA (Open Policy Agent)
HITL gates	Mastra `workflow.suspend()`	Custom queue + webhook, Temporal, Inngest
HMAC signing	Node.js `crypto.createHmac()`	Any language with HMAC-SHA256 support
Audit log	DynamoDB Streams to S3 (Object Lock)	PostgreSQL + WAL shipping, event store
Idempotency	DynamoDB conditional writes	PostgreSQL advisory locks, Redis SET NX

The protocol design matters more than the specific tools. If you implement the five components (tool governance, policy evaluation, human approval gates, tamper-evident receipts, and immutable audit), the governance guarantees hold regardless of the underlying technology.

Our pages on AI systems architecture and on how we approach consulting engagements provide more context.

Common Pitfalls

Letting agents call supplier APIs directly. The MCP tool layer is the trust boundary. Without it, you have no policy enforcement, no audit trail, and no idempotency.
Default-allow policy. ACP uses default-deny. If no explicit allow rule exists for an action, it's rejected. Default-allow with deny overrides is weaker because you have to anticipate every bad action in advance.
Skipping HMAC receipts. Without tamper-evident receipts, you can't prove to a merchant or regulator that an order wasn't modified after the fact.
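Using one idempotency key format for every operation. The key must carry the request context (tenant, thread, product, date). A bare key like order:123 only exists once the order has been created, so it cannot deduplicate the retries that create it, and a key without the thread ID would collapse legitimate bookings from different conversations into a single result.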
No reservation expiration. If HITL approval takes forever, the reservation holds inventory indefinitely. Always set an expiration and handle the timeout case.
Trusting agent-provided product data. The tool must validate the product exists in the search index. The agent might hallucinate a product ID, a price, or an availability date.
Not scoping agent memory per tenant. An agent serving tenant A must not have access to tenant B's conversation history, product catalog, or policy rules.
Treating agentic checkout as a feature, not a protocol. ACP is not a feature you bolt on. It's a trust framework that must be designed into the system from the start.

Key Takeaways

Agents cannot "just book things." Every agent-initiated financial transaction needs policy evaluation, optional human approval, tamper-evident receipts, and an immutable audit trail. This is not optional for production commerce.
The MCP tool is the trust boundary. The agent never has direct API access. Every action goes through a governed tool that validates, checks policy, and logs.
Default-deny policy is non-negotiable. If no explicit allow rule exists, the action is rejected. Deny rules always override allow rules.
HITL gates are configurable, not binary. Small orders go through automatically. Large orders wait for human approval. The threshold is per-tenant, per-action.
HMAC receipts prove what happened. Per-tenant signing keys, canonical JSON serialization, hex-encoded HMAC-SHA256. Verifiable by any party with the secret.
Idempotency prevents duplicate orders. The key must include conversation context (thread ID), not just the order data.
Start with search, graduate to checkout. Don't give agents autonomous buying power on day one. Build trust incrementally with data on accuracy and error rates.

Agentic commerce is inevitable. The question is whether you build the governance first or clean up the damage after. We built ACP because we needed it for a real multi-tenant commerce platform with multiple suppliers and AI-powered booking agents. The protocol works. The architecture is production-proven.

If you're building AI-powered commerce and need help designing the governance layer, talk to our team or request a quote. You can also explore our AI services and our ecommerce architecture practice for more context.
