Technical Guide

How This Site's AI Works: A Mastra Production Teardown

A concrete look at the AI assistant running on this website: Mastra agents, tool calling for lead capture, locale-aware runtime context, the security layer around every request, and why the model is the small part.

June 17, 2026 · 9 min read · Oronts Engineering Team

The assistant you are talking to is the demo

Most agencies show you a chatbot screenshot. The AI assistant on this site is the real system, in production, built on Mastra. This guide opens it up. No diagram theater, just the architecture, the tradeoffs, and the parts that actually matter when an AI feature has to survive real traffic.

The headline most teams miss: the language model is maybe ten percent of a production AI assistant. The other ninety percent is routing, tools, state, locale, security, and the boring delivery guarantees that decide whether a lead ever reaches a human. We covered the general pattern in From AI Prototype to Production. This is the specific build.

The shape of the system

The assistant is a set of Mastra agents, each with a narrow job, exposed through Next.js API routes:

  • quoteAgent: the conversational assistant on the quote and contact surfaces. It answers questions about services, qualifies naturally, and captures leads.
  • chatbotAgent: a lightweight assistant for the interactive demo.
  • visionAgent, voiceAgent, analyticsAgent: focused agents behind the demo playground.

Each agent is a plain declaration: a name, an instruction set, a model, and the tools it is allowed to call. The model id is centralized in one module, so changing it across every agent is a one line edit and a deployment, not a search and replace across the codebase.

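// Agent comes from Mastra core; the two local import paths are illustrative.
import { Agent } from "@mastra/core/agent";
import { MODEL } from "./model"; // single source of truth for the model id
import { submitProposal, scheduleCall, draftEmail, sendSummary } from "./tools";

export const quoteAgent = new Agent({
  name: "quoteAgent",
  instructions: `...`,
  model: MODEL,
  tools: { submitProposal, scheduleCall, draftEmail, sendSummary },
});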

That structure is the point. The agent does not contain business logic. It decides which tool to call and when. The tools contain the logic.

Tool calling is where the work happens

A chat that only returns text is a toy. The quote agent has four tools, and the instructions are written so it calls them the moment it has the required fields, without asking the user for permission to proceed:

  • submitProposal: capture a project lead (name, email, summary).
  • scheduleCall: book a call with the team.
  • sendSummary: email a conversation summary.
  • draftEmail: prepare an email for the team to review.

The discipline that makes this usable is in the prompt: collect the minimum required fields, then act. No "shall I proceed?" loops, no empty acknowledgements. When the user has given a name, an email, and a project summary, the proposal tool fires. The user gets a clear result, not a dead end.
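The "collect the minimum, then act" rule is easy to mirror in plain code. The following is a minimal sketch, not the site's implementation: `ProposalDraft`, `missingProposalFields`, and `readyToSubmit` are hypothetical names, and in the real system the rule lives in the agent instructions and the tool's input schema.

```typescript
// Hypothetical shape: the three fields the article names for submitProposal.
type ProposalDraft = {
  name?: string;
  email?: string;
  summary?: string;
};

const REQUIRED_FIELDS = ["name", "email", "summary"] as const;

// Returns the required fields that are still missing or blank.
function missingProposalFields(draft: ProposalDraft): string[] {
  return REQUIRED_FIELDS.filter((field) => {
    const value = draft[field];
    return value === undefined || value.trim() === "";
  });
}

// The tool may fire as soon as nothing is missing; there is no confirmation step.
function readyToSubmit(draft: ProposalDraft): boolean {
  return missingProposalFields(draft).length === 0;
}
```

While anything is missing, the assistant asks only for those fields. Once the list is empty, it acts.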

Locale lives in runtime context, not in the prompt

This site ships in five languages: English, German, French, Spanish, and Arabic. The assistant has to answer in the visitor's language and respect its conventions: the formal Sie in German, for example, and right to left text for Arabic.

The wrong way to do this is to bake the locale into the system prompt and rebuild a prompt per language. The right way is runtime context. Each request carries the locale, resolved from an explicit value, an x-i18n-locale header, or the Accept-Language header, and the agent reads it at generation time:

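import { RuntimeContext } from "@mastra/core/runtime-context";

// Per-request context: the agent reads these values at generation time.
const runtimeContext = new RuntimeContext();
if (finalLocale) runtimeContext.set("locale", finalLocale);
runtimeContext.set("requestId", requestId);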

One agent, one instruction set, five languages. The same request id flows through logs, so a single conversation can be traced end to end.
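The resolution order described above (explicit value, then x-i18n-locale, then Accept-Language) can be sketched as a small pure function. This is an illustration under assumptions: `resolveLocale`, `toSupportedLocale`, `fromAcceptLanguage`, and the `HeaderSource` type are hypothetical names, and the choice to skip an unsupported explicit value rather than reject it is ours, not something the site documents.

```typescript
// Supported locales from the article: English, German, French, Spanish, Arabic.
const SUPPORTED_LOCALES = ["en", "de", "fr", "es", "ar"] as const;
type Locale = (typeof SUPPORTED_LOCALES)[number];

// Anything with a Headers-like get() works, including the Fetch API Headers class.
type HeaderSource = { get(name: string): string | null };

// Normalizes a tag like "de-DE" to a supported base locale, or undefined.
function toSupportedLocale(tag: string | null | undefined): Locale | undefined {
  if (!tag) return undefined;
  const base = tag.trim().toLowerCase().split("-")[0];
  return SUPPORTED_LOCALES.find((locale) => locale === base);
}

// Picks the highest-weighted supported language from an Accept-Language
// header such as "fr-CH, fr;q=0.9, en;q=0.8".
function fromAcceptLanguage(header: string | null | undefined): Locale | undefined {
  if (!header) return undefined;
  const ranked = header
    .split(",")
    .map((part) => {
      const [tag, ...params] = part.trim().split(";");
      const q = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const weight = q ? Number(q.slice(2)) : 1;
      return { tag, weight: Number.isNaN(weight) ? 0 : weight };
    })
    .filter((entry) => entry.weight > 0) // q=0 means "not acceptable"
    .sort((a, b) => b.weight - a.weight); // stable: ties keep header order
  for (const { tag } of ranked) {
    const locale = toSupportedLocale(tag);
    if (locale) return locale;
  }
  return undefined;
}

// Explicit value wins, then the x-i18n-locale header, then Accept-Language.
// Returns undefined when nothing matches, so the caller can skip setting it.
function resolveLocale(
  explicit: string | null | undefined,
  headers: HeaderSource,
): Locale | undefined {
  return (
    toSupportedLocale(explicit) ??
    toSupportedLocale(headers.get("x-i18n-locale")) ??
    fromAcceptLanguage(headers.get("accept-language"))
  );
}
```

The result is what would feed `finalLocale` in the snippet above. Because it can be undefined, the guard before `runtimeContext.set` still matters.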

State that degrades instead of breaking

Conversation storage uses LibSQL when a database URL is configured, and falls back to an in memory store when it is not. The import is lazy and wrapped in a try/catch block, so a missing optional dependency never takes the assistant down. It just runs without durable history.

This is a recurring principle across the system: optional infrastructure degrades to a safe default rather than throwing. An aggregation that cannot reach a source returns an empty result. An assistant that cannot reach a database keeps answering. Nothing user facing depends on an optional service being present.
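The fallback pattern looks roughly like this. It is a sketch under stated assumptions: `ConversationStore`, `createMemoryStore`, and `createConversationStore` are hypothetical names rather than Mastra's storage API, and the injected `loadDurable` function stands in for the lazy import of the optional database driver.

```typescript
// Minimal store interface for this sketch (hypothetical, not Mastra's storage API).
interface ConversationStore {
  kind: "durable" | "memory";
  append(conversationId: string, message: string): void;
  history(conversationId: string): string[];
}

// In memory fallback: works everywhere, loses history on restart.
function createMemoryStore(): ConversationStore {
  const data = new Map<string, string[]>();
  return {
    kind: "memory",
    append(conversationId, message) {
      const list = data.get(conversationId) ?? [];
      list.push(message);
      data.set(conversationId, list);
    },
    history(conversationId) {
      return [...(data.get(conversationId) ?? [])];
    },
  };
}

// Stands in for the lazy `import()` of the optional driver plus its setup.
type DurableLoader = (url: string) => Promise<ConversationStore>;

async function createConversationStore(
  databaseUrl: string | undefined,
  loadDurable: DurableLoader,
): Promise<ConversationStore> {
  // No URL configured: durable history was never requested.
  if (!databaseUrl) return createMemoryStore();
  try {
    return await loadDurable(databaseUrl);
  } catch (error) {
    // Missing dependency or unreachable database: degrade, do not throw.
    console.warn("durable store unavailable, using memory store", error);
    return createMemoryStore();
  }
}
```

The caller gets a working store in every case and can inspect `kind` if it needs to know whether history will survive a restart.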

The security layer around every request

The model is the easy part. The request boundary is where production AI features get hurt. Every assistant call passes through the same gates before a single token is generated:

  1. Method and input validation: POST only, input length bounded, malformed bodies rejected with a structured error and a request id.
  2. Rate limiting: per client limits, so a single client cannot drain the inference budget.
  3. Origin and CSRF verification: browser calls must come from an allowed origin and carry a valid CSRF token. The token is set as a cookie and checked on every mutating call.
  4. Internal authorization: server to server calls from our own backend skip the browser checks through an explicit internal auth path, never an implicit bypass.

These are not AI specific. They are the same gates any serious endpoint needs. The mistake teams make is treating an AI route as special and skipping them. An LLM endpoint is still an endpoint, and it spends money on every call, which makes the rate limit a cost control as much as a security control.
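The four gates above can be sketched as one function that runs before any agent call. Everything here is illustrative: the `AssistantRequest` shape, the status codes, the fixed-window limiter, and the double-submit CSRF comparison (cookie value must equal a header value) are assumptions, not the site's actual code. A production version would also compare tokens in constant time.

```typescript
// Hypothetical request shape; a real route reads these from the framework request.
type AssistantRequest = {
  method: string;
  clientId: string; // e.g. an IP address or session key
  origin?: string;
  csrfCookie?: string;
  csrfHeader?: string;
  internalToken?: string; // present only on server to server calls
  input: unknown;
};

type GateResult =
  | { ok: true }
  | { ok: false; status: number; error: string; requestId: string };

type GateConfig = {
  maxInputLength: number;
  allowedOrigins: string[];
  internalToken: string;
};

type AllowFn = (clientId: string, now: number) => boolean;

// Fixed-window counter per client: at most `limit` calls per `windowMs`.
function createRateLimiter(limit: number, windowMs: number): AllowFn {
  const hits = new Map<string, { count: number; windowStart: number }>();
  return (clientId, now) => {
    const entry = hits.get(clientId);
    if (!entry || now - entry.windowStart >= windowMs) {
      hits.set(clientId, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

function checkRequest(
  req: AssistantRequest,
  config: GateConfig,
  allow: AllowFn,
  requestId: string,
  now: number,
): GateResult {
  const reject = (status: number, error: string): GateResult => ({
    ok: false, status, error, requestId,
  });

  // 1. Method and input validation.
  if (req.method !== "POST") return reject(405, "method_not_allowed");
  if (typeof req.input !== "string" || req.input.length === 0) return reject(400, "invalid_input");
  if (req.input.length > config.maxInputLength) return reject(413, "input_too_long");

  // 2. Rate limiting, keyed by client.
  if (!allow(req.clientId, now)) return reject(429, "rate_limited");

  // 4. Internal authorization: an explicit path that replaces the browser checks.
  if (req.internalToken !== undefined) {
    return req.internalToken === config.internalToken
      ? { ok: true }
      : reject(401, "invalid_internal_token");
  }

  // 3. Origin and CSRF verification for browser calls.
  if (!req.origin || !config.allowedOrigins.includes(req.origin)) return reject(403, "origin_not_allowed");
  if (!req.csrfCookie || req.csrfCookie !== req.csrfHeader) return reject(403, "csrf_failed");
  return { ok: true };
}
```

Every rejection carries the request id, so a blocked call can be traced in the logs the same way a successful one can. The internal path is entered only when a token is presented and fails closed when the token is wrong.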

No silent lead loss

A lead that the AI captures but the system drops is worse than no AI at all. So leads are journaled to the log stream before delivery is attempted, and delivery failures are logged explicitly rather than swallowed. If an email provider is down, the lead still exists in the record and can be recovered. The capture path and the delivery path are decoupled on purpose, because the expensive thing to lose is the intent, not the email.
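Decoupling capture from delivery comes down to ordering: write the record first, then try to send. This is a minimal sketch with hypothetical names (`captureLead`, `Journal`, `Deliver`, and the event labels are not taken from the site's code); the journal here is any function that appends to a log stream.

```typescript
type Lead = { name: string; email: string; summary: string };

type JournalEntry = {
  event: "lead_captured" | "lead_delivery_failed";
  requestId: string;
  lead: Lead;
  error?: string;
};

// Hypothetical sinks: a log stream writer and an email (or CRM) sender.
type Journal = (entry: JournalEntry) => void;
type Deliver = (lead: Lead) => Promise<void>;

async function captureLead(
  lead: Lead,
  requestId: string,
  journal: Journal,
  deliver: Deliver,
): Promise<{ captured: true; delivered: boolean }> {
  // Journal before any delivery attempt, so the intent exists even if sending fails.
  journal({ event: "lead_captured", requestId, lead });
  try {
    await deliver(lead);
    return { captured: true, delivered: true };
  } catch (error) {
    // Log the failure explicitly instead of swallowing it.
    journal({
      event: "lead_delivery_failed",
      requestId,
      lead,
      error: error instanceof Error ? error.message : String(error),
    });
    return { captured: true, delivered: false };
  }
}
```

A failed delivery leaves two journal entries with the same request id: the captured lead and the reason it was not sent. That pair is enough to recover the lead later.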

Streaming, because latency is a feature

Responses stream token by token rather than blocking until the full answer is ready. On a sales surface, perceived latency is the difference between a visitor who waits and one who leaves. Streaming is handled in a small dedicated helper so every agent route gets the same behavior without repeating the plumbing.

What we deliberately did not build

Good architecture is also a list of things you said no to.

  • No general purpose assistant. The quote agent is scoped to Oronts business only. Ask it to write FizzBuzz or count letters in a word and it declines and redirects. A sales assistant that answers trivia is a liability, not a feature, and an open ended assistant is an open ended attack surface.
  • No invented numbers. The assistant states published pricing bands verbatim and never constructs a project specific estimate. Planning ranges come from a senior engineer after scoping, never from a model that wants to be helpful.
  • No promises. It never confirms feasibility, commits a timeline, or guarantees an outcome. It collects information and hands off to people.

Those constraints live in the instructions and they are the most important lines in the whole system. The capability of a model is bounded by the guardrails around it, and on a commercial site the guardrails are the product.

The takeaway for your own build

If you are putting an AI feature into production, the model choice is the decision you will spend the least time regretting. Spend your effort on the boundary: validation, rate limiting, locale, graceful degradation, and a capture path that never loses intent. Frameworks like Mastra give you the agent and tool primitives so you can spend that effort where it pays off.

For the surrounding patterns, see our guides on multi-agent architecture, AI observability, and human in the loop AI. If you want this kind of system built into your own product, start a conversation or request a quote.

Topics covered

Mastra, Mastra AI, production AI assistant, AI agent architecture, tool calling, Next.js AI, AI SDK, agentic AI, LLM in production, AI lead capture
