Agentic AI

AI agents that take real actions, under your oversight

An agent is the right choice when a task needs judgment, tools, and context across several steps. When a script, a workflow, or a single prompt does the job, we will tell you that instead.

Oronts designs, builds, and runs agentic AI systems from Munich: agents that call tools and APIs, retrieve the right context, and take actions with a human in the loop where the stakes demand it. We build on open frameworks like Mastra, LangGraph, CrewAI, and the Vercel AI SDK, with OpenAI and Anthropic models, so you own the system and can run it in your own infrastructure.

Diagram: AI agent with a human in the loop

  1. Receive goal
  2. Plan and choose a tool
  3. Call tool or API
  4. Observe and decide

The agent loops until the goal is met or a human steps in.

Tool-using · Human oversight · Framework-agnostic
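The four-step loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Oronts' production code: `Step`, `AgentRun`, `run_agent`, `demo_plan`, and the `check_stock` tool are hypothetical names, and the planner is a plain function standing in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    tool: str            # tool name, or the sentinels "done" / "ask_human"
    argument: str = ""


@dataclass
class AgentRun:
    goal: str
    observations: List[str] = field(default_factory=list)
    status: str = "running"  # becomes "done" or "needs_human"


def run_agent(
    goal: str,
    plan: Callable[[AgentRun], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> AgentRun:
    run = AgentRun(goal=goal)                      # 1. receive goal
    for _ in range(max_steps):
        step = plan(run)                           # 2. plan and choose a tool
        if step.tool == "done":
            run.status = "done"
            return run
        if step.tool == "ask_human" or step.tool not in tools:
            run.status = "needs_human"             # a human steps in
            return run
        result = tools[step.tool](step.argument)   # 3. call tool or API
        run.observations.append(result)            # 4. observe, then decide again
    run.status = "needs_human"                     # step budget exhausted
    return run


def demo_plan(run: AgentRun) -> Step:
    # Stand-in for a model call: check stock once, then declare the goal met.
    if not run.observations:
        return Step("check_stock", "SKU-1")
    return Step("done")


demo_tools = {"check_stock": lambda sku: f"{sku}: 3 in stock"}
result = run_agent("Is SKU-1 available?", demo_plan, demo_tools)
print(result.status, result.observations)
```

The step budget and the unknown-tool check are the design choice worth noting: when the loop cannot finish or the planner asks for something outside its tool set, the run ends in `needs_human` rather than continuing on its own.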

The stack we build on

Mastra · LangGraph · CrewAI · Vercel AI SDK · Anthropic

When an agentic system is the right call

Choose an agentic AI system when a task needs judgment across several steps: reading context, deciding what to do, calling tools or APIs, and acting on the result, rather than following one fixed path. An agent fits when the work is too variable for a deterministic workflow but too repetitive and well understood to keep doing by hand: triage and routing, retrieval over your own documents, drafting and review, multi-step research, and operations that touch several systems. We build these on open frameworks (Mastra, LangGraph, CrewAI, and the Vercel AI SDK), with OpenAI and Anthropic models behind a provider-agnostic layer, so you are never locked to one vendor. The trade-off is that agents are probabilistic, so they need guardrails, evaluation, and a human in the loop wherever an action is high stakes or hard to reverse. Where a deterministic script, an n8n workflow, or a single prompt covers the need, we say so plainly in the first call. The proof we point to is our own assistant running in production on this website, built on Mastra.

  • Agents that use tools and APIs, retrieve context, and take real actions, not just chat
  • Built on open frameworks: Mastra, LangGraph, CrewAI, and the Vercel AI SDK
  • Provider-agnostic across OpenAI and Anthropic, so you are not locked to one model vendor
  • Human in the loop and guardrails wherever an action is high stakes or hard to reverse

Proof before promises

Running on this site, in production

Live

The assistant on this website is an agentic system we built on Mastra, running in production. It is not a demo behind a login: it is the same kind of tool-using, context-retrieving agent we build for clients, and you can use it here before you talk to us.

See our AI work

Tool use with human oversight

Capability

Our agents call tools and APIs and take actions, but the high-stakes ones run with a human in the loop. We design where an agent acts on its own and where it must propose and wait for confirmation, so an irreversible action never happens silently.

Framework-agnostic engineering

Capability

We build on open frameworks (Mastra, LangGraph, CrewAI, and the Vercel AI SDK) and keep the model provider, OpenAI or Anthropic, behind an abstraction layer. The agent logic is yours, in your repository, so a framework or model choice is never a one-way door.
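One way to picture a provider layer is an interface that the agent logic depends on, with one adapter per vendor. The sketch below is an assumption for illustration only: `ModelProvider`, `StubOpenAI`, `StubAnthropic`, and `summarize` are hypothetical names, and the stubs return canned strings where a real adapter would call the vendor SDK.

```python
from typing import Dict, Protocol


class ModelProvider(Protocol):
    """Hypothetical interface: the only thing the agent logic knows about a model."""

    name: str

    def complete(self, prompt: str) -> str: ...


class StubOpenAI:
    name = "openai"

    def complete(self, prompt: str) -> str:
        # A real adapter would call the vendor SDK here.
        return f"[openai] {prompt}"


class StubAnthropic:
    name = "anthropic"

    def complete(self, prompt: str) -> str:
        return f"[anthropic] {prompt}"


def summarize(ticket: str, model: ModelProvider) -> str:
    # Agent logic sees only the interface, so the vendor is a configuration choice.
    return model.complete(f"Summarize: {ticket}")


PROVIDERS: Dict[str, ModelProvider] = {p.name: p for p in (StubOpenAI(), StubAnthropic())}
print(summarize("late delivery", PROVIDERS["anthropic"]))
```

Because `summarize` never imports a vendor SDK, switching providers changes which adapter is passed in, not the agent code.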

EU data handling

Capability

We design the data flow so you control where context and prompts go and which model provider sees what. Where personal data is involved we design GDPR-conscious data flows, sign a German Auftragsverarbeitungsvertrag (AVV, a data processing agreement), and keep retrieval and storage in infrastructure you operate where the case requires it.

Multi-agent orchestration

A coordinator plans the work and routes it across specialized agents, with explicit hand-offs, shared memory and bounded tools.

Diagram: a coordinator agent (plans and routes) hands off to Planner, Researcher, and Executor agents, which draw on shared memory and state, tools and APIs, and guardrails.

Hand-offs are explicit and observable, shared memory carries state across agents, and a human approval gate guards consequential actions.
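The coordinator pattern described here can be sketched as follows. It is a simplified illustration, not a framework API: `SharedState`, `coordinate`, the three agent functions, and the `CONSEQUENTIAL` set are hypothetical names, each agent is a plain function rather than a model-backed agent, and tool calls are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SharedState:
    goal: str
    notes: Dict[str, str] = field(default_factory=dict)   # shared memory
    handoffs: List[str] = field(default_factory=list)     # observable hand-off log


def planner(state: SharedState) -> None:
    state.notes["plan"] = f"research, then execute: {state.goal}"


def researcher(state: SharedState) -> None:
    state.notes["findings"] = "supplier B has stock"


def executor(state: SharedState) -> None:
    state.notes["result"] = "order placed with supplier B"


AGENTS: Dict[str, Callable[[SharedState], None]] = {
    "planner": planner,
    "researcher": researcher,
    "executor": executor,
}
CONSEQUENTIAL = {"executor"}   # agents whose work needs human approval first


def coordinate(
    goal: str, route: List[str], approve: Callable[[SharedState], bool]
) -> SharedState:
    state = SharedState(goal)
    for name in route:
        if name in CONSEQUENTIAL and not approve(state):
            state.handoffs.append(f"coordinator -> human: {name} not approved")
            break
        state.handoffs.append(f"coordinator -> {name}")   # explicit hand-off
        AGENTS[name](state)
    return state


run = coordinate("restock item 42", ["planner", "researcher", "executor"], lambda s: True)
print(run.handoffs)
```

Every hand-off lands in `handoffs`, so the route an answer took can be read back afterwards, and the approval callback sees the full shared state before the consequential step runs.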

What we do with agentic AI

Agent design and scoping

We define what the agent should do, where it acts on its own, and where a human approves. The output is a scoped design: the task boundary, the tools, the context it needs, and the points where it must stop and ask, before any build starts.

Tool and API integration

Agents are only useful when they can act. We connect them to your systems through clean, typed tools and APIs, with idempotency and error handling, so a tool call is safe to retry and a failure is visible rather than silent.
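To make "safe to retry" concrete, here is a small sketch of a typed tool with an idempotency key. The names `RefundRequest`, `ToolResult`, and `RefundTool` are hypothetical, and the in-memory dictionary stands in for whatever durable store a real integration would use.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RefundRequest:
    order_id: str
    amount_cents: int
    idempotency_key: str   # same key means "this is a retry of the same call"


@dataclass(frozen=True)
class ToolResult:
    ok: bool
    detail: str


class RefundTool:
    def __init__(self) -> None:
        self._seen: Dict[str, ToolResult] = {}
        self.executed = 0   # how many refunds actually ran

    def call(self, req: RefundRequest) -> ToolResult:
        # A retry with a known key returns the stored result instead of acting again.
        if req.idempotency_key in self._seen:
            return self._seen[req.idempotency_key]
        if req.amount_cents <= 0:
            # Failures come back as explicit results, never as silent no-ops.
            result = ToolResult(False, "amount must be positive")
        else:
            self.executed += 1
            result = ToolResult(True, f"refunded {req.amount_cents} cents on {req.order_id}")
        self._seen[req.idempotency_key] = result
        return result


tool = RefundTool()
request = RefundRequest("order-17", 2500, "key-abc")
first = tool.call(request)
retry = tool.call(request)
print(first == retry, tool.executed)
```

The agent can call the tool twice after a timeout without refunding twice, and a rejected request is returned as a result the agent and the audit log can both see.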

RAG and retrieval

We build retrieval over your own documents and data so the agent answers from your knowledge, not its training. Chunking, embeddings, and a vector store wired to the agent, with citations back to the source so an answer can be checked.
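The retrieval pipeline named above (chunking, embeddings, a vector store, citations) can be illustrated with a toy version. Everything here is an assumption made for a self-contained example: the word-count `embed` function is a stand-in for a real embedding model, a Python list stands in for a vector store, and the two documents are invented.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Chunk:
    source: str      # citation back to the document and chunk
    text: str
    vector: Counter


def embed(text: str) -> Counter:
    # Toy embedding: lowercase word counts. A real system would use an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def index(docs: Dict[str, str], size: int = 8) -> List[Chunk]:
    chunks = []
    for source, text in docs.items():
        words = text.split()
        for i in range(0, len(words), size):   # fixed-size chunking by word count
            piece = " ".join(words[i:i + size])
            chunks.append(Chunk(f"{source}#chunk{i // size}", piece, embed(piece)))
    return chunks


def retrieve(query: str, store: List[Chunk], k: int = 1) -> List[Chunk]:
    q = embed(query)
    return sorted(store, key=lambda c: cosine(q, c.vector), reverse=True)[:k]


store = index({
    "returns.md": "Items can be returned within 30 days",
    "shipping.md": "Standard shipping takes 3 to 5 business days",
})
top = retrieve("how long does shipping take", store)[0]
print(top.source, "->", top.text)
```

Each retrieved chunk carries its `source`, which is what lets a generated answer cite the passage it came from so a reader can check it.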

Human in the loop and guardrails

We design the boundary between what an agent does autonomously and what it proposes for confirmation, plus input and output guardrails, rate and cost limits, and a clear audit trail, so the system stays inside the lines you set.
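As a rough sketch of how these controls fit around a model call, the example below combines an input guardrail, a cost limit, an output guardrail, and an audit trail. The class `GuardedAgent`, the two regular expressions, and the flat per-call cost are illustrative assumptions, not rules Oronts prescribes; real guardrails are chosen per use case.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class GuardrailViolation(Exception):
    """Raised when a request is stopped instead of reaching the model."""


INJECTION = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class GuardedAgent:
    model: Callable[[str], str]
    budget_cents: int
    cost_per_call_cents: int = 2          # assumed flat cost, for illustration
    spent_cents: int = 0
    audit: List[Tuple[str, str]] = field(default_factory=list)

    def run(self, user_input: str) -> str:
        if INJECTION.search(user_input):                          # input guardrail
            self.audit.append(("blocked_input", user_input))
            raise GuardrailViolation("input rejected")
        if self.spent_cents + self.cost_per_call_cents > self.budget_cents:
            self.audit.append(("blocked_budget", user_input))     # cost limit
            raise GuardrailViolation("cost limit reached")
        self.spent_cents += self.cost_per_call_cents
        output = EMAIL.sub("[redacted]", self.model(user_input))  # output guardrail
        self.audit.append(("completed", output))
        return output


agent = GuardedAgent(model=lambda text: "Write to anna@example.com", budget_cents=3)
print(agent.run("Who handles returns?"))
```

Blocked requests are recorded in the same audit list as completed ones, so the trail shows what the system refused as well as what it did.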

Evaluation and observability

We build evaluations that measure whether the agent does the job, and tracing and logging so you can see every step, tool call, and decision in production. Without evals an agent is a black box, so we treat them as part of the build, not an afterthought.
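A small evaluation harness shows the idea. This is a sketch under assumptions: `EvalCase`, `run_evals`, and the keyword-based `triage` function are hypothetical, with `triage` standing in for the agent under test and the three cases invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    name: str
    input: str
    expected_route: str


def run_evals(agent: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, object]:
    # Run every case and report which ones the agent got wrong.
    failures = [c.name for c in cases if agent(c.input) != c.expected_route]
    return {"total": len(cases), "passed": len(cases) - len(failures), "failures": failures}


def triage(text: str) -> str:
    # Stand-in for the agent under test: routes by keyword.
    lowered = text.lower()
    if "refund" in lowered:
        return "billing"
    if "password" in lowered:
        return "account"
    return "general"


CASES = [
    EvalCase("refund_request", "I want a refund for my order", "billing"),
    EvalCase("password_reset", "I forgot my password", "account"),
    EvalCase("double_charge", "I was charged twice this month", "billing"),
]
report = run_evals(triage, CASES)
print(report)
```

The third case fails on purpose: the report names it, which is how a gap or a regression becomes visible before production instead of after.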

Production hardening

We take an agent from a working prototype to something you can run: cost controls, retries and fallbacks across model providers, prompt and output validation, monitoring, and a deployment your engineers can operate without us in the loop.
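Retries and provider fallbacks can be sketched like this. The function `complete_with_fallback`, the `ProviderError` exception, and the two provider functions are hypothetical; a real version would wrap vendor SDK calls and catch their specific error types.

```python
import time
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Stand-in for a timeout, rate limit, or outage at a model provider."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    retries: int = 2,
    backoff_s: float = 0.0,
) -> Tuple[str, str]:
    errors: List[str] = []
    for name, call in providers:              # providers in order of preference
        for attempt in range(retries):
            try:
                return name, call(prompt)     # first success wins
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                time.sleep(backoff_s * (2 ** attempt))   # exponential backoff
    raise ProviderError("; ".join(errors))    # every provider failed: make it visible


calls = {"primary": 0}


def primary(prompt: str) -> str:
    calls["primary"] += 1
    raise ProviderError("timeout")


def secondary(prompt: str) -> str:
    return f"answer to: {prompt}"


used, text = complete_with_fallback("hi", [("primary", primary), ("secondary", secondary)])
print(used, text, calls["primary"])
```

Returning the provider name alongside the text keeps the fallback observable: tracing can record which vendor actually answered.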

Built for your team

Agents act on your systems, so every seat at the table judges them differently.

CTOs and IT leaders

You want agents you can run and change, not a black-box platform.

Model-neutral agents on open frameworks, your code, EU-hosted, no lock-in.

Enterprise and procurement

Agents acting on systems need governance and audit.

Human-in-the-loop approval, audit logging and guardrails, AVV and TOM readiness.

Startup CTOs and founders

You need a working agent, not a research project.

A bounded agent shipped in a 90-day pilot, then scaled.

Agencies and partners

Your client needs senior agentic work under your brand.

White-label agent engineering, the same stack behind our public work.

How we deliver an agentic system

From use-case fit to a launched agent you own: scoped, built with evaluations, reviewed with a human in the loop, hardened, and run by senior engineers, with the code and the roadmap in your hands.

01

Scope the use case

We pin down the task, the inputs, the actions the agent may take, and the cost of getting one wrong. We confirm an agent is the right tool here, and if a workflow or a single prompt does the job, we say so before committing.

  • Task boundary
  • Action and risk map
  • Fit check
02

Design the agent and tools

We design the agent: the framework, the model provider, the tools and APIs it calls, the context and retrieval it needs, and the points where a human must approve. The design is reviewable before code, not discovered during it.

  • Framework and model
  • Tools and retrieval
  • Oversight points
03

Build with evaluations

We build the agent and its evaluations together. Evals measure whether it does the job on cases that matter, so a change that helps one path and breaks another shows up before it reaches production rather than after.

  • Agent and tools
  • Eval suite
  • Test cases
04

Human-in-the-loop review

We wire the confirmation points: where the agent proposes and waits, what a reviewer sees, and how an action is approved, edited, or rejected. High-stakes and irreversible actions never run without a human deciding.

  • Confirm flow
  • Reviewer view
  • Audit trail
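The confirm flow in this step (approve, edit, or reject, with an audit trail) might look like the sketch below. `Decision`, `ProposedAction`, and `ReviewQueue` are hypothetical names, and the reviewer's decision is passed in directly where a real system would collect it from a reviewer view.

```python
from dataclasses import dataclass, field, replace
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    APPROVE = "approve"
    EDIT = "edit"
    REJECT = "reject"


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    payload: dict
    irreversible: bool = True


@dataclass
class ReviewQueue:
    audit: List[str] = field(default_factory=list)

    def resolve(
        self,
        action: ProposedAction,
        decision: Decision,
        edited_payload: Optional[dict] = None,
    ) -> Optional[ProposedAction]:
        """Return the action that may run, or None if the reviewer rejected it."""
        if decision is Decision.REJECT:
            self.audit.append(f"rejected {action.tool}")
            return None
        if decision is Decision.EDIT:
            if edited_payload is None:
                raise ValueError("an edit needs a replacement payload")
            self.audit.append(f"edited {action.tool}")
            return replace(action, payload=edited_payload)
        self.audit.append(f"approved {action.tool}")
        return action


queue = ReviewQueue()
proposal = ProposedAction("send_refund", {"amount_cents": 5000})
final = queue.resolve(proposal, Decision.EDIT, {"amount_cents": 2500})
print(final.payload, queue.audit)
```

The agent only ever produces a `ProposedAction`; what actually runs is whatever `resolve` returns, so a rejected proposal has no path to execution.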
05

Harden and observe

We add the production parts: cost and rate limits, retries and provider fallbacks, input and output guardrails, and tracing so every step and tool call is visible. The agent stops being a black box and becomes a system you can operate.

  • Guardrails and limits
  • Tracing and logs
  • Provider fallbacks
06

Run in production

We deploy the agent into your infrastructure and hand off the code, the evals, and a runbook so you run it without us in the loop. An optional retainer covers maintenance, evals, model updates, and on-call to an agreed SLA target.

  • Deployment
  • Runbook and evals
  • Optional retainer

When an agent fits, and when it does not

An agentic system adds power and complexity at the same time. Four questions decide whether it is the right tool for your case.

When agents fit

Agents earn their complexity when a task needs judgment across several steps: reading context, choosing which tool to call, and acting on the result, where the path is too variable for a fixed workflow. If the task is a fixed sequence of steps, a deterministic workflow or a script is simpler, cheaper, and more predictable, and we will recommend that instead.

Oversight and safety

Agents are probabilistic, so an action that is high stakes or hard to reverse runs with a human in the loop: the agent proposes, a person confirms. We design that boundary explicitly, add input and output guardrails, and keep an audit trail, so the system is autonomous where it is safe to be and supervised where it is not.

Data and EU hosting

You control where context and prompts go and which model provider sees what. We keep retrieval and storage in infrastructure you operate where the case requires it, sign a German Auftragsverarbeitungsvertrag, and design GDPR-conscious data flows where personal data is involved. Model calls to OpenAI or Anthropic are scoped to what the task needs.

Ownership

The agent logic, the tools, the evals, and the prompts are yours, in your repository, built on open frameworks with the model provider behind a layer. There is no Oronts license gate on running or changing it, and any competent engineering team can take it over, so the framework and provider choices are never a one-way door.

Single prompt, workflow automation, or an agentic system

The same goal calls for very different tools depending on how variable the task is and how much judgment it needs. This is where a single prompt stops, where a fixed workflow stops, and where an agentic system earns its added complexity.


Columns compared: Single prompt · Workflow automation · Agentic + Oronts

Criteria:

  • Handles variable, multi-step tasks that are not a fixed path
  • Calls tools and APIs and takes real actions
  • Decides which tool to use based on context
  • Retrieves context from your own documents and data
  • Human in the loop on high-stakes actions
  • Evaluation and tracing built in for production
  • Deterministic and fully predictable output
  • Senior engineering team behind it

A check means it fits the case well out of the box. A minus means partial or needs work. Text cells say what it actually takes. A deterministic workflow stays the better choice where the path is fixed and full predictability matters more than judgment.

Where an agent earns its place

Concrete situations where judgment across several steps beats a fixed script, and the outcome the agent plus our engineering delivers.

Commerce

A buyer describes what they need in their own words and the catalog, stock, and pricing live across several systems

An agentic checkout assistant that retrieves the right products, checks availability and price through typed tools, and proposes the order for a human to confirm before it commits.

Operations

Routine back-office work spans email, ERP, and internal tools, with enough variation that a fixed workflow keeps breaking

An operations agent that reads the case, decides which tool to call, and acts within set limits, escalating to a person whenever a step is high stakes or hard to reverse.

Support

Inbound requests arrive mixed in language and intent, and answers depend on your own documents rather than generic knowledge

A support triage agent that classifies and routes each request, drafts a grounded reply with citations back to your sources, and hands the sensitive cases to a human agent with full context.

From a single prompt to a governed agent

Most teams do not need an agent on day one. We move you up the ladder only as far as the task requires, and we say plainly when a cheaper step already covers the need. Each stage is a working system you own, not a throwaway.

    1

    Single prompt

    We start with one well-crafted model call where a single prompt with good instructions does the job. It is the cheapest, most predictable option, and for many tasks it is the right place to stop. We tell you when it is enough.

    2

    Deterministic workflow

    When the task is a fixed sequence of steps, we wire a deterministic workflow or an n8n flow that calls tools in a known order. Fully predictable, easy to reason about, and no model judgment where none is needed.

    3

    Retrieval and grounding

    When answers depend on your own documents, we add retrieval over your data with citations back to the source, so output is grounded in your knowledge and can be checked rather than trusted blindly.

    4

    Governed agent

    When the work truly needs judgment across several steps, we promote it to an agent that decides which tool to call, with guardrails, cost and rate limits, evaluations, and a human in the loop on every high-stakes action.

    5

    Production and ownership

    We harden the system with tracing, provider fallbacks, and a runbook, then hand off the code and the evals so you run it in your own infrastructure. An optional retainer covers maintenance and model updates to an agreed SLA target.

Who owns what across an agent build

How responsibility splits between Oronts, your team, and the model and cloud providers on an agentic engagement.

Responsibility ownership across the delivery chain

Parties: Oronts, your agency / partner, you, model / cloud provider

  • Agent design and orchestration: Oronts
  • Evaluations and test harness: Oronts
  • Guardrails and approval gates: Oronts and you
  • Prompts, policies and business rules: you
  • Hosting, data and model keys: you and the model / cloud provider
  • On-call and incident response: Oronts
  • Code and IP: you

When an agent is not the right call

  • A deterministic workflow where the steps are fixed and known: a script or an n8n flow is simpler, cheaper, and fully predictable, and an agent only adds cost and unpredictability where none is needed.
  • Simple CRUD or a single, well-defined transformation: if the task is read, write, or one clear mapping, plain code does it better than a model, and an agent is the wrong tool.
  • A high-stakes action with no human available to supervise it: if no one can review and confirm an irreversible step, an autonomous agent is a risk we will not recommend until the oversight exists.
  • A case a single well-crafted prompt already handles: if one model call with good instructions does the job, the orchestration, tools, and evals of an agent are overhead you do not need.

Who you're working with

  • HRB 288224: Registered in Munich
  • 15+: Years, founder-led
  • DE · EN · AR: Delivery languages
  • 2: Open source on GitHub
  • EU: Data residency, Frankfurt
  • AVV/DPA: Ready to sign, Art. 28

Engagement levels

Oronts works with serious teams that need senior delivery, not low-cost outsourcing.

  • Production Pilot: from 25k EUR
  • Custom software and AI projects: from 50k EUR
  • Ongoing technical retainers: from 15k EUR/month

Exact pricing depends on scope, responsibility, delivery speed, team size, integrations, support expectations and production risk.

Agentic AI questions, answered directly

Build cost depends on the use case: how many tools it touches, how much retrieval it needs, and how much human-in-the-loop review the actions demand. Running cost is mainly model usage from OpenAI or Anthropic plus your hosting, which we keep visible with cost limits and tracing. We scope a fixed price to your case after the first call, and we tell you when a cheaper workflow or single prompt would do the job instead.

We design the boundary between what the agent does on its own and what it must propose for a human to confirm. High-stakes and irreversible actions never run without a person deciding. On top of that, guardrails on input and output, rate and cost limits, and an audit trail of every step and tool call keep the system inside the lines you set.

Two layers. Retrieval grounds the agent in your own documents with citations back to the source, so an answer can be checked rather than trusted blindly. Evaluations measure whether the agent does the job on cases that matter, and guardrails plus a human in the loop catch the high-stakes actions before they run. We do not claim an agent never errs. We design so the errors that matter are caught.

You control where context and prompts go and which model provider sees what. We keep retrieval and storage in infrastructure you operate where the case requires it, sign a German Auftragsverarbeitungsvertrag, and design GDPR-conscious data flows where personal data is involved. Model calls to OpenAI or Anthropic are scoped to what the task needs, and nothing routes anywhere you have not approved.

When the task is a fixed sequence of steps, a deterministic workflow or a script is simpler and more predictable. When it is simple CRUD or one clear transformation, plain code does it better. When a single well-crafted prompt already handles it, an agent is overhead. And when a high-stakes action has no human to supervise it, we will not recommend an autonomous agent until the oversight exists. We say all of this in the first call.

Everything: the agent logic, the tools, the evals, and the prompts ship as code in your repository, built on open frameworks like Mastra, LangGraph, CrewAI, or the Vercel AI SDK, with the model provider behind a layer. There is no Oronts license gate, and any competent engineering team can take it over. Leaving us is never a rebuild.

Talk to the engineers, not a sales team

Founder-led, senior-only delivery from Munich. Scope your agentic AI use case in one conversation.