Designing AI Workflows That Actually Work in Production
A hands-on guide to building robust AI pipelines. Learn pipeline architecture, step sequencing, branching logic, error handling, and prompt chaining from engineers who've shipped these systems.
Why Most AI Workflows Fail in Production
Here's something we've learned the hard way: getting an AI to do something impressive in a demo is easy. Getting it to do that same thing reliably, thousands of times a day, with real data and real edge cases? That's where most teams hit a wall.
We've built AI workflows for document processing, customer support automation, content generation, and data analysis. Along the way, we've made every mistake you can imagine. This guide is what we wish someone had told us before we started.
The difference between a prototype and a production system isn't the model you use. It's everything around the model.
Let me walk you through how we actually design AI workflows now, after learning these lessons.
The Anatomy of a Production AI Pipeline
Think of an AI workflow as a series of processing stages, each with a specific job. Here's the basic structure we use:
Input → Validation → Preprocessing → AI Processing → Post-processing → Validation → Output
Beneath each stage sit its supporting pieces: an error handler behind input validation, transform-and-enrich logic behind preprocessing, the LLM calls and tool use behind AI processing, parsing and formatting behind post-processing, and a quality check behind the final validation.
Each stage has clear inputs, outputs, and failure modes. Let's break them down.
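To make that concrete, here's a minimal sketch of the top-level wiring, with each stage as a placeholder function standing in for the implementations discussed below:
const runPipeline = async (rawInput) => {
  // Each stage takes the previous stage's output and can throw a stage-specific error
  const validated = await validateInput(rawInput);
  const prepared = await preprocess(validated);
  const aiResult = await runAI(prepared);
  const output = await postprocess(aiResult);
  return await validateOutput(output);
};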
Stage 1: Input Validation and Normalization
Never trust incoming data. Ever. We learned this when a customer sent us a "text document" that was actually a 50MB binary file. The pipeline choked, the queue backed up, and we spent a weekend fixing it.
const validateInput = async (input) => {
const checks = {
exists: input !== null && input !== undefined,
sizeOk: Buffer.byteLength(input, 'utf8') < MAX_INPUT_SIZE,
formatOk: isValidFormat(input),
contentOk: !containsMaliciousPatterns(input)
};
const failures = Object.entries(checks)
.filter(([_, passed]) => !passed)
.map(([check]) => check);
if (failures.length > 0) {
throw new ValidationError(`Input failed: ${failures.join(', ')}`);
}
return normalizeInput(input);
};
What to check:
- File size limits (set them lower than you think you need)
- Format validation (is it actually JSON, not just named .json?)
- Character encoding (UTF-8 issues will haunt you)
- Content safety (don't send malicious content to your LLM; see the sketch below for the format and encoding checks)
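The format and encoding checks are the ones teams most often skip. Here's a rough sketch of what they might look like for a pipeline that expects JSON text; both helpers are our own stand-ins, not a library API:
const isValidFormat = (input) => {
  // For a pipeline that expects JSON: actually parse it instead of trusting the file extension
  try {
    JSON.parse(input);
    return true;
  } catch {
    return false;
  }
};

const looksLikeValidUtf8 = (buf) => {
  // Heuristic: invalid UTF-8 bytes decode to the replacement character (U+FFFD).
  // A legitimate document could already contain U+FFFD, so treat this as a signal, not proof.
  return !buf.toString('utf8').includes('\uFFFD');
};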
Stage 2: Preprocessing and Enrichment
Raw input rarely goes straight to an LLM. You usually need to transform it, add context, or break it into chunks.
| Preprocessing Task | When to Use | Example |
|---|---|---|
| Chunking | Long documents exceeding context limits | Split a 100-page PDF into 2000-token chunks |
| Enrichment | Need additional context | Add customer history before processing support ticket |
| Extraction | Only parts of input are relevant | Pull just the "description" field from a JSON payload |
| Transformation | Format conversion needed | Convert HTML to markdown for cleaner processing |
| Deduplication | Repeated content wastes tokens | Remove duplicate paragraphs from scraped content |
Here's a chunking strategy we actually use:
const chunkDocument = (text, options = {}) => {
const {
maxTokens = 2000,
overlap = 200,
preserveParagraphs = true
} = options;
const chunks = [];
let currentChunk = '';
const paragraphs = text.split(/\n\n+/);
for (const para of paragraphs) {
const combined = currentChunk + '\n\n' + para;
if (estimateTokens(combined) > maxTokens) {
if (currentChunk) {
chunks.push(currentChunk.trim());
// Keep overlap for context continuity
currentChunk = getLastNTokens(currentChunk, overlap) + '\n\n' + para;
} else {
// Single paragraph too long, force split
chunks.push(...forceSplitParagraph(para, maxTokens));
currentChunk = '';
}
} else {
currentChunk = combined;
}
}
if (currentChunk.trim()) {
chunks.push(currentChunk.trim());
}
return chunks;
};
The overlap is crucial. Without it, information that spans chunk boundaries gets lost. We learned this when our document summarizer kept missing key points that happened to fall between chunks.
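chunkDocument leans on a few helpers we didn't show (estimateTokens, getLastNTokens, forceSplitParagraph). Here's a rough sketch of the first two, using the common ~4-characters-per-token heuristic; swap in your model's real tokenizer when counts need to be accurate:
// Rough token estimate: ~4 characters per token for English text.
// Good enough for chunk sizing; use the model's tokenizer when precision matters.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Return roughly the last N tokens' worth of text, cut at a word boundary.
const getLastNTokens = (text, n) => {
  const approxChars = n * 4;
  if (text.length <= approxChars) return text;
  const tail = text.slice(-approxChars);
  const firstSpace = tail.indexOf(' ');
  return firstSpace === -1 ? tail : tail.slice(firstSpace + 1);
};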
Designing the AI Processing Core
This is where the actual LLM calls happen. But a single call is rarely enough for complex tasks.
The Prompt Chain Pattern
Instead of one giant prompt trying to do everything, break complex tasks into focused steps:
[Understand] → [Plan] → [Execute] → [Verify] → [Format]
Example: Processing a customer complaint
const processComplaint = async (complaint) => {
// Step 1: Understand - Extract key information
const analysis = await llm.call({
system: `Extract structured information from customer complaints.
Return JSON with: issue_type, urgency, customer_emotion, key_details`,
user: complaint
});
// Step 2: Plan - Determine response strategy
const strategy = await llm.call({
system: `Based on this complaint analysis, determine the response strategy.
Consider: resolution options, escalation needs, compensation eligibility`,
user: JSON.stringify(analysis)
});
// Step 3: Execute - Generate response
const response = await llm.call({
system: `Write a customer response following this strategy.
Tone: empathetic, professional. Length: 2-3 paragraphs.`,
user: `Strategy: ${JSON.stringify(strategy)}\nOriginal complaint: ${complaint}`
});
// Step 4: Verify - Quality check (assumes llm.call returns parsed JSON when the prompt requests it)
const verification = await llm.call({
system: `Review this customer response. Check for:
- Addresses all concerns?
- Appropriate tone?
- Actionable next steps?
Return: {approved: boolean, issues: string[]}`,
user: response
});
if (!verification.approved) {
return await regenerateWithFeedback(response, verification.issues);
}
return response;
};
Each step is simpler, more testable, and easier to debug. When something goes wrong, you know exactly which stage failed.
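The regenerateWithFeedback call in step 4 does real work: it feeds the reviewer's objections back into a second generation pass. A minimal sketch, using the same placeholder llm.call interface as above:
const regenerateWithFeedback = async (draft, issues) => {
  // One revision pass: show the model its own draft plus the reviewer's objections
  return await llm.call({
    system: `Revise this customer response to fix the listed issues.
    Keep the tone empathetic and professional.`,
    user: `Draft:\n${draft}\n\nIssues to fix:\n- ${issues.join('\n- ')}`
  });
};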
When to Chain vs. When to Parallelize
Not everything needs to be sequential. Here's how we decide:
| Pattern | Use When | Example |
|---|---|---|
| Sequential Chain | Each step depends on previous output | Understanding → Planning → Execution |
| Parallel Execution | Steps are independent | Analyzing multiple documents simultaneously |
| Map-Reduce | Need to process items then aggregate | Summarize each chunk, then combine summaries |
| Conditional Branching | Different paths for different inputs | Simple queries vs. complex analysis |
Parallel execution example:
const analyzeDocuments = async (documents) => {
// Process all documents in parallel
const analyses = await Promise.all(
documents.map(doc => analyzeDocument(doc))
);
// Reduce: combine into final report
const combinedReport = await llm.call({
system: 'Synthesize these individual analyses into a cohesive report',
user: analyses.map((a, i) => `Document ${i + 1}:\n${a}`).join('\n\n---\n\n')
});
return combinedReport;
};
Branching Logic: Routing to the Right Handler
Real-world inputs vary wildly. A "customer message" might be a complaint, a question, a compliment, or spam. Each needs different handling.
Classification-Based Routing
const routeCustomerMessage = async (message) => {
const classification = await llm.call({
system: `Classify this customer message into exactly one category:
- complaint: Customer expressing dissatisfaction
- question: Customer seeking information
- feedback: General feedback or suggestions
- urgent: Safety issues, legal threats, executive escalation
- spam: Irrelevant or automated messages
Return only the category name.`,
user: message
});
const handlers = {
complaint: handleComplaint,
question: handleQuestion,
feedback: handleFeedback,
urgent: escalateToHuman,
spam: markAsSpam
};
const handler = handlers[classification.toLowerCase()] || handleUnknown;
return await handler(message);
};
Confidence-Based Routing
Sometimes the AI isn't sure. Build that uncertainty into your routing:
const routeWithConfidence = async (input) => {
const result = await llm.call({
system: `Analyze and route this request. Return JSON:
{
"category": "string",
"confidence": 0.0-1.0,
"reasoning": "why this category"
}`,
user: input
});
if (result.confidence < 0.7) {
// Low confidence - get human input or use fallback
return await handleLowConfidence(input, result);
}
if (result.confidence < 0.9 && isHighStakes(result.category)) {
// Medium confidence on important decisions - verify
return await verifyThenRoute(input, result);
}
// High confidence - proceed automatically
return await routeToHandler(result.category, input);
};
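isHighStakes and handleLowConfidence are our own names for pieces you'll need to define for your domain. One plausible sketch, where the category names are illustrative and queueForHumanReview is a stand-in for whatever review queue you already have:
// Categories where a wrong automatic decision is expensive (examples, not a fixed list)
const HIGH_STAKES = new Set(['refund', 'legal', 'cancellation']);
const isHighStakes = (category) => HIGH_STAKES.has(category);

const handleLowConfidence = async (input, result) => {
  // Don't guess: park the item for a human and record why the model was unsure
  await queueForHumanReview({
    input,
    suggestedCategory: result.category,
    reasoning: result.reasoning
  });
  return { status: 'pending_human_review', suggestion: result.category };
};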
Error Handling: Because Things Will Break
AI workflows fail in ways traditional software doesn't. The LLM might return invalid JSON, hallucinate information, or just... not follow instructions. Plan for it.
The Error Hierarchy
class AIWorkflowError extends Error {
constructor(message, stage, recoverable = true) {
super(message);
this.stage = stage;
this.recoverable = recoverable;
}
}
class ValidationError extends AIWorkflowError {
constructor(message) {
super(message, 'validation', true);
}
}
class LLMError extends AIWorkflowError {
constructor(message, type) {
super(message, 'llm_call', type !== 'rate_limit');
this.type = type; // 'timeout', 'rate_limit', 'invalid_response', 'refused'
}
}
class OutputError extends AIWorkflowError {
constructor(message) {
super(message, 'output', true);
}
}
Retry Strategies That Work
Different failures need different retry approaches:
| Error Type | Retry Strategy | Max Retries | Backoff |
|---|---|---|---|
| Rate limit | Wait and retry | 5 | Exponential with jitter |
| Timeout | Retry immediately | 3 | Linear |
| Invalid response | Retry with feedback | 2 | None |
| Model refused | Rephrase and retry | 2 | None |
| Server error | Wait and retry | 3 | Exponential |
const withRetry = async (operation, options = {}) => {
const {
maxRetries = 3,
backoffMs = 1000,
backoffMultiplier = 2,
shouldRetry = () => true
} = options;
let lastError;
for (let attempt = 0; attempt <= maxRetries; attempt++) {
try {
return await operation();
} catch (error) {
lastError = error;
if (!shouldRetry(error) || attempt === maxRetries) {
throw error;
}
const delay = backoffMs * Math.pow(backoffMultiplier, attempt);
const jitter = Math.random() * delay * 0.1;
await sleep(delay + jitter);
}
}
throw lastError;
};
// Usage with LLM calls
const safeLLMCall = async (params) => {
return await withRetry(
() => llm.call(params),
{
maxRetries: 3,
shouldRetry: (error) => {
if (error.type === 'rate_limit') return true;
if (error.type === 'timeout') return true;
if (error.type === 'server_error') return true;
return false;
}
}
);
};
Self-Healing for Invalid Responses
When the LLM returns garbage, sometimes you can fix it:
const parseWithRecovery = async (llmResponse, expectedSchema) => {
// First attempt: direct parse
try {
const parsed = JSON.parse(llmResponse);
if (validateSchema(parsed, expectedSchema)) {
return parsed;
}
} catch (e) {
// JSON parse failed, continue to recovery
}
// Recovery attempt: ask LLM to fix its output
const fixed = await llm.call({
system: `The following response should be valid JSON matching this schema:
${JSON.stringify(expectedSchema)}
Fix the response to be valid JSON. Return ONLY the fixed JSON.`,
user: llmResponse
});
try {
const parsed = JSON.parse(fixed);
if (validateSchema(parsed, expectedSchema)) {
return parsed;
}
} catch (e) {
// Still broken
}
// Final fallback: extract what we can
return extractPartialData(llmResponse, expectedSchema);
};
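validateSchema and extractPartialData do quiet but important work here. In production we'd reach for a schema library (Ajv, Zod), but here's a hand-rolled sketch that treats the expected schema as a simple key-to-type map and salvages whatever fields it can:
// Minimal stand-in for a real validator: schema is assumed to look like
// { issue_type: 'string', urgency: 'string', key_details: 'object' }
const validateSchema = (obj, schema) =>
  obj !== null && typeof obj === 'object' &&
  Object.entries(schema).every(([key, type]) => typeof obj[key] === type);

// Last-resort salvage: grab anything that looks like a JSON object and
// keep only the fields the schema knows about
const extractPartialData = (raw, schema) => {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return {};
  try {
    const loose = JSON.parse(match[0]);
    return Object.fromEntries(
      Object.keys(schema).filter((k) => k in loose).map((k) => [k, loose[k]])
    );
  } catch {
    return {};
  }
};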
Monitoring and Observability
You can't fix what you can't see. Here's what we track:
Key Metrics
| Metric | Why It Matters | Alert Threshold |
|---|---|---|
| Latency (p50, p95, p99) | User experience, timeout risk | p95 > 10s |
| Success rate | Overall health | < 95% |
| Token usage | Cost control | > 150% of baseline |
| Retry rate | Hidden instability | > 10% |
| Classification distribution | Detect drift | Significant shift from baseline |
| Output quality scores | Catch degradation | Average < 0.8 |
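Most of these metrics fall out of one small recording call that every workflow run makes. A sketch of the shape, where metrics.emit is a placeholder for your metrics client (StatsD, Prometheus, or similar), not a real API:
const recordRunMetrics = ({ latencyMs, success, tokensUsed, retries, category }) => {
  // One call per workflow run; percentiles and baselines are the backend's job
  metrics.emit('workflow.latency_ms', latencyMs);
  metrics.emit('workflow.success', success ? 1 : 0);
  metrics.emit('workflow.tokens_used', tokensUsed);
  metrics.emit('workflow.retries', retries);
  metrics.emit(`workflow.category.${category}`, 1); // watch this distribution for drift
};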
Structured Logging
Every workflow run should produce traceable logs:
const runWorkflow = async (input, context) => {
const runId = generateRunId();
const startTime = Date.now();
const log = (stage, data) => {
logger.info({
runId,
stage,
timestamp: Date.now(),
elapsed: Date.now() - startTime,
...data
});
};
try {
log('start', { inputSize: input.length });
const validated = await validate(input);
log('validated', { valid: true });
const processed = await process(validated);
log('processed', {
tokensUsed: processed.usage.total,
model: processed.model
});
const output = await format(processed);
log('complete', {
success: true,
outputSize: output.length,
totalTime: Date.now() - startTime
});
return output;
} catch (error) {
log('error', {
error: error.message,
stage: error.stage,
recoverable: error.recoverable
});
throw error;
}
};
Putting It All Together: A Complete Example
Let's build a document analysis workflow that uses everything we've discussed:
class DocumentAnalysisPipeline {
constructor(options = {}) {
this.maxChunkSize = options.maxChunkSize || 3000;
this.concurrency = options.concurrency || 5;
}
async run(document) {
// Stage 1: Validate
const validated = await this.validate(document);
// Stage 2: Preprocess
const chunks = await this.preprocess(validated);
// Stage 3: Parallel analysis with concurrency control
const analyses = await this.analyzeChunks(chunks);
// Stage 4: Synthesize
const synthesis = await this.synthesize(analyses);
// Stage 5: Quality check
const final = await this.qualityCheck(synthesis, document);
return final;
}
async validate(document) {
if (!document || typeof document !== 'string') {
throw new ValidationError('Document must be a non-empty string');
}
if (document.length > 1000000) {
throw new ValidationError('Document exceeds maximum size');
}
return document;
}
async preprocess(document) {
return chunkDocument(document, {
maxTokens: this.maxChunkSize,
overlap: 200
});
}
async analyzeChunks(chunks) {
const results = [];
// Process in batches to control concurrency
for (let i = 0; i < chunks.length; i += this.concurrency) {
const batch = chunks.slice(i, i + this.concurrency);
const batchResults = await Promise.all(
batch.map((chunk, idx) => this.analyzeChunk(chunk, i + idx))
);
results.push(...batchResults);
}
return results;
}
async analyzeChunk(chunk, index) {
return await withRetry(async () => {
const result = await llm.call({
system: `Analyze this document section. Extract:
- Key topics and themes
- Important facts and figures
- Notable quotes or statements
- Questions or gaps in information
Return structured JSON.`,
user: `Section ${index + 1}:\n\n${chunk}`
});
return parseWithRecovery(result, ANALYSIS_SCHEMA);
});
}
async synthesize(analyses) {
const combined = analyses.map((a, i) =>
`Section ${i + 1}:\n${JSON.stringify(a, null, 2)}`
).join('\n\n---\n\n');
return await llm.call({
system: `Synthesize these section analyses into a comprehensive document summary.
Structure:
1. Executive Summary (2-3 sentences)
2. Key Findings (bullet points)
3. Important Details
4. Gaps or Questions
5. Recommendations`,
user: combined
});
}
async qualityCheck(synthesis, originalDocument) {
const check = await llm.call({
system: `Review this analysis for quality. Check:
- Accuracy: Does it reflect the original document?
- Completeness: Are major points covered?
- Clarity: Is it well-organized and clear?
Return: {score: 0-1, issues: string[], approved: boolean}`,
user: `Analysis:\n${synthesis}\n\nOriginal (first 2000 chars):\n${originalDocument.slice(0, 2000)}`
});
if (!check.approved) {
// Log for review but don't fail
logger.warn({ issues: check.issues, score: check.score });
}
return {
analysis: synthesis,
qualityScore: check.score,
qualityIssues: check.issues
};
}
}
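Using the pipeline then looks something like this (reportText and the logger are assumed to exist elsewhere, and the call needs to sit in an async context):
// Hypothetical usage: smaller chunks, tighter concurrency for a long report
const pipeline = new DocumentAnalysisPipeline({ maxChunkSize: 2000, concurrency: 3 });
const { analysis, qualityScore, qualityIssues } = await pipeline.run(reportText);
if (qualityScore < 0.8) {
  logger.warn('Analysis came back below the quality bar', { qualityIssues });
}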
Common Pitfalls and How to Avoid Them
After building dozens of these systems, here are the mistakes we see most often:
1. No Timeout Boundaries
Every LLM call needs a timeout. Set them aggressively.
// Bad: No timeout
const result = await llm.call(params);
// Good: Explicit timeout
const result = await Promise.race([
llm.call(params),
new Promise((_, reject) =>
setTimeout(() => reject(new Error('Timeout')), 30000)
)
]);
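One subtlety with the bare Promise.race version: the 30-second timer keeps running even after the call succeeds, which wastes resources and can keep a Node process alive longer than you'd expect. We prefer a small helper that cleans up after itself; a sketch:
const withTimeout = async (promise, ms, label = 'operation') => {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // always clear the timer, whether the call won or lost
  }
};

// Usage
const reply = await withTimeout(llm.call(params), 30000, 'llm.call');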
2. Ignoring Token Limits
Track usage and set budgets:
const tokenBudget = {
max: 10000,
used: 0,
canSpend(amount) {
return this.used + amount <= this.max;
},
spend(amount) {
this.used += amount;
if (this.used > this.max * 0.8) {
logger.warn('Token budget 80% consumed');
}
}
};
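In the workflow, the budget gets checked before each call and charged after it. A sketch of how that threads through, assuming your LLM client reports usage the way the logging example above does:
const budgetedCall = async (params, budget, estimatedTokens) => {
  if (!budget.canSpend(estimatedTokens)) {
    // Treat a blown budget as a non-recoverable workflow error
    throw new AIWorkflowError('Token budget exhausted', 'llm_call', false);
  }
  const result = await llm.call(params);
  // Charge actual usage when the client reports it, otherwise fall back to the estimate
  budget.spend(result.usage?.total ?? estimatedTokens);
  return result;
};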
3. No Fallback for Critical Paths
Always have a plan B:
const processWithFallback = async (input) => {
try {
return await primaryProcess(input);
} catch (error) {
if (error.recoverable) {
return await simplifiedProcess(input);
}
// Critical path - queue for manual processing
await queueForManualReview(input, error);
return { status: 'queued_for_review' };
}
};
What's Next
AI workflows are getting more sophisticated. Here's where we see things heading:
- Better tool use: Models are getting better at deciding when and how to use external tools
- Longer context: Bigger context windows mean fewer chunking headaches
- Faster inference: Latency is dropping, enabling more complex real-time workflows
- Specialized models: Fine-tuned models for specific tasks outperform general-purpose ones
But the fundamentals don't change. Validate inputs, handle errors gracefully, monitor everything, and always have a fallback. Build on those principles, and your AI workflows will survive contact with the real world.
If you're building AI workflows and hitting walls, we've probably seen your problem before. Reach out, and let's figure it out together.