OpenTelemetry in Production: Traces, Context, and What Actually Matters
Production OpenTelemetry patterns. Context propagation across queues and workers, tracing LLM calls, sampling strategies for AI workloads, privacy-safe spans, and the baggage API.
Why OpenTelemetry Won
Three years ago, the observability landscape was fragmented. Jaeger for tracing, Prometheus for metrics, Fluentd for logs, each with its own SDK, its own protocol, its own vendor lock-in. OpenTelemetry unified them into a single standard: one SDK, one protocol (OTLP), one collector that routes to any backend.
We adopted OpenTelemetry across our production systems. This article covers the patterns that actually matter in production, not the setup tutorial. For the broader observability strategy (when to alert, what to log, how to structure metrics), see our AI observability guide. This article goes deeper on OpenTelemetry-specific implementation.
Context Propagation: The Hardest Part
A single user request might cross 5 services, 3 message queues, 2 worker processes, and an LLM call. Context propagation ensures that the trace follows the request across all of them.
HTTP Propagation (Easy)
OpenTelemetry auto-instruments HTTP clients and servers. The trace context propagates via traceparent and tracestate headers. This works out of the box.
```typescript
// Auto-instrumented: no code needed for HTTP propagation
// The SDK adds the traceparent header to outgoing requests
// The receiving service extracts it and continues the trace
```
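For reference, a minimal Node.js bootstrap that enables this auto-instrumentation might look like the following sketch. The service name and collector URL are placeholders; package names are from the official OpenTelemetry JS distribution:

```typescript
// tracing.ts — load before any application code (e.g. node -r ./tracing.js app.js)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';

const sdk = new NodeSDK({
  serviceName: 'my-service', // placeholder
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector:4317' }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```

With this in place, outgoing HTTP requests carry traceparent automatically and incoming requests continue the caller's trace.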
Queue Propagation (Hard)
Message queues break automatic propagation. When you enqueue a message, the trace context must be serialized into the message headers. When a worker dequeues it, the context must be extracted and the trace continued.
```typescript
import { context, propagation, trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('queue-consumer');

// Producer: inject trace context into message headers
async function enqueueMessage(queue: string, payload: any) {
  const carrier: Record<string, string> = {};
  propagation.inject(context.active(), carrier);
  await messageQueue.send(queue, {
    body: payload,
    headers: carrier, // Contains traceparent, tracestate
  });
}

// Consumer: extract trace context from message headers
async function processMessage(message: QueueMessage) {
  const parentContext = propagation.extract(context.active(), message.headers);
  await context.with(parentContext, async () => {
    const span = tracer.startSpan('process_message', {
      attributes: {
        'messaging.system': 'rabbitmq',
        'messaging.operation': 'process',
        'messaging.destination': message.queue,
      },
    });
    try {
      await handleMessage(message.body);
      span.setStatus({ code: SpanStatusCode.OK });
    } catch (error: any) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}
```
This pattern works for RabbitMQ, Kafka, BullMQ, SQS, and Symfony Messenger. The message headers carry the trace context. The consumer extracts it and creates child spans under the original trace.
Worker Process Propagation
For Vendure's BullMQ workers, Pimcore's Symfony Messenger workers, and similar background job systems, the pattern is the same: serialize context into the job payload, extract on the worker side.
```typescript
import { Queue, Worker } from 'bullmq';
import { context, propagation, trace } from '@opentelemetry/api';

const tracer = trace.getTracer('bullmq-worker');

// BullMQ: add trace context to job data
async function addJob(queue: Queue, data: Record<string, unknown>) {
  const carrier: Record<string, string> = {};
  propagation.inject(context.active(), carrier);
  await queue.add('process', {
    ...data,
    _traceContext: carrier,
  });
}

// BullMQ: extract trace context in the worker's processor function
// (queue name is a placeholder; connection options omitted)
const worker = new Worker('orders', async (job) => {
  const parentContext = propagation.extract(context.active(), job.data._traceContext ?? {});
  await context.with(parentContext, async () => {
    const span = tracer.startSpan(`job:${job.name}`);
    try {
      await processJob(job.data);
    } finally {
      span.end();
    }
  });
});
```
Tracing LLM Calls
LLM calls are the most expensive operations in AI systems. Tracing them with proper attributes enables cost tracking, latency analysis, and quality monitoring.
```typescript
async function tracedLlmCall(prompt: string, options: LlmOptions): Promise<string> {
  const span = tracer.startSpan('llm.generate', {
    attributes: {
      'llm.provider': options.provider,       // "openai", "anthropic"
      'llm.model': options.model,             // "gpt-4o", "claude-sonnet-4-20250514"
      'llm.temperature': options.temperature,
      'llm.max_tokens': options.maxTokens,
      'llm.prompt_tokens': estimateTokens(prompt), // estimate before call
    },
  });
  try {
    const response = await llmClient.generate(prompt, options);
    span.setAttributes({
      'llm.response_tokens': response.usage.completionTokens,
      'llm.total_tokens': response.usage.totalTokens,
      'llm.finish_reason': response.finishReason,
      'llm.cost_usd': calculateCost(response.usage, options.model),
    });
    span.setStatus({ code: SpanStatusCode.OK });
    return response.text;
  } catch (error: any) {
    span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
    span.setAttribute('llm.error_type', error.constructor.name);
    throw error;
  } finally {
    span.end();
  }
}
```
Do NOT put prompt text in span attributes. Prompts contain PII. Span attributes are sent to your observability backend (Jaeger, Grafana Tempo, Datadog). Instead, log the prompt hash or token count. For the full PII-safe logging architecture, see our AI data leakage guide.
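As a sketch of the hash-instead-of-content approach (the attribute names below are illustrative, not an official semantic convention):

```typescript
import { createHash } from 'node:crypto';

// Fingerprint a prompt without storing its content: a truncated SHA-256
// lets you correlate identical prompts across traces, nothing more.
function promptFingerprint(prompt: string): { hash: string; chars: number } {
  const hash = createHash('sha256').update(prompt, 'utf8').digest('hex').slice(0, 16);
  return { hash, chars: prompt.length };
}

// Usage on a span (attribute names illustrative):
// const fp = promptFingerprint(prompt);
// span.setAttribute('llm.prompt_hash', fp.hash);
// span.setAttribute('llm.prompt_chars', fp.chars);
```

The fingerprint is enough to spot repeated prompts and cache candidates in the backend without the trace store ever seeing user content.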
LLM Span Attributes Convention
| Attribute | Type | Example |
|---|---|---|
| llm.provider | string | "openai" |
| llm.model | string | "gpt-4o" |
| llm.temperature | float | 0.7 |
| llm.prompt_tokens | int | 1250 |
| llm.response_tokens | int | 340 |
| llm.total_tokens | int | 1590 |
| llm.finish_reason | string | "stop" |
| llm.cost_usd | float | 0.023 |
| llm.error_type | string | "RateLimitError" |
| llm.cache_hit | boolean | false |
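The calculateCost helper referenced in the tracing example above can be a simple per-model price table. The rates below are illustrative placeholders, not live pricing; check your provider's current price sheet:

```typescript
// Illustrative per-million-token rates in USD — placeholders, not live pricing.
const PRICES: Record<string, { inputPerM: number; outputPerM: number }> = {
  'gpt-4o': { inputPerM: 2.5, outputPerM: 10 },
  'claude-sonnet-4-20250514': { inputPerM: 3, outputPerM: 15 },
};

interface Usage {
  promptTokens: number;
  completionTokens: number;
}

function calculateCost(usage: Usage, model: string): number {
  const price = PRICES[model];
  if (!price) return 0; // unknown model: record zero rather than guessing
  return (usage.promptTokens / 1e6) * price.inputPerM +
         (usage.completionTokens / 1e6) * price.outputPerM;
}
```

Keeping the table in code (or config) means cost shows up on the span at write time, so dashboards can sum llm.cost_usd without joining against a pricing service.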
Sampling Strategies
In high-volume AI systems, tracing every request is too expensive. Sampling reduces volume while preserving visibility into important traces.
Head-Based Sampling
Decide at the start of the trace whether to sample it. Simple but lossy.
```typescript
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

// Sample 10% of all traces
const sampler = new TraceIdRatioBasedSampler(0.1);

// Respect the parent's sampling decision; sample 10% of new root traces.
// Head-based samplers cannot see errors in advance — guaranteeing that
// error traces are kept requires tail-based sampling (next section).
const compositeSampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1),
});
```
Tail-Based Sampling (Recommended for AI)
Decide after the trace completes whether to keep it. Keeps all interesting traces (errors, slow responses, high cost) and drops routine ones.
```yaml
# OpenTelemetry Collector: tail-based sampling config
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      # Keep all errors
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      # Keep slow traces (> 5 seconds)
      - name: slow
        type: latency
        latency: { threshold_ms: 5000 }
      # Keep expensive LLM calls (> $0.10). The numeric_attribute policy
      # compares integer values, so also record cost in an integer unit —
      # here micro-dollars, an illustrative attribute name
      - name: expensive_llm
        type: numeric_attribute
        numeric_attribute:
          key: llm.cost_microusd
          min_value: 100000
      # Sample 5% of everything else
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
```
Tail-based sampling requires the OpenTelemetry Collector. The collector buffers complete traces, evaluates policies, and forwards only sampled traces to the backend. This adds latency (the decision_wait period) but dramatically reduces storage costs while keeping all interesting data.
Privacy-Safe Spans
Span attributes, span names, and span events are all sent to your observability backend. If any of these contain PII, your tracing infrastructure becomes a data protection liability.
```typescript
// BAD: PII in span attributes
span.setAttribute('user.email', 'sara.mustermann@beispiel.de');
span.setAttribute('user.name', 'Sara Mustermann');
span.setAttribute('request.body', JSON.stringify(requestBody)); // Contains PII

// GOOD: opaque IDs and aggregate data only
span.setAttribute('user.id', 'usr_abc123'); // Opaque ID, not PII
span.setAttribute('entities.detected', 3);
span.setAttribute('entities.types', ['person', 'email', 'phone']);
span.setAttribute('policy.applied', 'german-support');
```
Rules for privacy-safe tracing:
- User IDs: opaque identifiers only (not emails, not names)
- Request bodies: never include raw content. Log entity counts and types.
- LLM prompts: never include. Log token counts and prompt hash.
- Error messages: sanitize before attaching to spans. Strip any user data.
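A minimal sketch of the last rule. The regexes below catch only obvious email and phone shapes; treat this as a starting point, not a complete PII scrubber:

```typescript
// Strip obvious email addresses and phone-like digit runs from an error
// message before attaching it to a span. Deliberately blunt: anything
// that looks like PII becomes a typed placeholder.
function sanitizeErrorMessage(message: string): string {
  return message
    .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, '[email]')
    .replace(/\+?\d[\d\s\-()]{7,}\d/g, '[phone]');
}

// Usage:
// span.setStatus({ code: SpanStatusCode.ERROR, message: sanitizeErrorMessage(err.message) });
```

Running every error message through one sanitizer at the span boundary is far more reliable than trusting each call site to remember.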
The Baggage API
OpenTelemetry Baggage carries key-value pairs across service boundaries. Unlike span attributes (which stay on the span), baggage propagates to all downstream services automatically.
```typescript
import { context, propagation } from '@opentelemetry/api';

// Set baggage at the API gateway
const bag = propagation.createBaggage({
  'tenant.id': { value: 'tenant_acme' },
  'request.priority': { value: 'high' },
  'feature.flags': { value: 'new-checkout,beta-search' },
});
const ctx = propagation.setBaggage(context.active(), bag);

// Downstream services can read baggage
const tenantId = propagation.getBaggage(context.active())?.getEntry('tenant.id')?.value;
```
Useful for:
- Tenant ID propagation (every downstream service knows which tenant)
- Feature flags (propagate experiment assignments across services)
- Priority routing (high-priority requests get different queue treatment)
- Debug markers (mark specific requests for verbose logging)
Baggage travels with the trace context in HTTP headers and message metadata. Every service that extracts the trace context also gets the baggage.
Collector Architecture
The OpenTelemetry Collector is the central routing layer between your applications and your observability backends.
```
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│   Service A   │    │   Service B   │    │   Worker C    │
│  (OTLP gRPC)  │    │  (OTLP HTTP)  │    │  (OTLP gRPC)  │
└───────┬───────┘    └───────┬───────┘    └───────┬───────┘
        │                    │                    │
        ▼                    ▼                    ▼
┌─────────────────────────────────────────────────────────┐
│                     OTel Collector                      │
│                                                         │
│   Receivers:  OTLP (gRPC + HTTP)                        │
│   Processors: batch, tail_sampling, attributes          │
│   Exporters:  Tempo, Prometheus, Loki                   │
└─────────────────────────────────────────────────────────┘
        │                    │                    │
        ▼                    ▼                    ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│    Grafana    │    │  Prometheus   │    │    Grafana    │
│     Tempo     │    │               │    │     Loki      │
│   (traces)    │    │   (metrics)   │    │    (logs)     │
└───────────────┘    └───────────────┘    └───────────────┘
```
The collector handles batching (reduces network calls), sampling (reduces storage), attribute processing (adds/removes attributes), and routing (different signals to different backends). Deploy it as a sidecar or as a central service depending on your infrastructure.
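As a sketch, a minimal collector configuration matching this diagram might look like the following. Endpoints are placeholders; the loki and prometheus exporters ship with the collector-contrib distribution:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }

exporters:
  otlp/tempo:
    endpoint: tempo:4317            # placeholder endpoint
  prometheus:
    endpoint: 0.0.0.0:8889
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki]
```

Note that tail_sampling applies only to the traces pipeline; metrics and logs pass through batching untouched.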
For cloud deployment patterns, including the observability infrastructure itself, see our cloud deployment guide.
Common Pitfalls
- No context propagation across queues. HTTP propagation is automatic; queue propagation is not. If you don't inject/extract context in message headers, traces break at every queue boundary.
- PII in span attributes. Your tracing backend indexes everything. If spans contain emails, names, or request bodies, your Grafana Tempo cluster is a PII store.
- Tracing every request in production. At 1000 RPS, full tracing generates terabytes of data. Use tail-based sampling to keep errors, slow traces, and expensive operations.
- No LLM-specific attributes. Without token counts, cost, model ID, and finish reason on LLM spans, you can't track AI costs or diagnose quality issues.
- Head-based sampling dropping errors. If you sample 10% of traces and an error happens in the 90% you drop, you never see it. Use tail-based sampling or always-sample-errors policies.
- Baggage for large payloads. Baggage travels with every request. Large values increase header size on every HTTP call. Keep baggage values small (IDs, flags, priorities).
Key Takeaways
- Context propagation across queues is the hardest part. HTTP is automatic; queues require manual inject/extract of trace context in message headers. This is where most distributed tracing implementations break.
- Trace LLM calls with cost and token attributes. Model, provider, token counts, cost, finish reason. These attributes enable AI cost dashboards and quality monitoring.
- Tail-based sampling for AI workloads. Keep all errors, slow traces, and expensive operations; drop routine traces. This reduces storage by 90%+ while keeping all interesting data.
- No PII in spans. Opaque user IDs, entity counts, entity types. Never raw content, emails, names, or request bodies.
- Baggage propagates tenant context. Set tenant ID, feature flags, and priority at the edge. Every downstream service reads it from baggage without explicit parameter passing.
We implement OpenTelemetry across our AI services, custom software, and cloud infrastructure. If you're building observability for a distributed system, talk to our team or request a quote.