OpenTelemetry in Production: Traces, Context, and What Actually Matters
Production OpenTelemetry patterns. Context propagation across queues and workers, tracing LLM calls, sampling strategies for AI workloads, privacy-safe spans, and the baggage API.
Why OpenTelemetry Won
Three years ago, the observability landscape was fragmented. Jaeger for tracing, Prometheus for metrics, Fluentd for logs, each with its own SDK, its own protocol, its own vendor lock-in. OpenTelemetry unified them into a single standard: one SDK, one protocol (OTLP), one collector that routes to any backend.
We adopted OpenTelemetry across our production systems. This article covers the patterns that actually matter in production, not the setup tutorial. For the broader observability strategy (when to alert, what to log, how to structure metrics), see our AI observability guide. This article goes deeper on OpenTelemetry-specific implementation.
Context Propagation: The Hardest Part
A single user request might cross 5 services, 3 message queues, 2 worker processes, and an LLM call. Context propagation ensures that the trace follows the request across all of them.
HTTP Propagation (Easy)
OpenTelemetry auto-instruments HTTP clients and servers. The trace context propagates via traceparent and tracestate headers. This works out of the box.
```typescript
// Auto-instrumented: no code needed for HTTP propagation
// The SDK adds the traceparent header to outgoing requests
// The receiving service extracts it and continues the trace
```
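For reference, a minimal Node.js bootstrap that enables this auto-instrumentation might look like the following sketch. The service name and collector URL are placeholders; package names are from the official OpenTelemetry JS distribution:

```typescript
// tracing.ts — load before any application code (e.g. node -r ./tracing.js app.js)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';

const sdk = new NodeSDK({
  serviceName: 'my-service', // placeholder
  traceExporter: new OTLPTraceExporter({ url: 'http://otel-collector:4317' }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```

With this in place, outgoing HTTP requests carry traceparent automatically and incoming requests continue the caller's trace.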
Queue Propagation (Hard)
Message queues break automatic propagation. When you enqueue a message, the trace context must be serialized into the message headers. When a worker dequeues it, the context must be extracted and the trace continued.
```typescript
import { context, propagation, trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('queue-consumer');

// Producer: inject trace context into message headers
async function enqueueMessage(queue: string, payload: any) {
  const carrier: Record<string, string> = {};
  propagation.inject(context.active(), carrier);
  await messageQueue.send(queue, {
    body: payload,
    headers: carrier, // Contains traceparent, tracestate
  });
}

// Consumer: extract trace context from message headers
async function processMessage(message: QueueMessage) {
  const parentContext = propagation.extract(context.active(), message.headers);
  await context.with(parentContext, async () => {
    const span = tracer.startSpan('process_message', {
      attributes: {
        'messaging.system': 'rabbitmq',
        'messaging.operation': 'process',
        'messaging.destination': message.queue,
      },
    });
    try {
      await handleMessage(message.body);
      span.setStatus({ code: SpanStatusCode.OK });
    } catch (error: any) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      throw error;
    } finally {
      span.end();
    }
  });
}
```
This pattern works for RabbitMQ, Kafka, BullMQ, SQS, and Symfony Messenger. The message headers carry the trace context. The consumer extracts it and creates child spans under the original trace.
Worker Process Propagation
For Vendure's BullMQ workers, Pimcore's Symfony Messenger workers, and similar background job systems, the pattern is the same: serialize context into the job payload, extract on the worker side.
```typescript
import { Queue, Worker } from 'bullmq';
import { context, propagation, trace } from '@opentelemetry/api';

const tracer = trace.getTracer('bullmq-worker');

// BullMQ: add trace context to job data
async function addJob(queue: Queue, data: Record<string, unknown>) {
  const carrier: Record<string, string> = {};
  propagation.inject(context.active(), carrier);
  await queue.add('process', {
    ...data,
    _traceContext: carrier,
  });
}

// BullMQ: extract trace context in the worker's processor function
// (queue name is a placeholder; connection options omitted)
const worker = new Worker('orders', async (job) => {
  const parentContext = propagation.extract(context.active(), job.data._traceContext ?? {});
  await context.with(parentContext, async () => {
    const span = tracer.startSpan(`job:${job.name}`);
    try {
      await processJob(job.data);
    } finally {
      span.end();
    }
  });
});
```
Tracing LLM Calls
LLM calls are the most expensive operations in AI systems. Tracing them with proper attributes enables cost tracking, latency analysis, and quality monitoring.
```typescript
async function tracedLlmCall(prompt: string, options: LlmOptions): Promise<string> {
  const span = tracer.startSpan('llm.generate', {
    attributes: {
      'llm.provider': options.provider,       // "openai", "anthropic"
      'llm.model': options.model,             // "gpt-4o", "claude-sonnet-4-20250514"
      'llm.temperature': options.temperature,
      'llm.max_tokens': options.maxTokens,
      'llm.prompt_tokens': estimateTokens(prompt), // estimate before call
    },
  });
  try {
    const response = await llmClient.generate(prompt, options);
    span.setAttributes({
      'llm.response_tokens': response.usage.completionTokens,
      'llm.total_tokens': response.usage.totalTokens,
      'llm.finish_reason': response.finishReason,
      'llm.cost_usd': calculateCost(response.usage, options.model),
    });
    span.setStatus({ code: SpanStatusCode.OK });
    return response.text;
  } catch (error: any) {
    span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
    span.setAttribute('llm.error_type', error.constructor.name);
    throw error;
  } finally {
    span.end();
  }
}
```
Do NOT put prompt text in span attributes. Prompts contain PII. Span attributes are sent to your observability backend (Jaeger, Grafana Tempo, Datadog). Instead, log the prompt hash or token count. For the full PII-safe logging architecture, see our AI data leakage guide.
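As a sketch of the hash-instead-of-content approach (the attribute names below are illustrative, not an official semantic convention):

```typescript
import { createHash } from 'node:crypto';

// Fingerprint a prompt without storing its content: a truncated SHA-256
// lets you correlate identical prompts across traces, nothing more.
function promptFingerprint(prompt: string): { hash: string; chars: number } {
  const hash = createHash('sha256').update(prompt, 'utf8').digest('hex').slice(0, 16);
  return { hash, chars: prompt.length };
}

// Usage on a span (attribute names illustrative):
// const fp = promptFingerprint(prompt);
// span.setAttribute('llm.prompt_hash', fp.hash);
// span.setAttribute('llm.prompt_chars', fp.chars);
```

The fingerprint is enough to spot repeated prompts and cache candidates in the backend without the trace store ever seeing user content.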
LLM Span Attributes Convention
| Attribute | Type | Example |
|---|---|---|
| llm.provider | string | "openai" |
| llm.model | string | "gpt-4o" |
| llm.temperature | float | 0.7 |
| llm.prompt_tokens | int | 1250 |
| llm.response_tokens | int | 340 |
| llm.total_tokens | int | 1590 |
| llm.finish_reason | string | "stop" |
| llm.cost_usd | float | 0.023 |
| llm.error_type | string | "RateLimitError" |
| llm.cache_hit | boolean | false |
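The calculateCost helper referenced in the tracing example above can be a simple per-model price table. The rates below are illustrative placeholders, not live pricing; check your provider's current price sheet:

```typescript
// Illustrative per-million-token rates in USD — placeholders, not live pricing.
const PRICES: Record<string, { inputPerM: number; outputPerM: number }> = {
  'gpt-4o': { inputPerM: 2.5, outputPerM: 10 },
  'claude-sonnet-4-20250514': { inputPerM: 3, outputPerM: 15 },
};

interface Usage {
  promptTokens: number;
  completionTokens: number;
}

function calculateCost(usage: Usage, model: string): number {
  const price = PRICES[model];
  if (!price) return 0; // unknown model: record zero rather than guessing
  return (usage.promptTokens / 1e6) * price.inputPerM +
         (usage.completionTokens / 1e6) * price.outputPerM;
}
```

Keeping the table in code (or config) means cost shows up on the span at write time, so dashboards can sum llm.cost_usd without joining against a pricing service.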
Sampling Strategies
In high-volume AI systems, tracing every request is too expensive. Sampling reduces volume while preserving visibility into important traces.
Head-Based Sampling
Decide at the start of the trace whether to sample it. Simple but lossy.
```typescript
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

// Sample 10% of all traces
const sampler = new TraceIdRatioBasedSampler(0.1);

// Respect the parent's sampling decision; sample 10% of new root traces.
// Head-based samplers cannot see errors in advance — guaranteeing that
// error traces are kept requires tail-based sampling (next section).
const compositeSampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1),
});
```
Tail-Based Sampling (Recommended for AI)
Decide after the trace completes whether to keep it. Keeps all interesting traces (errors, slow responses, high cost) and drops routine ones.
```yaml
# OpenTelemetry Collector: tail-based sampling config
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      # Keep all errors
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      # Keep slow traces (> 5 seconds)
      - name: slow
        type: latency
        latency: { threshold_ms: 5000 }
      # Keep expensive LLM calls (> $0.10). The numeric_attribute policy
      # compares integer values, so also record cost in an integer unit —
      # here micro-dollars, an illustrative attribute name
      - name: expensive_llm
        type: numeric_attribute
        numeric_attribute:
          key: llm.cost_microusd
          min_value: 100000
      # Sample 5% of everything else
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
```
Tail-based sampling requires the OpenTelemetry Collector. The collector buffers complete traces, evaluates policies, and forwards only sampled traces to the backend. This adds latency (the decision_wait period) but dramatically reduces storage costs while keeping all interesting data.
Privacy-Safe Spans
Span attributes, span names, and span events are all sent to your observability backend. If any of these contain PII, your tracing infrastructure becomes a data protection liability.
```typescript
// BAD: PII in span attributes
span.setAttribute('user.email', 'sara.mustermann@beispiel.de');
span.setAttribute('user.name', 'Sara Mustermann');
span.setAttribute('request.body', JSON.stringify(requestBody)); // Contains PII

// GOOD: opaque IDs and aggregate data only
span.setAttribute('user.id', 'usr_abc123'); // Opaque ID, not PII
span.setAttribute('entities.detected', 3);
span.setAttribute('entities.types', ['person', 'email', 'phone']);
span.setAttribute('policy.applied', 'german-support');
```
Rules for privacy-safe tracing:
- User IDs: opaque identifiers only (not emails, not names)
- Request bodies: never include raw content. Log entity counts and types.
- LLM prompts: never include. Log token counts and prompt hash.
- Error messages: sanitize before attaching to spans. Strip any user data.
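A minimal sketch of the last rule. The regexes below catch only obvious email and phone shapes; treat this as a starting point, not a complete PII scrubber:

```typescript
// Strip obvious email addresses and phone-like digit runs from an error
// message before attaching it to a span. Deliberately blunt: anything
// that looks like PII becomes a typed placeholder.
function sanitizeErrorMessage(message: string): string {
  return message
    .replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, '[email]')
    .replace(/\+?\d[\d\s\-()]{7,}\d/g, '[phone]');
}

// Usage:
// span.setStatus({ code: SpanStatusCode.ERROR, message: sanitizeErrorMessage(err.message) });
```

Running every error message through one sanitizer at the span boundary is far more reliable than trusting each call site to remember.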
The Baggage API
OpenTelemetry Baggage carries key-value pairs across service boundaries. Unlike span attributes (which stay on the span), baggage propagates to all downstream services automatically.
```typescript
import { context, propagation } from '@opentelemetry/api';

// Set baggage at the API gateway
const bag = propagation.createBaggage({
  'tenant.id': { value: 'tenant_acme' },
  'request.priority': { value: 'high' },
  'feature.flags': { value: 'new-checkout,beta-search' },
});
const ctx = propagation.setBaggage(context.active(), bag);

// Downstream services can read baggage
const tenantId = propagation.getBaggage(context.active())?.getEntry('tenant.id')?.value;
```
Useful for:
- Tenant ID propagation (every downstream service knows which tenant)
- Feature flags (propagate experiment assignments across services)
- Priority routing (high-priority requests get different queue treatment)
- Debug markers (mark specific requests for verbose logging)
Baggage travels with the trace context in HTTP headers and message metadata. Every service that extracts the trace context also gets the baggage.
Collector Architecture
The OpenTelemetry Collector is the central routing layer between your applications and your observability backends.
```
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│   Service A   │    │   Service B   │    │   Worker C    │
│  (OTLP gRPC)  │    │  (OTLP HTTP)  │    │  (OTLP gRPC)  │
└───────┬───────┘    └───────┬───────┘    └───────┬───────┘
        │                    │                    │
        ▼                    ▼                    ▼
┌─────────────────────────────────────────────────────────┐
│                     OTel Collector                      │
│                                                         │
│   Receivers:  OTLP (gRPC + HTTP)                        │
│   Processors: batch, tail_sampling, attributes          │
│   Exporters:  Tempo, Prometheus, Loki                   │
└─────────────────────────────────────────────────────────┘
        │                    │                    │
        ▼                    ▼                    ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│    Grafana    │    │  Prometheus   │    │    Grafana    │
│     Tempo     │    │               │    │     Loki      │
│   (traces)    │    │   (metrics)   │    │    (logs)     │
└───────────────┘    └───────────────┘    └───────────────┘
```
The collector handles batching (reduces network calls), sampling (reduces storage), attribute processing (adds/removes attributes), and routing (different signals to different backends). Deploy it as a sidecar or as a central service depending on your infrastructure.
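As a sketch, a minimal collector configuration matching this diagram might look like the following. Endpoints are placeholders; the loki and prometheus exporters ship with the collector-contrib distribution:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }

exporters:
  otlp/tempo:
    endpoint: tempo:4317            # placeholder endpoint
  prometheus:
    endpoint: 0.0.0.0:8889
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki]
```

Note that tail_sampling applies only to the traces pipeline; metrics and logs pass through batching untouched.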
For cloud deployment patterns, including the observability infrastructure itself, see our cloud deployment guide.
Common Pitfalls
- No context propagation across queues. HTTP propagation is automatic; queue propagation is not. If you don't inject/extract context in message headers, traces break at every queue boundary.
- PII in span attributes. Your tracing backend indexes everything. If spans contain emails, names, or request bodies, your Grafana Tempo cluster is a PII store.
- Tracing every request in production. At 1000 RPS, full tracing generates terabytes of data. Use tail-based sampling to keep errors, slow traces, and expensive operations.
- No LLM-specific attributes. Without token counts, cost, model ID, and finish reason on LLM spans, you can't track AI costs or diagnose quality issues.
- Head-based sampling dropping errors. If you sample 10% of traces and an error happens in the 90% you drop, you never see it. Use tail-based sampling or always-sample-errors policies.
- Baggage for large payloads. Baggage travels with every request. Large values increase header size on every HTTP call. Keep baggage values small (IDs, flags, priorities).
Key Takeaways
- Context propagation across queues is the hardest part. HTTP is automatic; queues require manual inject/extract of trace context in message headers. This is where most distributed tracing implementations break.
- Trace LLM calls with cost and token attributes. Model, provider, token counts, cost, finish reason. These attributes enable AI cost dashboards and quality monitoring.
- Tail-based sampling for AI workloads. Keep all errors, slow traces, and expensive operations; drop routine traces. This reduces storage by 90%+ while keeping all interesting data.
- No PII in spans. Opaque user IDs, entity counts, entity types. Never raw content, emails, names, or request bodies.
- Baggage propagates tenant context. Set tenant ID, feature flags, and priority at the edge. Every downstream service reads it from baggage without explicit parameter passing.
We implement OpenTelemetry across our AI services, custom software, and cloud infrastructure. If you're building observability for a distributed system, talk to our team or request a quote.