← System Design Observability
System Design

Distributed Tracing

Distributed tracing tracks requests across service boundaries by propagating a unique trace context through transport headers.

TL;DR
  • Distributed tracing tracks requests across service boundaries by propagating a unique trace context through transport headers.
  • Head-based sampling makes decisions at the ingress gateway, while tail-based sampling buffers traces to make decisions after the trace completes.
  • High-volume trace storage is cost-prohibitive, requiring aggressive sampling, attribute limits, and TTL-based retention tiers.
  • Context propagation must be standardized across all internal and external network hops to prevent broken trace graphs.

The Problem

In a microservices architecture, a single user request can trigger dozens of downstream calls across multiple services, databases, and message queues. When a request fails or experiences high latency, traditional single-host logs are useless because they lack a unified correlation key across network boundaries. Engineers waste hours manually stitching together timestamps from different machines, trying to reconstruct the execution path. Without a reliable way to pass execution context across thread boundaries and network protocols, debugging cascading failures in production becomes an exercise in guesswork.

Core System Idea

The core architecture of distributed tracing relies on context propagation and structured span collection. A trace represents the entire lifecycle of a request, composed of directed acyclic graphs of "spans" (individual units of work).

To correlate these spans, the system injects metadata—specifically a Trace ID, Span ID, and sampling flags—into outbound network request headers (such as HTTP headers or gRPC metadata) using standardized formats like W3C Trace Context. Downstream services extract this context, create child spans, and pass the updated context further down the chain.

To manage the massive volume of generated telemetry, the system employs sampling strategies. Head-based sampling decides whether to keep a trace at the initial entry point, which is simple but misses rare errors. Tail-based sampling collects all spans in an in-memory buffer at a collector tier, evaluates the entire trace (e.g., "did any span error or take >500ms?"), and then decides whether to write it to persistent storage.

System Flow

flowchart TD A[Client Request] --> B["API Gateway: Inject Context"] B --> C["Service A: Create Span"] C --> D["Service B: Create Span"] C --> E["Collector: Buffer Spans"] D --> E E --> F{"Tail-Based Sampler"} F -- "Error or Slow" --> G[Persistent DB] F -- "Normal: Dropped" --> H[Discard]

Trace context propagates downstream while individual spans are sent asynchronously to a collector tier for tail-based sampling evaluation.

Real-World Examples Indicative

Uber Jaeger

Uber open-sourced Jaeger in 2016 after building it to handle tracing across 4,700+ microservices. Jaeger's adaptive sampler polls a remote sampling service every 60 seconds to adjust per-operation rates, targeting 1 sampled trace/second per operation regardless of actual RPS. At Uber's scale, a high-traffic endpoint at 50,000 RPS is sampled at 0.002% while a rare admin operation stays at 100%—keeping storage linear with service count, not request volume. Traces are stored in Cassandra for hot queries and Elasticsearch for attribute-based search.

Netflix Edgar

Netflix's internal tracing system (built on Zipkin) captures 100% of traces for requests tagged with the X-Netflix-Test-Actor header, attached to specific canary users or test accounts. This lets the Video Quality team replay the exact sequence of 40+ microservice calls during a playback failure—identifying which CDN manifest fetch timed out at which millisecond offset in the session. For production traffic, Edgar samples at 1-10% using probabilistic head-based sampling, with 72-hour hot retention before traces are dropped.

Honeycomb Dynamic Sampling

Honeycomb's dynsampler-go library evaluates each completed trace against a rule set: error_count > 0 retains 100% of traces, status=200 retains 1-in-100. Their BubbleUp feature automatically surfaces the shared attribute across anomalous traces—for example, flagging that customer_tier=free or db_shard=7 explains a P99 latency spike—without engineers writing any query. Slack, Twilio, and LaunchDarkly use Honeycomb to process high-cardinality trace data at tens of billions of events per day.

Anti-Patterns

Trace-Everything in Production

Attempting to store 100% of traces at high throughput. This rapidly leads to the tracing infrastructure costing more than the primary application database.

Manual Context Propagation

Relying on developers to manually pass trace headers in application code. This inevitably leads to broken traces when a developer forgets to forward headers in a new service or library.

High-Cardinality Span Attributes

Attaching raw, unbounded data (like full SQL queries, raw request bodies, or unique user IDs) as span tags. This bloats memory on the application host and crashes the downstream indexing engine.

Synchronous Span Exporting

Blocking the application's main execution thread to send span data to the collector. Span exporting must always occur asynchronously over UDP or non-blocking TCP/gRPC.

Design Tradeoffs

DimensionHead-Based SamplingTail-Based Sampling
Memory overheadLow; decisions are made instantly at the gateway without buffering any spansHigh; collector must buffer all spans in RAM for seconds until the full trace completes
Error captureMisses rare errors when sampling rate is low (e.g., 1% rate misses 99% of single-occurrence error traces)Captures 100% of error and slow traces by evaluating the full completed trace before deciding
Operational complexitySimple; no collector clustering or Trace-ID-based routing requiredComplex; all spans for the same Trace ID must route to the same collector node for assembly

Best Practices

Standardize on W3C Trace ContextUse industry-standard headers (traceparent, tracestate) to ensure compatibility across third-party proxies, API gateways, and cloud providers.
Implement Memory-Limited BufferingEnsure the in-process tracer uses a bounded ring buffer; if the buffer fills up during a network partition, drop spans rather than exhausting application memory.
Leverage Auto-InstrumentationUse runtime agents or service mesh sidecars to automatically inject and extract trace context at the framework level, minimizing custom code.
Enforce Span LimitsSet strict limits on the maximum number of spans per trace and the maximum size of span attributes to prevent runaway loops from crashing the collector.

When to Use / Avoid

Use WhenAvoid When
Operating complex microservice architectures with deep call graphs and asynchronous event-driven queues.Running a simple monolithic application where local stack traces and structured logs are sufficient.
Debugging distributed latency bottlenecks and identifying which downstream service is stalling.Operating under extreme CPU/memory constraints (e.g., embedded systems) where telemetry overhead is unacceptable.
Correlating errors across multiple network hops and infrastructure boundaries.High-throughput, simple batch processing systems where tracing adds no diagnostic value over metrics.