Structured Logging
Structured logging formats log outputs as machine-readable JSON rather than arbitrary, unparsed text strings.
- Correlation IDs must be injected at the edge and propagated to every log statement to enable end-to-end request tracing.
- Unbounded cardinality in log keys (e.g., raw SQL queries, stack traces in keys) degrades indexing performance and inflates storage costs.
- Under extreme load, logging libraries must transition from blocking to dropping logs to protect application availability.
The Problem
Traditional unstructured text logs (e.g., "User 456 failed to checkout: payment gateway timeout") are easy for humans to read but hard for machines to parse at scale. When an incident occurs, engineers must write complex, fragile regular expressions to extract variables such as user IDs or latency values. If a developer rewords the log string, even by a single character, the parsing rules can break and take dashboards and alerts down with them. Furthermore, without a shared correlation ID, there is no reliable way to connect a log line in the API gateway with the corresponding log line in a downstream database service.
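The fragility described above can be illustrated with a minimal sketch. The regular expression, the reworded message, and the JSON field names (`user_id`, `reason`) are assumptions made for this example, not taken from any real system.

```python
import json
import re

# Hypothetical pattern written against the original wording of the log line.
PATTERN = re.compile(r"User (\d+) failed to checkout: (.+)")

def parse_text(line):
    """Return (user_id, reason), or None if the wording no longer matches."""
    match = PATTERN.match(line)
    return (int(match.group(1)), match.group(2)) if match else None

def parse_json(line):
    """Read the same values by key; the human-readable message can change freely."""
    event = json.loads(line)
    return (event["user_id"], event["reason"])

original = "User 456 failed to checkout: payment gateway timeout"
reworded = "User 456 could not check out: payment gateway timeout"
structured = '{"message": "checkout failed", "user_id": 456, "reason": "payment gateway timeout"}'

print(parse_text(original))    # (456, 'payment gateway timeout')
print(parse_text(reworded))    # None: the rewording silently broke the parser
print(parse_json(structured))  # (456, 'payment gateway timeout')
```

The structured variant keeps working when the `message` text is edited, because consumers depend on keys rather than on wording.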
Core System Idea
Structured logging solves this by treating logs as structured event objects—typically serialized as JSON—rather than flat strings. Every log entry consists of a standard set of key-value pairs containing metadata (timestamp, service name, environment, log level) alongside context-specific fields (user ID, execution time, error codes).
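One possible way to produce such events with Python's standard `logging` module is sketched below. The `JsonFormatter` class, the `build_logger` helper, and the convention of passing context fields through `extra={"fields": {...}}` are assumptions of this sketch, not an established API.

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Serialize each log record as a single JSON object."""

    def __init__(self, service, environment):
        super().__init__()
        self.service = service
        self.environment = environment

    def format(self, record):
        # Standard metadata attached to every event.
        event = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "environment": self.environment,
            "message": record.getMessage(),
        }
        # Context-specific fields, passed via extra={"fields": {...}}
        # (a convention assumed for this sketch).
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)

def build_logger(stream, service="checkout", environment="dev"):
    """Return a logger that writes one JSON event per line to `stream`."""
    logger = logging.getLogger(f"structured.{service}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service, environment))
    logger.handlers = [handler]
    return logger
```

A call such as `build_logger(sys.stdout).error("checkout failed", extra={"fields": {"user_id": 456, "error_code": "gateway_timeout"}})` would then emit one JSON line carrying both the standard metadata and the context fields.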
To make these logs actionable across distributed systems, a unique Correlation ID (or Trace ID) is generated at the system boundary and injected into the execution context (e.g., thread-local storage or Go's context.Context). The logging library automatically extracts this context and appends it to every log statement.
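In Python, the closest analogue to thread-local storage or Go's context.Context is the standard `contextvars` module. The sketch below assumes a hypothetical `X-Correlation-Id` header and hypothetical handler names (`handle_request`, `charge_card`); it shows the ID being set once at the edge and attached to every record by a logging filter.

```python
import contextvars
import json
import logging
import sys
import uuid

# Correlation ID for the request currently being handled (per thread or async task).
correlation_id = contextvars.ContextVar("correlation_id", default=None)

class CorrelationFilter(logging.Filter):
    """Copy the current correlation ID onto every record that passes through."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True

class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", None),
        })

def build_logger():
    handler = logging.StreamHandler(sys.stdout)
    handler.addFilter(CorrelationFilter())  # runs before the record is formatted
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("correlation_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers = [handler]
    return logger

def charge_card(logger):
    # Downstream code never passes the ID explicitly; the filter supplies it.
    logger.info("charging card")

def handle_request(headers, logger):
    """Edge handler: reuse an upstream ID if present, otherwise generate one."""
    cid = headers.get("X-Correlation-Id") or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        logger.info("request received")
        charge_card(logger)
        return cid
    finally:
        correlation_id.reset(token)  # do not leak the ID into the next request
```

Calling `handle_request({"X-Correlation-Id": "abc-123"}, build_logger())` emits two JSON lines that share the same `correlation_id`, even though `charge_card` never receives it as an argument. Propagating the ID to other services (for example as an outgoing header) is outside this sketch.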
To handle high-throughput scenarios without crashing the host, the logging framework writes to an in-memory ring buffer, which is flushed asynchronously to stdout or a local log shipper daemon.
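A minimal sketch of such a buffer follows, assuming a drop-oldest policy (the ring overwrites its oldest entry when full) and a background thread that flushes on a fixed interval. The class name `RingLogBuffer` and its parameters are hypothetical; production logging libraries typically offer more careful batching and shutdown handling.

```python
import collections
import json
import sys
import threading

class RingLogBuffer:
    """Bounded in-memory buffer: emit() never waits on I/O; old events are dropped when full."""

    def __init__(self, capacity=1024, write=sys.stdout.write, flush_interval=0.05, start=True):
        self._ring = collections.deque(maxlen=capacity)
        self._write = write
        self._interval = flush_interval
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self.dropped = 0  # worth exporting as a metric so log loss is visible
        self._worker = None
        if start:
            self._worker = threading.Thread(target=self._run, daemon=True)
            self._worker.start()

    def emit(self, **fields):
        """Application thread: serialize, enqueue, return. Only a short lock is held."""
        line = json.dumps(fields) + "\n"
        with self._lock:
            if len(self._ring) == self._ring.maxlen:
                self.dropped += 1  # the append below overwrites the oldest entry
            self._ring.append(line)

    def flush(self):
        """Drain the buffer and write it out; the slow I/O happens outside the lock."""
        with self._lock:
            batch = list(self._ring)
            self._ring.clear()
        for line in batch:
            self._write(line)
        return len(batch)

    def _run(self):
        # wait() returns False on timeout, so this flushes every interval until stopped.
        while not self._stop.wait(self._interval):
            self.flush()

    def close(self):
        self._stop.set()
        if self._worker is not None:
            self._worker.join()
        self.flush()  # final drain on shutdown
```

Dropping the oldest entries keeps the most recent context during a burst; a drop-newest policy is equally valid and slightly cheaper, so the choice here is a design assumption rather than a requirement.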
System Flow
Application events are serialized to structured JSON, passed through a memory-bounded ring buffer, and flushed asynchronously to prevent blocking the main execution thread.
Real-World Examples (Indicative)
GitHub injects a request_uuid at the load balancer edge via the X-GitHub-Request-Id response header. This UUID is propagated through every internal service call—authentication, git pack-file transfer, webhook fanout. Engineers type the UUID into GitHub's internal "Haystack" log search tool and immediately reconstruct every service touch for a single git push in chronological order, without writing any regex or joining tables. Mandatory fields on all API handler logs include request_uuid, method, path, status, duration_ms, and user_id.
Shopify uses the lograge gem to convert Rails' verbose multi-line log output into single-line JSON events. Each log line includes method, path, status, duration, db_duration, view_duration, request_id, and shop_id. Switching to lograge reduced Shopify's log volume by ~40% compared to Rails default logging while enabling instant Kibana filtering by shop_id across 1.7M+ merchant API requests without regex parsing.
Cloudflare's edge Workers runtime enforces a strict structured log schema at the platform level; developers cannot emit arbitrary strings. Every log event is automatically annotated with cf_ray (the unique request ID), cf_pop (data center), worker_name, duration_us, and outcome. This schema enforcement means Cloudflare's centralized ClickHouse-backed log store can answer cross-PoP queries like "all requests to worker X with duration > 100ms in the last 5 minutes" across 300+ PoPs without per-source parser configuration.
Anti-Patterns
Generating JSON keys dynamically based on user input (e.g., {"user_input_key": "value"}). This leads to index mapping explosions in downstream log engines like Elasticsearch, eventually crashing the cluster.
Writing logs directly to disk or network sockets synchronously from the application thread. This introduces severe latency spikes and can halt the application if the disk fills up or the network lags.
Including unencrypted passwords, credit card numbers, or personal data in structured fields. Once indexed, this data is highly visible and difficult to purge compliantly.
Emitting high-frequency structured logs solely to calculate metrics (e.g., logging every single HTTP request to count total requests). This is extremely expensive; use lightweight counter metrics instead.
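For the sensitive-data anti-pattern above, one common mitigation is to mask known-sensitive keys before the event is serialized. The sketch below is illustrative only: the key list and the `[REDACTED]` marker are assumptions, and key-based masking does not catch secrets embedded inside free-text values.

```python
import json

# Keys treated as sensitive; illustrative, not exhaustive.
SENSITIVE_KEYS = {"password", "token", "card_number"}
MASK = "[REDACTED]"

def redact(value):
    """Return a copy of `value` with sensitive keys masked, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {
            key: MASK if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value

def serialize(event):
    """Redact first, then serialize, so secrets never reach the output string."""
    return json.dumps(redact(event))
```

Because `redact` builds a new structure rather than mutating its input, the application can keep using the original event object after logging it.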
Design Tradeoffs
| Dimension | Schema-on-Write (Strict JSON) | Schema-on-Read (Unstructured Text) |
|---|---|---|
| Query performance | Fast indexed queries; fields are parsed once at write time, so lookups filter on indexed keys instead of scanning raw text | Slow regex parsing at query time; performance degrades linearly with log volume and format variation |
| Application overhead | JSON serialization adds ~1-5μs per log event; bounded async ring buffer absorbs traffic bursts without blocking | Near-zero serialization overhead; raw string written directly to buffer with no transformation |
| Schema discipline | Requires enforced contracts; a missing mandatory field silently breaks dashboards and alert conditions | No upfront contract required; developers change log format freely but break downstream parsers implicitly |
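The serialization overhead quoted in the table depends on hardware, runtime, and event size, so it is worth measuring locally. A rough way to do that in Python is sketched below; the sample event and the function name `serialization_cost_us` are made up for this example, and the result says nothing about other languages or logging libraries.

```python
import json
import timeit

# Hypothetical event of typical size; replace with a representative one.
SAMPLE_EVENT = {
    "timestamp": "2024-01-01T00:00:00+00:00",
    "service": "checkout",
    "level": "ERROR",
    "correlation_id": "abc-123",
    "user_id": 456,
    "duration_ms": 3021,
}

def serialization_cost_us(event=SAMPLE_EVENT, iterations=100_000):
    """Average microseconds spent in json.dumps per event."""
    total_seconds = timeit.timeit(lambda: json.dumps(event), number=iterations)
    return total_seconds / iterations * 1e6

print(f"{serialization_cost_us():.2f} microseconds per event")
```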
Best Practices
- Define a mandatory set of base fields (timestamp, service, version, level, correlation_id) that every service must include, validated in CI.
- Redact sensitive fields (password, token, card_number) before serialization.
When to Use / Avoid
| Use When | Avoid When |
|---|---|
| Operating multi-service architectures where logs must be aggregated, indexed, and searched programmatically. | Building small, single-instance CLI tools where human readability on stdout is the only requirement. |
| Building automated alerting and dashboarding systems based on log attributes. | Operating in resource-constrained IoT or embedded environments where JSON serialization overhead is too costly. |
| Compliance and auditing require strict tracking of user actions with verifiable correlation chains. | High-performance, low-latency hot paths (e.g., high-frequency trading engines) where every microsecond matters. |