Observability · #40

Structured Logging

Structured logging formats log outputs as machine-readable JSON rather than arbitrary, unparsed text strings.

Published May 29, 2026 · By MortalApps · 4 min read · 886 words

TL;DR

Structured logging emits each log event as a machine-readable record of key-value pairs (typically JSON) rather than a free-form text string.
Correlation IDs must be injected at the edge and propagated to every log statement to enable end-to-end request tracing.
Unbounded cardinality in log keys (e.g., raw SQL queries, stack traces in keys) degrades indexing performance and inflates storage costs.
Under extreme load, logging libraries must transition from blocking to dropping logs to protect application availability.


The Problem

Traditional unstructured text logs (e.g., "User 456 failed to checkout: payment gateway timeout") are easy for humans to read but difficult for machines to parse at scale. When an incident occurs, engineers are forced to write complex, fragile regular expressions to extract variables like user IDs or latency metrics. If a developer changes a single character in the log string, the parsing rules can stop matching, and the dashboards and alerts built on them silently go blank. Furthermore, without a shared correlation ID, there is no reliable way to connect a log line in the API gateway with the corresponding log line in a downstream database service.
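To make that fragility concrete, here is a minimal Python sketch. The regular expression and the reworded log line are illustrative assumptions, not rules taken from any real system.

```python
import re
from typing import Optional

# Hypothetical parsing rule written against the original message format.
PATTERN = re.compile(r"^User (?P<user_id>\d+) failed to checkout: (?P<reason>.+)$")


def parse(line: str) -> Optional[dict]:
    """Return the extracted fields, or None when the line does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


original = "User 456 failed to checkout: payment gateway timeout"
# A developer rewords the message slightly ("User" -> "user", adds "id=").
reworded = "user id=456 failed to checkout: payment gateway timeout"

print(parse(original))  # fields are extracted
print(parse(reworded))  # None: the event disappears from dashboards
```

Nothing fails loudly in the second case; the parser simply returns nothing, which is why this class of breakage tends to be noticed late.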

Core System Idea

Structured logging solves this by treating logs as structured event objects—typically serialized as JSON—rather than flat strings. Every log entry consists of a standard set of key-value pairs containing metadata (timestamp, service name, environment, log level) alongside context-specific fields (user ID, execution time, error codes).
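As a rough illustration of such an event object, the sketch below uses Python's standard logging module with a custom formatter. The class name JsonFormatter, the service and environment values, and the context fields (user_id, error_code, duration_ms) are assumptions chosen for the example.

```python
import io
import json
import logging
from datetime import datetime, timezone

# Attributes present on every LogRecord; anything else arrived via `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Serialize each record as a single JSON object."""

    def __init__(self, service: str, environment: str):
        super().__init__()
        self.service = service
        self.environment = environment

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "environment": self.environment,
            "message": record.getMessage(),
        }
        # Context-specific fields passed via `extra=` become top-level keys.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                event[key] = value
        return json.dumps(event, default=str)


stream = io.StringIO()  # stands in for stdout so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter(service="checkout", environment="prod"))

logger = logging.getLogger("structured_example")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.error(
    "checkout failed",
    extra={"user_id": 456, "error_code": "GATEWAY_TIMEOUT", "duration_ms": 3021},
)
print(stream.getvalue())
```

The message stays short and constant, while everything a query might filter on lives in its own field.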

To make these logs actionable across distributed systems, a unique Correlation ID (or Trace ID) is generated at the system boundary and injected into the execution context (e.g., thread-local storage or Go's context.Context). The logging library automatically extracts this context and appends it to every log statement.
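One way to sketch this propagation in Python is with contextvars and a logging filter. The header name X-Correlation-Id and the functions handle_request and charge_card are hypothetical; a real service would hook the same logic into its web framework's middleware.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID for the current request; each thread and asyncio task
# sees its own value.
correlation_id_var = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current correlation ID onto every record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id_var.get()
        return True


def charge_card(logger: logging.Logger) -> None:
    # Downstream code logs normally and never touches the ID itself.
    logger.info("charging card")


def handle_request(headers: dict, logger: logging.Logger) -> str:
    """Hypothetical edge handler: reuse an inbound ID or mint a new one."""
    cid = headers.get("X-Correlation-Id") or str(uuid.uuid4())
    token = correlation_id_var.set(cid)
    try:
        logger.info("request received")
        charge_card(logger)
    finally:
        correlation_id_var.reset(token)  # do not leak the ID into the next request
    return cid


stream = io.StringIO()  # stands in for stdout so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s cid=%(correlation_id)s %(message)s")
)

logger = logging.getLogger("correlation_example")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addFilter(CorrelationIdFilter())
logger.addHandler(handler)

handle_request({"X-Correlation-Id": "req-123"}, logger)
print(stream.getvalue())
```

Both log lines carry cid=req-123 even though charge_card never receives the ID as an argument, which is the property the paragraph above describes.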

To handle high-throughput scenarios without crashing the host, the logging framework writes to an in-memory ring buffer, which is flushed asynchronously to stdout or a local log shipper daemon.

System Flow

flowchart TD
    A[App Event Occurs] --> B["Log Library: Extract Context"]
    B --> C[Serialize to JSON]
    C --> D{"Ring Buffer Status"}
    D -- "Normal" --> E[Write to Buffer]
    D -- "Full: Overloaded" --> F["Drop Log / Emit Metric"]
    E --> G[Async Flush to stdout]
    G --> H[Log Shipper Daemon]

Application events are serialized to structured JSON, passed through a memory-bounded ring buffer, and flushed asynchronously to prevent blocking the main execution thread.
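The buffer stage of this flow can be approximated with Python's standard library. This is a sketch under stated assumptions: it uses a bounded FIFO queue that rejects new records when full rather than a true ring buffer that overwrites old ones, the dropped counter stands in for a real metric, and the drain function is called inline here where a production setup would run it on a background thread (for example via logging.handlers.QueueListener).

```python
import io
import logging
import logging.handlers
import queue


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """Enqueue records without blocking; count and drop them when full."""

    def __init__(self, log_queue: queue.Queue):
        super().__init__(log_queue)
        self.dropped = 0  # stand-in for a "logs dropped" counter metric

    def enqueue(self, record: logging.LogRecord) -> None:
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            self.dropped += 1  # never block the caller


def drain(log_queue: queue.Queue, sink: logging.Handler) -> int:
    """Flush everything currently buffered to the sink; return the count."""
    flushed = 0
    while True:
        try:
            record = log_queue.get_nowait()
        except queue.Empty:
            return flushed
        sink.handle(record)
        flushed += 1


buffer = queue.Queue(maxsize=3)  # deliberately tiny to force overflow
handler = DroppingQueueHandler(buffer)

logger = logging.getLogger("bounded_buffer_example")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# Nothing is draining yet, which simulates a stalled log shipper.
for i in range(5):
    logger.info("event %d", i)

stream = io.StringIO()  # stands in for stdout
flushed = drain(buffer, logging.StreamHandler(stream))
print(f"flushed={flushed} dropped={handler.dropped}")
```

The application thread only ever pays for a non-blocking enqueue, so a slow or stalled sink costs log lines instead of request latency.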

Real-World Examples (Indicative)

GitHub Haystack

GitHub assigns a request_uuid at the load balancer edge and exposes it through the X-GitHub-Request-Id response header. This UUID is propagated through every internal service call, including authentication, git pack-file transfer, and webhook fanout. Engineers type the UUID into GitHub's internal "Haystack" log search tool and reconstruct every service touch for a single git push in chronological order, without writing any regex or joining tables. Mandatory fields on all API handler logs include request_uuid, method, path, status, duration_ms, and user_id.

Shopify Lograge

Shopify uses the lograge gem to convert Rails' verbose multi-line log output into single-line JSON events. Each log line includes method, path, status, duration, db_duration, view_duration, request_id, and shop_id. Switching to lograge reduced Shopify's log volume by ~40% compared to Rails default logging while enabling instant Kibana filtering by shop_id across 1.7M+ merchant API requests without regex parsing.

Cloudflare Workers Logging

Cloudflare's edge Workers runtime enforces a strict structured log schema at the platform level: developers cannot emit arbitrary strings. Every log event is automatically annotated with cf_ray (the unique request ID), cf_pop (data center), worker_name, duration_us, and outcome. This schema enforcement means Cloudflare's centralized ClickHouse-backed log store can answer cross-PoP queries like "all requests to worker X with duration > 100ms in the last 5 minutes" across 300+ PoPs without per-source parser configuration.

Anti-Patterns

Dynamic Keys

Generating JSON keys dynamically based on user input (e.g., {"user_input_key": "value"}). Each new key adds a field to the index mapping in downstream log engines like Elasticsearch, and an unbounded set of keys causes a mapping explosion that can exhaust memory and destabilize the cluster.
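One possible mitigation, sketched below, is to keep an allow-list of known keys and fold everything else into a single fixed field. The key names in ALLOWED_KEYS and the extra_attrs field are hypothetical.

```python
import json

# Hypothetical schema: the only keys allowed to become indexed fields.
ALLOWED_KEYS = {"user_id", "plan", "region"}


def to_log_fields(user_attrs: dict) -> dict:
    """Keep known keys as fields; fold everything else into one fixed key."""
    fields = {k: v for k, v in user_attrs.items() if k in ALLOWED_KEYS}
    unknown = {k: v for k, v in user_attrs.items() if k not in ALLOWED_KEYS}
    if unknown:
        # Stored as one string, so the index sees a single key no matter
        # how many distinct names users invent.
        fields["extra_attrs"] = json.dumps(unknown, sort_keys=True, default=str)
    return fields


print(to_log_fields({"user_id": 456, "favourite_colour": "teal", "x9": 1}))
```

The unknown attributes remain searchable as text, but they can no longer grow the set of indexed fields.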

Synchronous Disk I/O

Writing logs directly to disk or network sockets synchronously from the application thread. This introduces severe latency spikes and can halt the application if the disk fills up or the network lags.

Logging Sensitive PII

Including unencrypted passwords, credit card numbers, or personal data in structured fields. Once indexed, this data is highly visible and difficult to purge compliantly.

Using Logs for Metrics

Emitting high-frequency structured logs solely to calculate metrics (e.g., logging every single HTTP request to count total requests). This is extremely expensive; use lightweight counter metrics instead.

Design Tradeoffs

Dimension	Schema-on-Write (Strict JSON)	Schema-on-Read (Unstructured Text)
Query performance	Fast indexed queries; fields are parsed once at write time, so lookups by key avoid scanning raw text even across billions of events	Slow regex parsing at query time; cost grows with log volume and format variation
Application overhead	JSON serialization adds ~1-5μs per log event; bounded async ring buffer absorbs traffic bursts without blocking	Near-zero serialization overhead; raw string written directly to buffer with no transformation
Schema discipline	Requires enforced contracts; a missing mandatory field silently breaks dashboards and alert conditions	No upfront contract required; developers change log format freely but break downstream parsers implicitly

Best Practices

Enforce a Global Schema

Define a core set of mandatory fields (e.g., timestamp, service, version, level, correlation_id) that every service must include, validated in CI.
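A CI check for this could be as small as the following sketch, which assumes that sample log lines are captured from a test run and passed in as strings. The function name missing_fields and the sample values are illustrative.

```python
import json

# The mandatory fields named above.
MANDATORY_FIELDS = ("timestamp", "service", "version", "level", "correlation_id")


def missing_fields(log_line: str) -> list:
    """Return the mandatory fields absent from one JSON log line."""
    event = json.loads(log_line)
    return [field for field in MANDATORY_FIELDS if field not in event]


good = json.dumps(
    {
        "timestamp": "2026-05-29T12:00:00+00:00",
        "service": "checkout",
        "version": "1.4.2",
        "level": "INFO",
        "correlation_id": "req-123",
        "message": "order placed",
    }
)
bad = json.dumps({"timestamp": "2026-05-29T12:00:00+00:00", "level": "INFO"})

print(missing_fields(good))  # empty list: the line satisfies the schema
print(missing_fields(bad))   # lists the fields the service forgot
```

A test suite can then fail the build whenever any captured line returns a non-empty list.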

Use Non-Blocking Asynchronous Appenders

Configure logging libraries to drop debug/info logs if the internal queue exceeds a safe memory threshold under heavy load rather than blocking the caller.
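The sketch below shows one way to express that policy with Python's logging.handlers.QueueHandler. The class name, the shed_above threshold, and the use of queue depth as a proxy for memory pressure are assumptions made for the example.

```python
import logging
import logging.handlers
import queue


class LoadSheddingQueueHandler(logging.handlers.QueueHandler):
    """Shed low-severity records once the queue passes a depth threshold."""

    def __init__(self, log_queue: queue.Queue, shed_above: int):
        super().__init__(log_queue)
        self.shed_above = shed_above
        self.dropped = 0  # stand-in for a "logs dropped" counter metric

    def enqueue(self, record: logging.LogRecord) -> None:
        # qsize() is approximate under concurrency, which is acceptable
        # for a load-shedding heuristic.
        overloaded = self.queue.qsize() >= self.shed_above
        if overloaded and record.levelno < logging.WARNING:
            self.dropped += 1
            return
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            self.dropped += 1  # hard limit reached: never block the caller


buffer = queue.Queue(maxsize=4)
handler = LoadSheddingQueueHandler(buffer, shed_above=2)

logger = logging.getLogger("load_shedding_example")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

logger.info("cache miss")       # queued (depth 0)
logger.info("cache miss")       # queued (depth 1)
logger.info("cache miss")       # shed: depth has reached the threshold
logger.error("payment failed")  # queued: warnings and errors bypass shedding

print(f"queued={buffer.qsize()} dropped={handler.dropped}")
```

Keeping headroom between the shedding threshold and the hard queue limit is what lets warnings and errors survive a burst of low-severity traffic.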

Inject Trace Context

Automatically map OpenTelemetry trace and span IDs into your structured log fields so that the logs and traces for a single request can be joined on the same identifiers.
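As a dependency-free illustration, the sketch below extracts the IDs from a W3C Trace Context traceparent header, the format OpenTelemetry propagates by default. It assumes the header is available to the logging layer and handles only version 00; with the OpenTelemetry SDK installed, the IDs would more likely be read from the active span. The output field names are assumptions.

```python
import re

# W3C Trace Context, version 00: 00-<trace-id>-<parent-id>-<flags>,
# all lowercase hex.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def trace_fields(traceparent: str) -> dict:
    """Return log fields for a traceparent header, or {} if it is invalid."""
    match = _TRACEPARENT.match(traceparent.strip())
    if not match:
        return {}
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return {}
    return {
        "trace_id": trace_id,
        # The header carries the caller's span ID, hence "parent".
        "parent_span_id": parent_id,
        "trace_sampled": bool(int(flags, 16) & 0x01),
    }


header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(trace_fields(header))
```

Merging the returned dictionary into every log event gives the tracing backend and the log store a shared key to join on.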

Sanitize at the Source

Implement interceptors or middleware in the logging library to automatically redact known sensitive keys (e.g., password, token, card_number) before serialization.
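A minimal version of such an interceptor, assuming Python's standard logging module, might look like this. The key list mirrors the examples above; the placeholder string and the class name RedactingFilter are assumptions. Note that it matches on key names only and would not catch a secret embedded in a free-text message.

```python
import logging

SENSITIVE_KEYS = {"password", "token", "card_number"}
REDACTED = "[REDACTED]"


def redact(value):
    """Recursively mask values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            k: REDACTED if str(k).lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


class RedactingFilter(logging.Filter):
    """Mask sensitive fields on a record before any handler formats it."""

    def filter(self, record: logging.LogRecord) -> bool:
        for key, value in list(vars(record).items()):
            if key.lower() in SENSITIVE_KEYS:
                setattr(record, key, REDACTED)
            elif isinstance(value, (dict, list)):
                setattr(record, key, redact(value))
        return True


# Build a record the way `logger.info("login", extra={...})` would.
record = logging.makeLogRecord(
    {"msg": "login", "password": "hunter2", "payload": {"token": "abc", "user": "ann"}}
)
RedactingFilter().filter(record)
print(record.password, record.payload)
```

Attaching the filter to the logger, rather than to one handler, means every destination receives the already-redacted record.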

When to Use / Avoid

Use When	Avoid When
Operating multi-service architectures where logs must be aggregated, indexed, and searched programmatically.	Building small, single-instance CLI tools where human readability on stdout is the only requirement.
Building automated alerting and dashboarding systems based on log attributes.	Operating in resource-constrained IoT or embedded environments where JSON serialization overhead is too costly.
Compliance and auditing require strict tracking of user actions with verifiable correlation chains.	High-performance, low-latency hot paths (e.g., high-frequency trading engines) where every microsecond matters.