← System Design Backend Architectures
System Design

CQRS Pattern

CQRS separates the write path (commands) from the read path (queries) so each can be optimized independently — write models use normalized schemas for transactional integrity; read models use denormalized projections updated asynchronously via an event bus.

TL;DR
  • CQRS separates the write path from the read path so each optimizes independently. The write model uses normalized schemas for transactional integrity; the read model uses denormalized projections for fast retrieval.
  • Bol.com processes 250M+ orders/year with CQRS using Axon Framework: PostgreSQL for writes (order state transitions), Elasticsearch for reads (sub-100ms search across 250M historical orders). Projection lag: 50–200ms under normal load.
  • Stripe's merchant dashboard uses CQRS: the write path appends charge events to PostgreSQL shards; the read path pre-aggregates balance summaries per merchant updated via Kafka. Without CQRS, every dashboard load triggers a SUM across millions of transaction rows.
  • Projection lag is the primary UX risk. A user who submits a form and immediately sees their pre-change state assumes their write failed. Design for it explicitly: optimistic UI updates or a brief polling loop after writes.
  • Apply CQRS surgically to high-contention subdomains only. Applying it globally adds event brokers, projectors, and multiple databases to every CRUD operation for no benefit.

The Problem

An e-commerce platform's product search is causing write failures. The search feature requires a multi-column index on the product catalog table (indexed by name, category, price, rating, availability). Inserting a new product now takes 800ms because PostgreSQL must update all five indexes simultaneously. Meanwhile, the checkout flow is timing out on row-level locks — product availability reads for the cart are contending with the inventory write transaction that decrements stock. The same table, the same schema, the same database instance, and the same indexes are being optimized for two completely different access patterns — and neither is winning.

Core System Idea

CQRS splits the application into two distinct paths with independent data models: (1) Command path — handles all operations that mutate state. Uses a normalized write model optimized for transactional consistency and write throughput. Write model tables have minimal indexes (primary key + foreign keys only) — no read-optimized composite indexes. Commands return a correlation ID or success/failure, never the projected state. (2) Query path — handles all read operations. Uses a denormalized read model stored in a database chosen for the specific query pattern: Elasticsearch for text search, Redis for key-value lookups, Cassandra for time-series access, PostgreSQL with materialized views for aggregations. The read model is pre-computed and indexed exactly for its queries — no runtime joins. The bridge between the two: an event bus (Kafka, RabbitMQ) carries domain events from the write database to projection workers. A projection worker consumes domain events and updates the read model. Projection lag — the delay between a write being committed and the read model reflecting it — is typically 50ms–5s. This eventual consistency must be designed for explicitly in the UI. The projection worker must be idempotent: processing the same event twice must produce the same read model state.

System Flow

flowchart TD A["Client"] -- "1. Send Command" --> B["Command API"] A -- "4. Query Data" --> C["Query API"] B -- "2. Update" --> D[("Write DB")] D -- "Publish Event" --> E["Event Bus"] E -- "Consume" --> F["Projection Worker"] F -- "3. Update Read Model" --> G[("Read DB")] G -- "Fetch" --> C

Commands update the normalized write database and publish events; projection workers consume events and update the denormalized read database asynchronously.

Real-World Examples Indicative

Bol.com order management with Axon Framework

Bol.com (Netherlands' largest e-commerce, processing 250M+ orders/year) uses Axon Framework (Java CQRS library) for order management. Write side: PostgreSQL handles order state transitions (placed → paid → shipped → delivered) with optimistic locking — the write model has 4 tables with only primary key and foreign key indexes. Read side: Elasticsearch indexes order data denormalized per order, enabling sub-100ms search across 250M+ historical orders by customer, product, date range, or shipping status. Axon's TrackingEventProcessor monitors projection lag — when lag exceeds 500ms (indicating the projection worker cannot keep up with write throughput), it alerts the on-call team. During COVID-19 peak in 2020, write throughput reached 10K orders/minute while Elasticsearch absorbed 500K+ status queries/minute on completely isolated hardware.

Stripe merchant dashboard read model

Stripe's merchant dashboard (balance, recent transactions, payout timeline) serves a read model separate from the write path. The write path appends charge, refund, and payout events to PostgreSQL shards — each shard handling one range of merchant IDs. The read path serves pre-aggregated balance-by-currency per merchant, stored in a separate dashboard database updated via Kafka events from the write path. Without CQRS, a merchant with 5M transactions would require SELECT SUM(amount) FROM charges WHERE merchant_id=... AND currency='USD' — scanning 5M rows on every dashboard load. With CQRS, the dashboard reads a single pre-aggregated row updated incrementally when each charge event is consumed. The tradeoff: Stripe's dashboard shows balance "as of a few seconds ago" — the projection lag is surfaced to merchants in the UI as a refresh timestamp.

Discord read model for message history

Discord stores messages in Cassandra (write-optimized, append-only, sorted by channel ID + timestamp). For features requiring cross-channel aggregation — server-wide search, jump-to-message, analytics — Discord projects Cassandra events into Elasticsearch (read model). The projection is bounded: only the last 180 days of messages per server are indexed in Elasticsearch to contain index size and maintain sub-100ms search latency. Projection lag during normal operation: <1 second. When Elasticsearch lag exceeds 5 seconds (e.g., during cluster reindexing), Discord surfaces a "search may be delayed" notice in the UI — explicit user communication of the eventual consistency window rather than silently serving stale results.

Anti-Patterns

Applying CQRS globally

Implementing CQRS for user profile settings, feature flags, or admin configuration — simple CRUD entities with low traffic and no read/write contention. CQRS adds event brokers, projection workers, multiple databases, and idempotency logic. Apply it only to the specific subdomains where read/write contention is measurable.

Synchronous read model updates

Updating the read database inside the write transaction before returning to the caller. This defeats the isolation between models — a slow Elasticsearch index update now blocks the write transaction, and if Elasticsearch is down, the write fails.

No projection lag handling in the UI

Redirecting users to a list view immediately after a successful write, where the new item does not yet appear because the projection has not processed the event. Use optimistic UI (insert the item client-side) or add a brief polling loop after writes to detect when the read model has caught up.

Single database instance for both models

Running the write and read models on the same PostgreSQL instance. Read query CPU and I/O still contend with write transactions — the two models are logically separated but physically co-located on the same resource they were competing for.

Design Tradeoffs

DimensionSeparate Read/Write ModelsShared CRUD Model
Read performancePre-aggregated, denormalized: simple indexed lookups, sub-100ms regardless of data volumeRuntime joins and aggregations: performance degrades as rows accumulate
Write performanceWrite model has minimal indexes — inserts are fast with no index fan-out to read-optimized columnsHeavy composite indexes required for reads slow down write operations
ConsistencyEventual: projection lag of 50ms–5s typical; must be communicated to usersImmediate: read-your-own-writes guaranteed without additional UI patterns
ComplexityHigh: event bus, projection workers, multiple databases, idempotent event handlersLow: standard ORM and MVC patterns; one database schema to design and migrate

Best Practices

Make projection workers idempotent: the same event processed twice must produce the same read model state. Use the event's sequence number or timestamp as an idempotency key — if the projection database already contains a record for this event, skip the update.
Monitor projection lag as a primary SLI. Alert when lag exceeds the acceptable consistency window (e.g., >500ms for user-facing reads, >5s for analytics reads). Lagging projectors indicate write throughput has exceeded projection worker capacity.
Match the read database to the query pattern: Elasticsearch for full-text search, Redis for sub-millisecond key-value access, ClickHouse for analytical aggregations, PostgreSQL materialized views for relational joins that are refreshed periodically.
Commands should return a correlation ID and a success acknowledgment, not the projected read state. Returning the full entity from a command requires running a read query against the write model — coupling the two paths at the transport layer.
Start with a single database and materialized views before introducing a full CQRS pipeline. Materialized views provide the denormalized read model without the event bus complexity — graduate to full CQRS only when write throughput prevents real-time view refresh.

When to Use / Avoid

Use WhenAvoid When
Read-to-write ratio is extremely high (1000:1+) and reads require complex aggregations across large datasetsSystem is a simple CRUD application with low traffic and queries that are fast at current data volume
Write model has complex business rules and validation while reads require pre-aggregated, denormalized views across different dimensionsStrict immediate consistency is required — the application cannot tolerate any projection lag between a write and its visibility in reads
Read and write operations have measurably different scaling requirements — different hardware, different database enginesTeam lacks experience with event-driven architectures and eventual consistency — the operational overhead will dominate engineering time