CQRS Pattern
CQRS separates the write path (commands) from the read path (queries) so each can be optimized independently — write models use normalized schemas for transactional integrity; read models use denormalized projections updated asynchronously via an event bus.
- CQRS separates the write path from the read path so each optimizes independently. The write model uses normalized schemas for transactional integrity; the read model uses denormalized projections for fast retrieval.
- Bol.com processes 250M+ orders/year with CQRS using Axon Framework: PostgreSQL for writes (order state transitions), Elasticsearch for reads (sub-100ms search across 250M historical orders). Projection lag: 50–200ms under normal load.
- Stripe's merchant dashboard uses CQRS: the write path appends charge events to PostgreSQL shards; the read path pre-aggregates balance summaries per merchant updated via Kafka. Without CQRS, every dashboard load triggers a SUM across millions of transaction rows.
- Projection lag is the primary UX risk. A user who submits a form and immediately sees their pre-change state assumes their write failed. Design for it explicitly: optimistic UI updates or a brief polling loop after writes.
- Apply CQRS surgically to high-contention subdomains only. Applying it globally adds event brokers, projectors, and multiple databases to every CRUD operation for no benefit.
The Problem
An e-commerce platform's product search is causing write failures. The search feature requires a multi-column index on the product catalog table (indexed by name, category, price, rating, availability). Inserting a new product now takes 800ms because PostgreSQL must update all five indexes simultaneously. Meanwhile, the checkout flow is timing out on row-level locks — product availability reads for the cart are contending with the inventory write transaction that decrements stock. The same table, the same schema, the same database instance, and the same indexes are being optimized for two completely different access patterns — and neither is winning.
Core System Idea
CQRS splits the application into two distinct paths with independent data models: (1) Command path — handles all operations that mutate state. Uses a normalized write model optimized for transactional consistency and write throughput. Write model tables have minimal indexes (primary key + foreign keys only) — no read-optimized composite indexes. Commands return a correlation ID or success/failure, never the projected state. (2) Query path — handles all read operations. Uses a denormalized read model stored in a database chosen for the specific query pattern: Elasticsearch for text search, Redis for key-value lookups, Cassandra for time-series access, PostgreSQL with materialized views for aggregations. The read model is pre-computed and indexed exactly for its queries — no runtime joins. The bridge between the two: an event bus (Kafka, RabbitMQ) carries domain events from the write database to projection workers. A projection worker consumes domain events and updates the read model. Projection lag — the delay between a write being committed and the read model reflecting it — is typically 50ms–5s. This eventual consistency must be designed for explicitly in the UI. The projection worker must be idempotent: processing the same event twice must produce the same read model state.
System Flow
Commands update the normalized write database and publish events; projection workers consume events and update the denormalized read database asynchronously.
Real-World Examples Indicative
Bol.com (Netherlands' largest e-commerce, processing 250M+ orders/year) uses Axon Framework (Java CQRS library) for order management. Write side: PostgreSQL handles order state transitions (placed → paid → shipped → delivered) with optimistic locking — the write model has 4 tables with only primary key and foreign key indexes. Read side: Elasticsearch indexes order data denormalized per order, enabling sub-100ms search across 250M+ historical orders by customer, product, date range, or shipping status. Axon's TrackingEventProcessor monitors projection lag — when lag exceeds 500ms (indicating the projection worker cannot keep up with write throughput), it alerts the on-call team. During COVID-19 peak in 2020, write throughput reached 10K orders/minute while Elasticsearch absorbed 500K+ status queries/minute on completely isolated hardware.
Stripe's merchant dashboard (balance, recent transactions, payout timeline) serves a read model separate from the write path. The write path appends charge, refund, and payout events to PostgreSQL shards — each shard handling one range of merchant IDs. The read path serves pre-aggregated balance-by-currency per merchant, stored in a separate dashboard database updated via Kafka events from the write path. Without CQRS, a merchant with 5M transactions would require SELECT SUM(amount) FROM charges WHERE merchant_id=... AND currency='USD' — scanning 5M rows on every dashboard load. With CQRS, the dashboard reads a single pre-aggregated row updated incrementally when each charge event is consumed. The tradeoff: Stripe's dashboard shows balance "as of a few seconds ago" — the projection lag is surfaced to merchants in the UI as a refresh timestamp.
Discord stores messages in Cassandra (write-optimized, append-only, sorted by channel ID + timestamp). For features requiring cross-channel aggregation — server-wide search, jump-to-message, analytics — Discord projects Cassandra events into Elasticsearch (read model). The projection is bounded: only the last 180 days of messages per server are indexed in Elasticsearch to contain index size and maintain sub-100ms search latency. Projection lag during normal operation: <1 second. When Elasticsearch lag exceeds 5 seconds (e.g., during cluster reindexing), Discord surfaces a "search may be delayed" notice in the UI — explicit user communication of the eventual consistency window rather than silently serving stale results.
Anti-Patterns
Implementing CQRS for user profile settings, feature flags, or admin configuration — simple CRUD entities with low traffic and no read/write contention. CQRS adds event brokers, projection workers, multiple databases, and idempotency logic. Apply it only to the specific subdomains where read/write contention is measurable.
Updating the read database inside the write transaction before returning to the caller. This defeats the isolation between models — a slow Elasticsearch index update now blocks the write transaction, and if Elasticsearch is down, the write fails.
Redirecting users to a list view immediately after a successful write, where the new item does not yet appear because the projection has not processed the event. Use optimistic UI (insert the item client-side) or add a brief polling loop after writes to detect when the read model has caught up.
Running the write and read models on the same PostgreSQL instance. Read query CPU and I/O still contend with write transactions — the two models are logically separated but physically co-located on the same resource they were competing for.
Design Tradeoffs
| Dimension | Separate Read/Write Models | Shared CRUD Model |
|---|---|---|
| Read performance | Pre-aggregated, denormalized: simple indexed lookups, sub-100ms regardless of data volume | Runtime joins and aggregations: performance degrades as rows accumulate |
| Write performance | Write model has minimal indexes — inserts are fast with no index fan-out to read-optimized columns | Heavy composite indexes required for reads slow down write operations |
| Consistency | Eventual: projection lag of 50ms–5s typical; must be communicated to users | Immediate: read-your-own-writes guaranteed without additional UI patterns |
| Complexity | High: event bus, projection workers, multiple databases, idempotent event handlers | Low: standard ORM and MVC patterns; one database schema to design and migrate |
Best Practices
When to Use / Avoid
| Use When | Avoid When |
|---|---|
| Read-to-write ratio is extremely high (1000:1+) and reads require complex aggregations across large datasets | System is a simple CRUD application with low traffic and queries that are fast at current data volume |
| Write model has complex business rules and validation while reads require pre-aggregated, denormalized views across different dimensions | Strict immediate consistency is required — the application cannot tolerate any projection lag between a write and its visibility in reads |
| Read and write operations have measurably different scaling requirements — different hardware, different database engines | Team lacks experience with event-driven architectures and eventual consistency — the operational overhead will dominate engineering time |