Event-Driven Architecture
Events decouple who does something from who needs to know about it — the producer never waits for consumers, and adding a new consumer requires zero changes to the producer.
- Events decouple who does something from who needs to know about it — the producer never waits for consumers, and adding a new consumer requires zero changes to the producer.
- Event schema breaking changes are silent production failures: consumers crash or silently corrupt data until someone notices. Use a schema registry (Confluent, AWS Glue) and enforce compatibility at publish time.
- Strict ordering across partitions is impossible without a single partition — and a single partition is a throughput ceiling. Design consumers to be order-tolerant where possible.
- At-least-once delivery is the default for Kafka, SQS, and most brokers. Your consumers must be idempotent — receiving the same event twice must produce the same result.
- Consumer lag is your primary operational signal: growing lag means consumers can't keep up, and unbounded lag means eventual data loss when broker retention expires.
The Problem
A payment service calls the notification service synchronously after processing a transaction. The notification service is slow during a marketing campaign — every payment now takes 800ms instead of 50ms. Then the notification service goes down, taking payments down with it. These two services have no business being coupled: a payment completing is a fact; who gets told about it and when is a separate concern. Synchronous coupling transforms one service's availability into another's dependency — and at scale, every tight coupling is a future incident.
Core System Idea
Event-Driven Architecture (EDA) replaces synchronous service-to-service calls with immutable events published to a durable broker. A producer emits a fact (OrderPlaced, PaymentProcessed) to a named topic in Kafka, SQS, or Pub/Sub — it does not know who consumes it or when. Consumers subscribe independently, processing events at their own pace. The broker provides durability (events survive consumer restarts), fan-out (one event reaches N consumers), and replay (consumers can reprocess historical events). Key operational properties: producers are never blocked by consumer slowness; consumers fail without affecting producers; new consumers can be added without producer changes; events can be replayed to rebuild read models or recover from bugs. Kafka retains events for days or weeks; SQS retains for up to 14 days.
System Flow
Producer publishes once; broker fans out to all subscribers independently. Failed consumer messages route to DLQ.
Real-World Examples Indicative
Every state transition in a trip (RideRequested, DriverAccepted, PickupConfirmed, TripCompleted) is published to Kafka. The dispatch service, payment service, ETA service, and surge pricing service each consume independently. When Uber processes 15M+ trips per day, this fan-out architecture means adding a new feature (e.g., a carbon offset service) requires zero changes to the trip service — just a new consumer subscribing to TripCompleted.
Every profile view, connection request, and post like is an event. The feed aggregation service, notification service, and analytics service all consume the same event streams independently. LinkedIn processes over 5 trillion events per day through Kafka — a volume that would be impossible with synchronous RPC between services. Consumer lag on the feed aggregation topic is a primary SLO metric; sustained lag above 30 seconds triggers paging.
An OrderPlaced event fans out to inventory reservation, fraud detection, payment authorization, fulfillment, and customer notification consumers. Each runs independently — if fraud detection is slow, it doesn't block payment or fulfillment. Events are stored in Kafka with 7-day retention, enabling replay when a downstream service has a bug that corrupts data: fix the bug, reset the consumer offset, and reprocess.
Anti-Patterns
Expecting a consumer to publish a response event that the producer then waits for. This is synchronous coupling with extra steps — you get all the latency and failure coupling of RPC plus the complexity of async messaging.
Adding a required field to an event schema and deploying producers before consumers are updated. Consumers crash or silently ignore new events until updated. Use Confluent Schema Registry with BACKWARD compatibility mode — new schemas must be readable by old consumers.
Kafka guarantees order within a partition, not across partitions. Routing the same entity's events to different partitions breaks ordering. Design consumers to handle out-of-order events using event timestamps, not arrival order.
Embedding the entire order object in every OrderUpdated event. When the order schema changes, all consumers need updates. Prefer thin events (entity ID + changed fields) or at least version the payload explicitly.
A consumer that falls 10 minutes behind is indistinguishable from a healthy consumer unless you're tracking lag. When Kafka retention expires and the consumer hasn't caught up, events are permanently lost.
Design Tradeoffs
| Dimension | Fat Events | Thin Events |
|---|---|---|
| Payload | Full object snapshot | Entity ID + delta only |
| Consumer queries needed | None (self-contained) | Yes (fetch current state) |
| Schema coupling | High (producer schema embedded) | Low (stable ID contract) |
| Best for | Read-heavy, many consumers | Write-heavy, schema-volatile producers |
Best Practices
BACKWARD or FULL compatibility on every schema change. Reject breaking changes at CI time, not at consumer runtime.When to Use / Avoid
| Use When | Avoid When |
|---|---|
| 1 event must reach N independent consumers | Immediate synchronous response is required |
| Services must scale and deploy independently | Strong cross-service transactional consistency is needed |
| Producer availability must not depend on consumer availability | Team lacks experience debugging distributed, async flows |
| Event replay is needed for recovery or new feature backfill | Simple point-to-point call between 2 services suffices |