Distributed Transactions
Two-Phase Commit (2PC) is a blocking protocol that severely limits system throughput and availability at scale.
- Two-Phase Commit (2PC) is a blocking protocol that severely limits system throughput and availability at scale.
- Sagas replace atomic distributed transactions with a sequence of local transactions and compensating actions.
- Compensating transactions must be designed to be strictly idempotent and capable of running out-of-order.
- A centralized transaction log (Saga Log) is mandatory to audit, resume, and debug failed multi-step workflows.
The Problem
When an application transitions from a single monolithic database to a distributed microservices architecture, business processes that once ran inside a single ACID transaction are split across multiple independent databases. For example, a travel booking flow must reserve a flight, book a hotel, and charge a credit card—each managed by a different service.
If the flight reservation succeeds but the hotel booking fails, the system is left in an inconsistent, partially completed state. Attempting to solve this using traditional distributed database transactions (like Two-Phase Commit) locks database rows across multiple network hops, creating severe performance bottlenecks and cascading failures if any single node or network link experiences transient latency.
Core System Idea
To maintain consistency across microservices without sacrificing scalability, systems must abandon synchronous distributed locks and adopt the Saga Pattern.
A Saga is a design pattern that models a distributed transaction as a sequence of local transactions. Each step in the Saga updates data within its own local database and publishes an event or message to trigger the next step.
Crucially, if a step fails, the Saga must execute a series of compensating transactions in reverse order to undo the changes made by the previous steps.
Sagas can be implemented in two ways: 1. Choreography: Nodes listen to events and execute their local transactions independently without a central coordinator. 2. Orchestration: A dedicated service (the Saga Orchestrator) explicitly manages the workflow, sending commands to participant services, tracking state in a persistent Saga Log, and executing compensating actions if any step fails.
System Flow
When the Hotel Service fails, the Saga Orchestrator initiates a compensating transaction to cancel the previously reserved flight, restoring system consistency.
Real-World Examples Indicative
Uber built Cadence (now open-sourced as Temporal.io) to replace ad-hoc state machines in their Go microservices. Uber's trip processing Saga runs as a Temporal workflow with 5 steps: CreateTrip, ReserveDriver, ChargePayment, NotifyRider, and StartTrip. If ChargePayment fails after a payment service timeout, Temporal automatically schedules compensation activities in reverse: CancelDriverReservation, then VoidTrip — each retried with exponential backoff (initial 1s, max 100s) up to 3 times before escalating to compensation. Temporal persists the entire workflow event history to Cassandra, meaning a Cadence worker crash at any step is transparent — the workflow resumes exactly where it left off on restart with no duplicated steps. At peak, Uber runs 500K+ concurrent Temporal workflows covering trip processing, driver onboarding, and surge pricing calculations.
Stripe's payment processing uses a choreography-style Saga built around idempotency keys. When a client calls POST /v1/charges with an Idempotency-Key: idem_abc123, Stripe stores the key and its status in a Redis-backed idempotency store before executing any step. If the API server crashes after charging the card but before recording the charge in PostgreSQL, the idempotency key marks the operation as "in-flight." When the client retries, Stripe detects the in-flight key, queries the downstream card network's result (which records the charge by its own idempotency token), and replays the PostgreSQL write — guaranteeing exactly-once charge execution. Stripe processes 250M+ API calls/day relying on this idempotency saga architecture, with the Redis idempotency store holding 48-hour TTL keys across all live transactions.
Airbnb's checkout flow uses an orchestrated Saga for booking reservations: (1) ReserveNight (inventory DB), (2) AuthorizePayment (Stripe), (3) ConfirmBooking (host notification). The Saga Orchestrator persists each step's state to a MySQL Saga Log table before issuing commands. During a 2019 Stripe API degradation, AuthorizePayment returned 504 timeouts. Airbnb's orchestrator detected the failure after 3 retries (30s total), executed the compensation — CancelNightReservation — and returned a booking-failed error to the user. The MySQL Saga Log allowed Airbnb's on-call engineers to query all in-progress sagas during the incident, identify the 4,200 reservations stuck in the AuthorizePayment step, and manually trigger compensations after restoring Stripe connectivity — a recovery operation that would have been impossible without the explicit Saga Log.
Anti-Patterns
Implementing WS-AtomicTransaction or XA transactions across service boundaries, which couples the availability of the entire system to its single slowest component.
Writing compensating transactions (e.g., refundPayment) that do not handle duplicate executions safely, leading to double-refunding during network retries.
Failing to design a manual intervention path or dead-letter queue (DLQ) for scenarios where a compensating transaction itself fails repeatedly (e.g., due to a downstream API outage).
Hardcoding complex business rules directly inside the Saga Orchestrator instead of keeping the orchestrator a pure state-transition engine.
Design Tradeoffs
| Dimension | Choreographed Saga (Decentralized) | Orchestrated Saga (Centralized) |
|---|---|---|
| Operational visibility | Low; the workflow is implicit and distributed across multiple event consumers — a single saga spans many service logs with no unified view | High; the orchestrator maintains explicit workflow state — a single Saga Log query reveals the current step of any saga instance |
| Service coupling | Low; services only emit events and react to events — no service needs to know about other services' internals | Moderate; participant services must implement the command interface expected by the orchestrator |
| Failure handling | Complex; compensating events must be implemented in every consumer, and cascading failures across services are difficult to trace | Controlled; the orchestrator manages all retry and compensation logic centrally, making failure handling explicit and auditable |
Best Practices
When to Use / Avoid
| Use When | Avoid When |
|---|---|
| You have a multi-step business process that spans multiple microservices, databases, or external third-party APIs. | The entire transaction can be executed within a single, monolithic relational database using standard ACID guarantees. |
| The business process is long-running (minutes, hours, or days) and cannot hold database locks open. | Low-latency, high-throughput transactional consistency is required (e.g., high-frequency trading systems). |
| You need to build highly auditable workflows where every state transition must be logged and visible to operators. | The system is simple, and eventual consistency can be handled by simple client-side retries without complex rollback logic. |