Distributed Coordination · #65

Distributed Transactions

Two-Phase Commit (2PC) is a blocking protocol that severely limits system throughput and availability at scale.

Published May 29, 2026 · By MortalApps

TL;DR

Two-Phase Commit (2PC) is a blocking protocol that severely limits system throughput and availability at scale.
Sagas replace atomic distributed transactions with a sequence of local transactions and compensating actions.
Compensating transactions must be designed to be strictly idempotent and capable of running out-of-order.
A centralized transaction log (Saga Log) is mandatory to audit, resume, and debug failed multi-step workflows.


The Problem

When an application transitions from a single monolithic database to a distributed microservices architecture, business processes that once ran inside a single ACID transaction are split across multiple independent databases. For example, a travel booking flow must reserve a flight, book a hotel, and charge a credit card—each managed by a different service.

If the flight reservation succeeds but the hotel booking fails, the system is left in an inconsistent, partially completed state. Attempting to solve this using traditional distributed database transactions (like Two-Phase Commit) locks database rows across multiple network hops, creating severe performance bottlenecks and cascading failures if any single node or network link experiences transient latency.

Core System Idea

To maintain consistency across microservices without sacrificing scalability, systems must abandon synchronous distributed locks and adopt the Saga Pattern.

A Saga is a design pattern that models a distributed transaction as a sequence of local transactions. Each step in the Saga updates data within its own local database and publishes an event or message to trigger the next step.

Crucially, if a step fails, the Saga must execute a series of compensating transactions in reverse order to undo the changes made by the previous steps.

Sagas can be implemented in two ways:

1. Choreography: Nodes listen to events and execute their local transactions independently without a central coordinator.
2. Orchestration: A dedicated service (the Saga Orchestrator) explicitly manages the workflow, sending commands to participant services, tracking state in a persistent Saga Log, and executing compensating actions if any step fails.
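To make the choreography style concrete, here is a rough sketch using a synchronous in-memory event bus. The `EventBus` class, the event names, and the handler functions are assumptions made for illustration; a real deployment would use a message broker and separate processes.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventBus:
    """Toy synchronous bus standing in for a real message broker."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> None:
        self.history.append(event_type)
        for handler in list(self._handlers[event_type]):
            handler(event)


bus = EventBus()
flights: Dict[str, str] = {}  # the flight service's local state


def flight_on_booking_requested(event: Event) -> None:
    flights[event["booking_id"]] = "RESERVED"
    bus.publish("FlightReserved", event)


def hotel_on_flight_reserved(event: Event) -> None:
    # Simulated failure: the hotel service has no rooms available.
    bus.publish("HotelBookingFailed", event)


def flight_on_hotel_failed(event: Event) -> None:
    # Compensation lives in the participant; no coordinator is involved.
    flights[event["booking_id"]] = "CANCELLED"
    bus.publish("FlightCancelled", event)


bus.subscribe("BookingRequested", flight_on_booking_requested)
bus.subscribe("FlightReserved", hotel_on_flight_reserved)
bus.subscribe("HotelBookingFailed", flight_on_hotel_failed)

bus.publish("BookingRequested", {"booking_id": "b-1"})
print(bus.history, flights)
```

The workflow exists only as the chain of subscriptions, which is why choreographed sagas are harder to observe than orchestrated ones.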

System Flow

When the Hotel Service fails, the Saga Orchestrator initiates a compensating transaction to cancel the previously reserved flight, restoring system consistency.

Real-World Examples (Indicative)

Uber Cadence/Temporal — trip processing saga with 500K concurrent workflows, automatic compensation on payment failure

Uber built Cadence, which it later open-sourced, to replace ad-hoc state machines in its Go microservices; Temporal.io is a fork of Cadence started by Cadence's original creators. Uber's trip processing Saga runs as a workflow with 5 steps: CreateTrip, ReserveDriver, ChargePayment, NotifyRider, and StartTrip. If ChargePayment fails after a payment service timeout, the workflow schedules compensation activities in reverse: CancelDriverReservation, then VoidTrip. A failing activity is retried with exponential backoff (initial 1s, max 100s) up to 3 times before the workflow escalates to compensation. The engine persists the entire workflow event history to Cassandra, so a worker crash at any step is transparent to the workflow: on restart it resumes where it left off, replaying recorded history instead of re-executing completed steps. At peak, Uber runs 500K+ concurrent workflows covering trip processing, driver onboarding, and surge pricing calculations.
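The retry policy described above (initial 1s, cap 100s, 3 retries) can be approximated with a small helper. This sketch does not use the Cadence or Temporal SDK; `backoff_delays`, `call_with_retry`, and the backoff coefficient of 2.0 are assumptions chosen for illustration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(initial_s: float, max_s: float, retries: int,
                   coefficient: float = 2.0) -> List[float]:
    """Delay before each retry, growing geometrically and capped at max_s."""
    return [min(initial_s * coefficient ** i, max_s) for i in range(retries)]


def call_with_retry(fn: Callable[[], T], delays: List[float],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Try fn once, then once more after each delay.

    If the final attempt also raises, the exception propagates so the
    caller can start compensation.
    """
    for delay in delays:
        try:
            return fn()
        except Exception:
            sleep(delay)
    return fn()


# Demo with a fake sleep so nothing actually blocks.
slept: List[float] = []
attempts = {"n": 0}


def flaky_charge() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("payment service timeout")
    return "charged"


result = call_with_retry(flaky_charge, backoff_delays(1.0, 100.0, 3),
                         sleep=slept.append)
print(result, slept)
```

Injecting `sleep` keeps the helper testable; in production the default `time.sleep` (or a workflow engine's durable timer) would be used instead.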

Stripe — idempotency-key saga for exactly-once payment execution across 250M+ API calls/day

Stripe's payment processing uses a choreography-style Saga built around idempotency keys. When a client calls POST /v1/charges with the header Idempotency-Key: idem_abc123, Stripe stores the key and its status in a Redis-backed idempotency store before executing any step. If the API server crashes after charging the card but before recording the charge in PostgreSQL, the key is still marked as "in-flight." When the client retries, Stripe detects the in-flight key, queries the downstream card network's result (which records the charge by its own idempotency token), and replays the PostgreSQL write, so the card is charged once even though the request was processed twice. Stripe processes 250M+ API calls/day relying on this idempotency saga architecture, with the Redis idempotency store holding 48-hour TTL keys across all live transactions.
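The in-flight recovery path is easier to follow in code. The sketch below is not Stripe's implementation: `IdempotentExecutor`, the in-memory dictionaries standing in for the key store and the card network, and the simulated crash are all hypothetical.

```python
from typing import Callable, Dict, Optional


class IdempotentExecutor:
    """Tracks each idempotency key as in_flight or completed."""

    def __init__(self) -> None:
        # Stand-in for a durable key store.
        self._keys: Dict[str, Dict[str, Optional[str]]] = {}

    def execute(self, key: str,
                do_charge: Callable[[str], str],
                lookup_downstream: Callable[[str], Optional[str]]) -> str:
        record = self._keys.get(key)
        if record is not None and record["status"] == "completed":
            return record["result"] or ""  # replay the stored response
        if record is not None and record["status"] == "in_flight":
            # A previous attempt died mid-way: ask downstream what happened.
            result = lookup_downstream(key)
            if result is None:
                result = do_charge(key)
        else:
            self._keys[key] = {"status": "in_flight", "result": None}
            result = do_charge(key)
        self._keys[key] = {"status": "completed", "result": result}
        return result


card_network: Dict[str, str] = {}  # downstream records charges by key
charge_count = {"n": 0}


def charge(key: str) -> str:
    charge_count["n"] += 1
    card_network[key] = f"ch_{charge_count['n']}"
    return card_network[key]


def charge_then_crash(key: str) -> str:
    charge(key)
    raise ConnectionError("server crashed before the local write")


executor = IdempotentExecutor()
try:
    executor.execute("idem_abc123", charge_then_crash, card_network.get)
except ConnectionError:
    pass  # the client sees an error and retries with the same key

retry_result = executor.execute("idem_abc123", charge, card_network.get)
print(retry_result, charge_count["n"])
```

The retry finds the key in flight, recovers the existing charge from the downstream record, and never calls `charge` a second time.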

Airbnb — payments reservation saga with MySQL Saga Log, 4,200 stuck sagas recovered during Stripe degradation

Airbnb's checkout flow uses an orchestrated Saga for booking reservations: (1) ReserveNight (inventory DB), (2) AuthorizePayment (Stripe), (3) ConfirmBooking (host notification). The Saga Orchestrator persists each step's state to a MySQL Saga Log table before issuing commands. During a 2019 Stripe API degradation, AuthorizePayment returned 504 timeouts. Airbnb's orchestrator detected the failure after 3 retries (30s total), executed the compensation — CancelNightReservation — and returned a booking-failed error to the user. The MySQL Saga Log allowed Airbnb's on-call engineers to query all in-progress sagas during the incident, identify the 4,200 reservations stuck in the AuthorizePayment step, and manually trigger compensations after restoring Stripe connectivity — a recovery operation that would have been impossible without the explicit Saga Log.
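The kind of incident query described above might look like the following. The table layout, column names, and status values are assumptions, and SQLite is used here only as a self-contained stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE saga_log (
        saga_id      TEXT PRIMARY KEY,
        current_step TEXT NOT NULL,
        status       TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO saga_log VALUES (?, ?, ?)",
    [
        ("s-1", "ConfirmBooking", "COMPLETED"),
        ("s-2", "AuthorizePayment", "IN_PROGRESS"),
        ("s-3", "AuthorizePayment", "IN_PROGRESS"),
        ("s-4", "ReserveNight", "IN_PROGRESS"),
    ],
)

# Find every saga that is stuck waiting on the payment step.
stuck = [
    row[0]
    for row in conn.execute(
        "SELECT saga_id FROM saga_log "
        "WHERE current_step = ? AND status = ? ORDER BY saga_id",
        ("AuthorizePayment", "IN_PROGRESS"),
    )
]

# Flag them so a worker (or an operator) can run CancelNightReservation.
conn.executemany(
    "UPDATE saga_log SET status = 'COMPENSATING' WHERE saga_id = ?",
    [(saga_id,) for saga_id in stuck],
)
conn.commit()
print(stuck)
```

Because every saga instance has a row with its current step, the stuck set is a single indexed query, not a search across service logs.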

Anti-Patterns

Using 2PC Across Microservices

Implementing WS-AtomicTransaction or XA transactions across service boundaries, which couples the availability of the entire system to its single slowest component.

Designing Non-Idempotent Compensations

Writing compensating transactions (e.g., refundPayment) that do not handle duplicate executions safely, leading to double-refunding during network retries.
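One common way to avoid the double refund is to key each refund by a unique ID and record which IDs have been applied. The `PaymentLedger` class and its fields below are hypothetical; in a real service the balance update and the applied-ID record would be written in the same local database transaction.

```python
from typing import Dict, Set


class PaymentLedger:
    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self._applied_refunds: Set[str] = set()

    def refund_payment(self, refund_id: str, account: str,
                       amount_cents: int) -> bool:
        """Apply the refund once; return False for a duplicate delivery."""
        if refund_id in self._applied_refunds:
            return False  # already processed, safe to acknowledge again
        self.balances[account] = self.balances.get(account, 0) + amount_cents
        self._applied_refunds.add(refund_id)
        return True


ledger = PaymentLedger()
first = ledger.refund_payment("refund-saga-42", "acct-7", 5000)
second = ledger.refund_payment("refund-saga-42", "acct-7", 5000)  # retry
print(first, second, ledger.balances)
```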

Assuming Compensations Cannot Fail

Failing to design a manual intervention path or dead-letter queue (DLQ) for scenarios where a compensating transaction itself fails repeatedly (e.g., due to a downstream API outage).
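A bounded retry loop that parks the saga in a dead-letter queue is one way to provide that path. In this sketch the DLQ is a plain list and `compensate_with_dlq` is a hypothetical helper; a real system would use a durable queue and alert an operator.

```python
from typing import Callable, Dict, List


def compensate_with_dlq(saga_id: str,
                        compensation: Callable[[], None],
                        max_attempts: int,
                        dead_letters: List[Dict[str, str]]) -> bool:
    """Retry a compensation; after max_attempts, park it for manual review."""
    last_error = ""
    for _ in range(max_attempts):
        try:
            compensation()
            return True
        except Exception as exc:
            last_error = str(exc)
    dead_letters.append({
        "saga_id": saga_id,
        "attempts": str(max_attempts),
        "error": last_error,
    })
    return False


def cancel_flight_during_outage() -> None:
    raise ConnectionError("airline API unavailable")


dlq: List[Dict[str, str]] = []
succeeded = compensate_with_dlq("saga-9", cancel_flight_during_outage, 3, dlq)
print(succeeded, dlq)
```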

Mixing Business Logic with Orchestration

Hardcoding complex business rules directly inside the Saga Orchestrator instead of keeping the orchestrator a pure state-transition engine.
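A pure state-transition engine can be as small as a lookup table. The states, outcomes, and command names below are assumptions based on the travel booking example; the point is that the orchestrator only maps (state, outcome) to (next state, command) and leaves pricing, eligibility, and similar rules to the participant services.

```python
from typing import Dict, List, Optional, Tuple

# (current state, outcome) -> (next state, command to send or None)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("START", "begin"): ("RESERVING_FLIGHT", "ReserveFlight"),
    ("RESERVING_FLIGHT", "ok"): ("BOOKING_HOTEL", "BookHotel"),
    ("RESERVING_FLIGHT", "failed"): ("ABORTED", None),
    ("BOOKING_HOTEL", "ok"): ("CHARGING_CARD", "ChargeCard"),
    ("BOOKING_HOTEL", "failed"): ("CANCELLING_FLIGHT", "CancelFlight"),
    ("CHARGING_CARD", "ok"): ("COMPLETED", None),
    ("CHARGING_CARD", "failed"): ("CANCELLING_HOTEL", "CancelHotel"),
    ("CANCELLING_HOTEL", "ok"): ("CANCELLING_FLIGHT", "CancelFlight"),
    ("CANCELLING_FLIGHT", "ok"): ("ABORTED", None),
}


def next_step(state: str, outcome: str) -> Tuple[str, Optional[str]]:
    """Pure function: no I/O and no business rules, only the workflow shape."""
    return TRANSITIONS[(state, outcome)]


# Walk the path where the hotel booking fails.
state, command = next_step("START", "begin")
path: List[str] = [state]
for outcome in ["ok", "failed", "ok"]:
    state, command = next_step(state, outcome)
    path.append(state)
print(path)
```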

Design Tradeoffs

Dimension	Choreographed Saga (Decentralized)	Orchestrated Saga (Centralized)
Operational visibility	Low; the workflow is implicit and distributed across multiple event consumers — a single saga spans many service logs with no unified view	High; the orchestrator maintains explicit workflow state — a single Saga Log query reveals the current step of any saga instance
Service coupling	Low; services only emit events and react to events — no service needs to know about other services' internals	Moderate; participant services must implement the command interface expected by the orchestrator
Failure handling	Complex; compensating events must be implemented in every consumer, and cascading failures across services are difficult to trace	Controlled; the orchestrator manages all retry and compensation logic centrally, making failure handling explicit and auditable

Best Practices

Ensure Absolute Idempotency

Every participant service must use unique idempotency keys to identify and discard duplicate commands sent by the orchestrator during retries.

Make Compensations Commutative

Design compensating transactions so they can execute successfully even if they arrive out-of-order (e.g., a cancel command arriving before the reserve command due to network race conditions).
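One way to get this property is to have the cancel leave a tombstone that a late reserve must respect. The `FlightInventory` class and its status strings are hypothetical, and the state is held in memory only to keep the sketch short.

```python
from typing import Dict


class FlightInventory:
    def __init__(self) -> None:
        self.state: Dict[str, str] = {}

    def reserve(self, reservation_id: str) -> str:
        if self.state.get(reservation_id) == "CANCELLED":
            # The cancel arrived first; the late reserve must not revive it.
            return "CANCELLED"
        self.state[reservation_id] = "RESERVED"
        return "RESERVED"

    def cancel(self, reservation_id: str) -> str:
        # Record a tombstone even if the reservation is not known yet.
        self.state[reservation_id] = "CANCELLED"
        return "CANCELLED"


in_order = FlightInventory()
in_order.reserve("r-1")
in_order.cancel("r-1")

out_of_order = FlightInventory()
out_of_order.cancel("r-1")   # compensation overtakes the original command
out_of_order.reserve("r-1")

print(in_order.state, out_of_order.state)
```

Both delivery orders converge on the same final state, which is the property the compensation needs.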

Persist the Saga Log

The orchestrator must synchronously write state transitions to a highly available database (the Saga Log) before sending commands to participants, ensuring it can resume progress after a crash.
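The write-before-send ordering and the resume behavior can be sketched as follows. The `Orchestrator` class, the step list, and the list standing in for the durable log are assumptions for illustration; the crash is simulated by raising from the `send` callback.

```python
from typing import Callable, List, Tuple

LogEntry = Tuple[str, str, str]  # (saga_id, step, status)


class Orchestrator:
    STEPS = ["ReserveNight", "AuthorizePayment", "ConfirmBooking"]

    def __init__(self, log: List[LogEntry],
                 send: Callable[[str, str], None]) -> None:
        self.log = log    # stand-in for a durable Saga Log table
        self.send = send  # delivers a command to a participant

    def run(self, saga_id: str) -> None:
        done = {step for sid, step, status in self.log
                if sid == saga_id and status == "DONE"}
        for step in self.STEPS:
            if step in done:
                continue  # finished before the crash, do not repeat
            self.log.append((saga_id, step, "STARTED"))  # write first
            self.send(saga_id, step)                     # then send
            self.log.append((saga_id, step, "DONE"))


sent: List[str] = []
crash_once = {"armed": True}


def send(saga_id: str, step: str) -> None:
    sent.append(step)
    if step == "AuthorizePayment" and crash_once["armed"]:
        crash_once["armed"] = False
        raise RuntimeError("orchestrator crashed after sending")


saga_log: List[LogEntry] = []
try:
    Orchestrator(saga_log, send).run("s-1")
except RuntimeError:
    pass

# A fresh orchestrator instance resumes from the persisted log.
Orchestrator(saga_log, send).run("s-1")
print(sent)
```

The resumed run re-sends AuthorizePayment because its outcome was never logged, which is why participants need to deduplicate commands.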

Design for Semantic Isolation Failures

Because Sagas lack database-level read isolation, other transactions can see intermediate, uncommitted states. Use application-level techniques like "Pending" flags to prevent users from interacting with dirty data.
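A "Pending" flag can be as simple as a status column that every user-facing operation checks. The function names and status values here are hypothetical.

```python
from typing import Dict

bookings: Dict[str, str] = {}  # booking_id -> status


def start_booking(booking_id: str) -> None:
    # Written by the first saga step; visible but not yet actionable.
    bookings[booking_id] = "PENDING"


def finish_saga(booking_id: str, succeeded: bool) -> None:
    bookings[booking_id] = "CONFIRMED" if succeeded else "CANCELLED"


def can_check_in(booking_id: str) -> bool:
    # User-facing actions are allowed only once the saga has committed.
    return bookings.get(booking_id) == "CONFIRMED"


start_booking("b-1")
allowed_while_pending = can_check_in("b-1")
finish_saga("b-1", succeeded=True)
allowed_after_commit = can_check_in("b-1")
print(allowed_while_pending, allowed_after_commit)
```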

When to Use / Avoid

Use When	Avoid When
You have a multi-step business process that spans multiple microservices, databases, or external third-party APIs.	The entire transaction can be executed within a single, monolithic relational database using standard ACID guarantees.
The business process is long-running (minutes, hours, or days) and cannot hold database locks open.	Low-latency, high-throughput transactional consistency is required (e.g., high-frequency trading systems).
You need to build highly auditable workflows where every state transition must be logged and visible to operators.	The system is simple, and eventual consistency can be handled by simple client-side retries without complex rollback logic.