Reliability Engineering · #21

Bulkhead Pattern

Bulkheads isolate resource pools so a degraded dependency can only exhaust its own pool — a slow payment provider cannot consume the thread slots allocated to inventory lookups.

Published May 29, 2026 · By MortalApps · 5 min read · 960 words

TL;DR

Bulkheads isolate resource pools per dependency so a degraded service exhausts only its own threads or connections — not the global pool shared by all other operations.
Elasticsearch ships with named thread pools per operation type. On a 16-core node the search pool defaults to 25 threads (int(cores × 3 / 2) + 1) and the write pool to 16. Bulk requests run on the write pool in current versions; older releases had a separate bulk pool. Heavy indexing saturates the write pool without consuming search threads: this is bulkheading built into the architecture.
Size thread pools using Little's Law: pool_size = throughput_RPS × P99_latency_seconds + buffer. Little's Law proper uses mean latency; substituting P99 is a deliberately conservative choice. A pool sized by intuition tends to be either too small (unnecessary rejections) or too large (context-switch overhead degrades performance).
Use bounded queues on thread pool bulkheads (10–50 tasks). Unbounded queues hide latency spikes and cause OOM under sustained load.
Semaphore bulkheads add no threads but give no thread isolation: the call runs on the calling thread, which stays blocked for as long as the dependency takes. Use semaphores only for fast operations with bounded latency.


The Problem

A service calls three external providers: payment, fraud detection, and inventory. All three share a single pool of 50 threads. During a Black Friday sale, the payment provider's API slows to 8-second response times under load. Payment calls soon occupy all 50 threads, and fraud detection and inventory calls queue behind them. Within seconds, inventory lookups that normally complete in 20ms are timing out because the shared pool has no free slots. A slowdown in one provider has disabled all three capabilities through shared resource exhaustion.

Core System Idea

The Bulkhead pattern partitions system resources into isolated pools. It is named after the compartments in a ship's hull that keep a single breach from sinking the vessel. Each downstream dependency, customer tier, or API category gets its own dedicated resource pool: a thread pool, a semaphore, or a connection pool. There are two mechanisms.

(1) Thread pool bulkhead: each dependency gets a fixed pool of threads. Calls to the dependency are submitted to the pool; if the pool and its bounded queue are full, new calls are rejected immediately with a fallback response. The calling thread is never blocked by the dependency: it submits the task and gets control back. This is full isolation, because a saturated pool cannot affect other pools. Overhead: context switching between threads (~1–5μs per switch).

(2) Semaphore bulkhead: a counter limits concurrent calls to a dependency. If the count is at its maximum, new calls return a fallback immediately. No new threads are spawned; the calling thread holds the semaphore while it executes the call. There is zero thread overhead but also no thread isolation: if the dependency hangs for 30 seconds, the calling thread is blocked for 30 seconds while holding the semaphore.

Multi-tenant systems extend bulkheading to customer tiers: enterprise customers get dedicated pools, while free-tier users share a smaller pool. One free-tier customer running an expensive query cannot degrade enterprise response times.
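The thread pool mechanism can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the names ThreadPoolBulkhead and BulkheadFullError are hypothetical, and the bound on running plus queued tasks is enforced with a semaphore because Python's ThreadPoolExecutor does not expose a bounded queue.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class BulkheadFullError(Exception):
    """Raised when a bulkhead has no free worker or queue slot (hypothetical name)."""


class ThreadPoolBulkhead:
    """Fixed worker pool plus a bounded backlog for a single dependency."""

    def __init__(self, name: str, max_workers: int, queue_capacity: int) -> None:
        self.name = name
        self._executor = ThreadPoolExecutor(
            max_workers=max_workers, thread_name_prefix=name
        )
        # One permit per task that may be running or waiting. The executor's
        # internal queue is unbounded, so the cap is enforced here instead.
        self._permits = threading.BoundedSemaphore(max_workers + queue_capacity)

    def submit(self, fn, *args, **kwargs) -> Future:
        # Non-blocking acquire: the caller is rejected at once, never parked.
        if not self._permits.acquire(blocking=False):
            raise BulkheadFullError(f"bulkhead '{self.name}' is full")
        try:
            future = self._executor.submit(fn, *args, **kwargs)
        except BaseException:
            self._permits.release()
            raise
        # Return the permit when the task finishes, fails, or is cancelled.
        future.add_done_callback(lambda _done: self._permits.release())
        return future

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)
```

A caller would create one instance per dependency (for example one for payment and one for inventory) and catch BulkheadFullError to return a fallback. Because each instance owns its own threads and permits, a stalled payment provider can fill only the payment bulkhead.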

System Flow

flowchart TD
    A["Incoming Request"] --> B{"Route to Pool"}
    B -- "Payment API" --> C["Thread Pool A (size=20)"]
    B -- "Inventory API" --> D["Thread Pool B (size=15)"]
    C --> E{"Pool Full?"}
    D --> F{"Pool Full?"}
    E -- "Yes" --> G["Fast Reject or Fallback"]
    E -- "No" --> H["Execute Payment Call"]
    F -- "Yes" --> G
    F -- "No" --> I["Execute Inventory Call"]

Each dependency has its own fixed-size thread pool; a saturated payment pool rejects new calls without affecting inventory pool availability.

Real-World Examples (Indicative)

Elasticsearch named thread pools

Elasticsearch ships with dedicated thread pools per operation type: search (size: int((vCPUs × 3) / 2) + 1, which is 25 on a 16-core node), write (size: vCPUs, so 16), and get (a separate fixed pool whose default size varies by version). Bulk requests run on the write pool in current versions; older releases had a separate bulk pool of the same size. On a 16-core node that gives 25 search threads and 16 write threads, completely separate. Heavy bulk indexing (e.g., ingesting log data) can saturate all 16 write threads without consuming any of the 25 search threads. This built-in bulkheading is one reason search threads stay available during aggressive indexing, although both workloads still compete for CPU, disk, and heap.

Resilience4j Thread Pool Bulkhead

A fintech payment service calls three external providers: Stripe, PayPal, and Klarna. Each provider gets its own Resilience4j Thread Pool Bulkhead: maxThreadPoolSize=20, coreThreadPoolSize=10, queueCapacity=10. When Stripe's API degrades, all 20 Stripe threads end up waiting on 8-second timeouts, and further Stripe calls are rejected once the 10-slot queue fills. The PayPal and Klarna pools (separate, 20 threads each) are unaffected; their calls still complete in 200ms. Without bulkheads, slow Stripe calls in a shared pool of 50 would keep accumulating until they held most or all of the 50 threads, degrading all three providers at once.

Multi-tenant database connection isolation

A multi-tenant SaaS platform separates enterprise customers (a dedicated connection pool of 20 per tenant, configured in PgBouncer) from free-tier users (a shared pool of 50 across all free users). When a free-tier customer triggers a slow full-table scan and holds all 50 shared connections for 10 seconds, enterprise customers' queries run against their dedicated pools without degradation. PgBouncer keeps a separate pool for each database/user pair, so each tier or tenant connects through its own database entry or user to land in the right pool. This is customer-tier bulkheading: the isolation axis is business tier, not just technical dependency.

Anti-Patterns

Single global thread pool

Running all async tasks on the JVM's ForkJoinPool.commonPool() or on the default executor of Python's asyncio event loop. Any slow dependency can starve every other task in the runtime.

Oversizing pools to avoid rejection

Setting every pool to 500 threads "to be safe". At 500 threads, context switching overhead degrades overall system throughput. Size pools using Little's Law, not intuition.

Unbounded queues on bulkheads

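A pool of 20 workers fed by an unbounded queue: for example Java's Executors.newFixedThreadPool(20), which uses an unbounded LinkedBlockingQueue, or Python's concurrent.futures.ThreadPoolExecutor(max_workers=20), whose internal work queue has no size limit. A degraded dependency fills the pool, tasks queue indefinitely, memory grows, and the process is eventually OOM-killed. Always use bounded queues.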

Isolating threads but sharing connections

Bulkheading application threads per dependency while sharing a single database connection pool across all of them. The connection pool becomes the shared bottleneck that bulkheading was supposed to eliminate.

Design Tradeoffs

Dimension	Thread Pool Bulkhead	Semaphore Bulkhead
Thread isolation	Full — calling thread never blocked by dependency	None — calling thread holds semaphore while executing
Overhead	Context-switch overhead (~1–5μs per task)	Near-zero — atomic counter increment/decrement
Queuing support	Yes — bounded queue absorbs bursts	No — immediate rejection when limit reached
Best for	I/O-bound calls with variable latency (external APIs)	Fast, bounded operations (in-memory cache reads)

Best Practices

Size pools using Little's Law: pool_size = throughput_RPS × P99_latency_seconds + buffer. A recommendation service handling 100 RPS at P99 latency 200ms needs 100 × 0.2 = 20 threads at steady state; add 50% buffer = 30.
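The sizing arithmetic above can be captured in a small helper. This is a sketch: the function name size_pool is hypothetical, and expressing the buffer as a fraction of the steady-state value is an assumption made here to match the "add 50%" wording.

```python
import math


def size_pool(
    throughput_rps: float,
    latency_seconds: float,
    buffer_fraction: float = 0.5,
) -> int:
    """Estimate a thread pool size from Little's Law plus headroom.

    Concurrency in flight = arrival rate x time in system. Passing a P99
    latency instead of the mean gives a deliberately conservative estimate.
    """
    if throughput_rps < 0 or latency_seconds < 0 or buffer_fraction < 0:
        raise ValueError("inputs must be non-negative")
    steady_state = throughput_rps * latency_seconds
    with_buffer = steady_state * (1 + buffer_fraction)
    # Round before ceil so floating-point noise does not add a spare thread.
    return max(1, math.ceil(round(with_buffer, 9)))


# The recommendation-service example: 100 RPS at 200 ms with a 50% buffer.
recommendation_pool = size_pool(100, 0.2, 0.5)
```

For the recommendation service this yields 30 threads: 20 at steady state plus the 50% buffer.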

Pair every thread pool with a bounded queue (10–50 tasks). The queue absorbs transient bursts; the size cap prevents unbounded memory growth. Alert when queue utilization exceeds 80%.

Isolate by both dependency and customer tier. As a rough rule of thumb, an enterprise SLA violation costs about 10× what a free-tier violation does, so dedicated pools for high-value customers are usually worth the resource overhead.
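Tier-based isolation can be sketched at the application level with one set of permits per enterprise tenant and one shared set for all free-tier tenants. The class name TierBulkheads, the tier strings, and the default limits (borrowed from the 20 and 50 figures in the multi-tenant example) are illustrative assumptions, not a prescribed design.

```python
import threading


class TierBulkheads:
    """Dedicated permits per enterprise tenant, one shared set for free tier."""

    def __init__(self, enterprise_per_tenant: int = 20, free_shared: int = 50) -> None:
        self._enterprise_limit = enterprise_per_tenant
        self._free = threading.BoundedSemaphore(free_shared)
        self._enterprise: dict[str, threading.BoundedSemaphore] = {}
        self._lock = threading.Lock()

    def _pool_for(self, tenant_id: str, tier: str) -> threading.BoundedSemaphore:
        if tier != "enterprise":
            return self._free  # every free-tier tenant competes for this pool
        with self._lock:
            pool = self._enterprise.get(tenant_id)
            if pool is None:
                # Lazily create the dedicated pool on the tenant's first call.
                pool = threading.BoundedSemaphore(self._enterprise_limit)
                self._enterprise[tenant_id] = pool
            return pool

    def try_acquire(self, tenant_id: str, tier: str) -> bool:
        """Take a slot without waiting; False means reject or use a fallback."""
        return self._pool_for(tenant_id, tier).acquire(blocking=False)

    def release(self, tenant_id: str, tier: str) -> None:
        """Return a slot previously obtained with try_acquire."""
        self._pool_for(tenant_id, tier).release()
```

A request handler would call try_acquire before touching the shared resource and release in a finally block. Exhausting the free-tier permits leaves every enterprise tenant's permits untouched.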

Monitor pool saturation and rejection rate as primary SLIs. A pool at 90%+ utilization for more than 30 seconds is a leading indicator of imminent rejection storms — alert and scale before the pool fills.
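The "90%+ for more than 30 seconds" rule can be expressed as a check over periodic utilization samples. This is a sketch under stated assumptions: PoolSample and sustained_saturation are hypothetical names, and in practice the samples would come from whatever metrics the pool implementation exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSample:
    timestamp: float  # seconds, from any monotonic source
    in_use: int       # busy threads or held permits at sample time
    capacity: int     # configured pool size


def sustained_saturation(
    samples: list[PoolSample],
    threshold: float = 0.9,
    window_seconds: float = 30.0,
) -> bool:
    """True when the latest unbroken run at or above the threshold spans the window."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    streak_start = None
    for sample in ordered:
        saturated = (
            sample.capacity > 0 and sample.in_use / sample.capacity >= threshold
        )
        if saturated:
            if streak_start is None:
                streak_start = sample.timestamp
        else:
            streak_start = None  # any dip below the threshold resets the run
    if streak_start is None:
        return False
    return ordered[-1].timestamp - streak_start >= window_seconds
```

Evaluating only the most recent run keeps the alert from firing on a saturation episode that has already cleared.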

Use semaphore bulkheads for fast synchronous operations (cache reads, in-memory computation). Use thread pool bulkheads for I/O-bound calls where the call duration is variable and potentially long.
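For comparison, a semaphore bulkhead is only a counter around the call. The sketch below uses a hypothetical SemaphoreBulkhead class; note that the wrapped function runs on the caller's own thread, which is why this variant suits only fast operations.

```python
import threading


class SemaphoreBulkhead:
    """Caps concurrent calls to one dependency without adding threads."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._permits = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, fallback=None, **kwargs):
        # Non-blocking acquire: at the limit, return the fallback immediately.
        if not self._permits.acquire(blocking=False):
            return fallback
        try:
            # Runs on the calling thread. If fn hangs, the caller hangs too,
            # and the permit stays held for the whole duration.
            return fn(*args, **kwargs)
        finally:
            self._permits.release()
```

A cache read could be wrapped as bulkhead.call(read_fn, key, fallback=None), where read_fn stands for whatever fast lookup the service already has. A slow network call wrapped the same way would tie up the request thread until it returned.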

When to Use / Avoid

Use When	Avoid When
Multiple external dependencies with variable response times share resources	Single-dependency service where one pool covers all calls
Multi-tenant systems must isolate per-customer resource usage	CPU-bound systems where extra thread pools add context-switch overhead
Critical paths (checkout, login) must be protected from non-critical features (recommendations)	Simple applications with in-process calls only — bulkheads add complexity with no benefit