Thread Safety: Why += on a Shared Counter Is Not Atomic
Imagine multiple people trying to update a single score on a scoreboard simultaneously without coordination. One person reads '5', another reads '5', both add '1' to get '6', and then both write '6' back. The score should be '7', but it's only '6'. Locks are like a single pen that only one person can use at a time to write on the scoreboard.
The Setup
You are building a high-throughput metrics collector that increments a global counter inside several concurrent background worker threads. You assume that because Python has a GIL, simple mathematical operations on basic integers or lists are inherently safe from race conditions.
What Does This Print?
import threading
counter = 0
def increment():
    global counter
    for _ in range(100000):
        counter += 1
threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(f"Final counter value: {counter}")
The Output
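The final counter value can come out lower than the expected 200000, and it may differ from run to run. How often updates are lost depends on the CPython version and on timing; on some recent versions this exact loop may print 200000 most of the time, which makes the bug easy to miss, not absent. The root cause is that += is not atomic: it is split into several separate bytecode instructions, and a thread switch can happen between them.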
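Because the race depends on timing, it helps to force the bad interleaving by hand. This sketch splits the read and the write into separate statements (standing in for the separate bytecode steps) and uses a threading.Barrier, added here purely for illustration, so that both threads read before either one writes. The starting value of 5 mirrors the scoreboard analogy.

```python
import threading

# Shared state, starting at 5 like the scoreboard in the analogy.
score = 5

# A two-party barrier: neither thread can pass until both have reached it.
read_done = threading.Barrier(2)

def unsafe_increment():
    global score
    value = score       # step 1: read the shared value
    read_done.wait()    # wait until the other thread has also read
    score = value + 1   # step 2: write back a result computed from a stale read

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(score)  # 6, not 7: one increment was lost
```

Both threads read 5, both compute 6, and the second write silently overwrites the first.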
Why Python Does This
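Although CPython's GIL prevents multiple threads from executing Python bytecode at the same time, it does not make a sequence of bytecode instructions atomic: the interpreter can switch to another thread partway through such a sequence. The statement counter += 1 compiles to several instructions, roughly LOAD_GLOBAL, LOAD_CONST, an add instruction (INPLACE_ADD in CPython 3.10 and earlier, BINARY_OP in 3.11 and later), and STORE_GLOBAL; exact names vary by version. If a switch happens after one thread has loaded the counter but before it stores the result, another thread can increment the counter in the meantime. When the first thread resumes, it writes back a value computed from its stale read and overwrites those increments, so updates are lost.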
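You can inspect the compiled instructions yourself with the standard dis module. This sketch uses a hypothetical single-increment function so the output stays short; the opcode list it prints differs between CPython versions, but the load and the store are always separate instructions.

```python
import dis

counter = 0

def increment_once():
    global counter
    counter += 1

# Opcode names CPython compiled for the function body.
opnames = [ins.opname for ins in dis.get_instructions(increment_once)]
print(opnames)

# The read (LOAD_GLOBAL) and the write (STORE_GLOBAL) are distinct instructions,
# with the addition (INPLACE_ADD or BINARY_OP, depending on version) between them.
read_at = opnames.index("LOAD_GLOBAL")
write_at = opnames.index("STORE_GLOBAL")
print(write_at - read_at > 1)  # True
```

Any gap between read_at and write_at is a window in which another thread can run.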
The Fix
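import threading
counter = 0
counter_lock = threading.Lock()
def increment():
    global counter
    for _ in range(100000):
        # Protect the non-atomic read-modify-write with a mutual exclusion lock
        with counter_lock:
            counter += 1
threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(f"Final counter value: {counter}")  # always 200000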
Using a threading.Lock ensures that only one thread can execute the critical section (here, counter += 1) at any given time. This serializes access to the shared mutable state, so two threads can never interleave their read-modify-write sequences and lose each other's updates.
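One way to keep the lock from being forgotten at some call site is to bundle it with the data it protects. The LockedCounter class below is a hypothetical helper, not part of the threading module; it is a sketch of the same lock-based fix wrapped in a small API.

```python
import threading

class LockedCounter:
    """Hypothetical helper: a counter whose read-modify-write is guarded by one lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self, amount: int = 1) -> None:
        # Only one thread at a time can run the read, add, and write below.
        with self._lock:
            self._value += amount

    @property
    def value(self) -> int:
        # Reading under the same lock returns a value consistent with the writers.
        with self._lock:
            return self._value

counter = LockedCounter()

def worker(iterations: int) -> None:
    for _ in range(iterations):
        counter.increment()

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 200000
```

Because callers only see increment() and value, there is no way to touch the shared integer without going through the lock.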
How This Fails in Real Systems
A distributed task worker tracked total active workloads using a global dictionary of counters shared across threads. Because of race conditions during traffic spikes, the counts fell out of sync with reality, and the scheduler kept sending tasks to nodes flagged as 'idle' that were actually overloaded.
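The same lost-update problem applies to a dictionary of counters: counts[node] += 1 reads the entry, adds to it, and stores it back as separate steps. The incident description does not say how the real tracker was written, so this is only a sketch; the ActiveWorkloads class, its start/finish/is_idle methods, and the node name "node-a" are illustrative assumptions. Every read and update of the shared dict goes through one lock.

```python
import threading
from collections import defaultdict

class ActiveWorkloads:
    """Hypothetical tracker of active tasks per node, guarded by a single lock."""

    def __init__(self) -> None:
        self._counts = defaultdict(int)  # node name -> number of active tasks
        self._lock = threading.Lock()

    def start(self, node: str) -> None:
        # self._counts[node] += 1 is a read-modify-write on the dict entry,
        # so it is not atomic on its own and must run under the lock.
        with self._lock:
            self._counts[node] += 1

    def finish(self, node: str) -> None:
        with self._lock:
            self._counts[node] -= 1

    def is_idle(self, node: str) -> bool:
        # Read under the same lock so the check never overlaps an update.
        with self._lock:
            return self._counts.get(node, 0) == 0

tracker = ActiveWorkloads()

def run_tasks(node: str, n: int) -> None:
    # Each simulated task registers itself, then deregisters when done.
    for _ in range(n):
        tracker.start(node)
        tracker.finish(node)

threads = [threading.Thread(target=run_tasks, args=("node-a", 50_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(tracker.is_idle("node-a"))  # True: every start was matched by a finish
```

The lock makes each individual update or check atomic. If a scheduler combines an is_idle check with a start call, that combined check-then-act would also need to run under one lock acquisition.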