The GIL — What It Blocks and What It Doesn't
Imagine the GIL as a single, exclusive "speaking stick" for Python bytecode execution. Even if you have multiple threads, only one thread can hold the stick and "speak" (execute Python bytecode) at any given moment, making CPU-bound tasks effectively sequential.
The Setup
A developer builds a multi-threaded web scraper that processes high-resolution image thumbnails in background threads. They find that adding more threads degrades performance instead of improving it.
What Does This Print?
import time
import threading

def compute_factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

start_time = time.perf_counter()
threads = []
for _ in range(4):
    t = threading.Thread(target=compute_factorial, args=(100000,))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print(f"Time taken: {time.perf_counter() - start_time:.2f} seconds")
The Output
The script prints a single "Time taken" line; the exact number depends on your machine, but the four computations do not execute in parallel across your CPU cores. Despite spawning four OS threads on a multi-core machine, total CPU usage is capped at roughly one core's capacity. In standard CPython, the GIL prevents multiple OS threads from executing Python bytecode simultaneously. The threads keep handing the lock back and forth, and the operating system keeps scheduling and descheduling them, which adds context-switching overhead. As a result, the multi-threaded version can be even slower than running the same computations sequentially in a single thread.
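One way to check this on your own machine is to time the same four calls run back to back on one thread and compare that with the threaded version. The sketch below does so; the helper names run_sequential and run_threaded and the smaller input of 20000 are illustrative choices rather than part of the original example, and the timings will vary by machine and CPython version.

```python
import threading
import time


def compute_factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result


def run_sequential(n, repeats):
    """Run the workload `repeats` times on the calling thread; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        compute_factorial(n)
    return time.perf_counter() - start


def run_threaded(n, repeats):
    """Run the workload once in each of `repeats` threads; return elapsed seconds."""
    start = time.perf_counter()
    threads = [
        threading.Thread(target=compute_factorial, args=(n,))
        for _ in range(repeats)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


# A smaller n than the article's 100000 keeps the comparison quick (assumption).
sequential_time = run_sequential(20000, 4)
threaded_time = run_threaded(20000, 4)
print(f"Sequential: {sequential_time:.2f} seconds")
print(f"Threaded:   {threaded_time:.2f} seconds")
```

On a GIL-enabled CPython build the two numbers are usually close, and the threaded run is often no faster than the sequential one.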
Why Python Does This
CPython's memory management is not thread-safe on its own. Reference counting, which CPython uses to track object lifetimes, is prone to race conditions if multiple threads modify reference counts at the same time. To prevent the resulting memory leaks and premature deallocations, the Global Interpreter Lock (GIL) was introduced. The GIL is a mutual exclusion lock that a thread must hold while it executes CPython bytecode. For CPU-bound code, the running thread is asked to release the GIL once the switch interval has elapsed and another thread is waiting for it. The interval defaults to 5 milliseconds; you can read it with sys.getswitchinterval() and change it with sys.setswitchinterval(). Even so, only one thread holds the lock and runs bytecode at any given instant, which turns what looks like parallel execution into sequential time-slicing.
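The switch interval mentioned above can be inspected and adjusted from the sys module. The following is a minimal sketch: the 0.001 value is an arbitrary choice for illustration, and changing the interval only affects how often threads take turns, not whether they run in parallel.

```python
import sys

# Read the current interval in seconds (documented default: 0.005).
original_interval = sys.getswitchinterval()
print(f"Current switch interval: {original_interval} seconds")

# Ask the interpreter to switch threads more often (illustrative value).
sys.setswitchinterval(0.001)
changed_interval = sys.getswitchinterval()
print(f"Changed switch interval: {changed_interval} seconds")

# Restore the original setting so the rest of the program is unaffected.
sys.setswitchinterval(original_interval)
```

A shorter interval can make threads feel more responsive at the cost of more switching overhead; it does not make CPU-bound threads finish sooner overall.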
The Fix
import time
from concurrent.futures import ProcessPoolExecutor

def compute_factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

if __name__ == "__main__":
    start_time = time.perf_counter()
    # Use ProcessPoolExecutor to bypass the GIL by spawning separate OS processes
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(compute_factorial, 100000) for _ in range(4)]
        results = [f.result() for f in futures]
    print(f"Time taken: {time.perf_counter() - start_time:.2f} seconds")
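To achieve true CPU parallelism for pure-Python code, run the work in multiple processes, for example with ProcessPoolExecutor or the multiprocessing module. Each process gets its own Python interpreter, memory space, and GIL, so CPU-bound tasks can run simultaneously on different cores. The trade-off is that arguments and results are pickled and copied between processes, and the if __name__ == "__main__" guard is needed so that worker processes can import the module without re-running the pool setup.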
How This Fails in Real Systems
An API service designed to dynamically generate PDF invoices used standard Python threading to scale. Under load, response times spiked from 200 ms to over 8 seconds. Profiling showed that CPU-bound PDF rendering was monopolizing the GIL, starving the event loop and the threads handling incoming I/O. The team identified the bottleneck during a traffic spike on Black Friday and resolved it within two hours by moving PDF generation to a Celery task queue running on separate worker processes.
Key Takeaway
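In CPython, the threading module does not provide true parallel execution across multiple CPU cores for CPU-bound Python code, unlike threading in languages without a Global Interpreter Lock. Threads remain useful for I/O-bound work, where the GIL is released while a thread waits; for CPU-bound work, use separate processes.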
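For contrast with the CPU-bound example, here is a minimal sketch of the case the GIL does not block: threads that spend their time waiting. It assumes time.sleep as a stand-in for a blocking call such as a network request; the function name fake_io_task and the 0.2 second delay are illustrative.

```python
import threading
import time


def fake_io_task(delay):
    # time.sleep stands in for blocking I/O (assumption for illustration).
    # CPython releases the GIL while the thread is blocked here.
    time.sleep(delay)


start = time.perf_counter()
threads = [threading.Thread(target=fake_io_task, args=(0.2,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The four waits overlap, so the total is close to one delay, not four.
print(f"Four 0.2 second waits took {elapsed:.2f} seconds in total")
```

Because the waits overlap, the total stays near the length of a single wait, which is why threads are still a reasonable choice for I/O-bound workloads.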