The GIL — What It Blocks and What It Doesn't

Published May 26, 2026 · By MortalApps

Mental Model

Imagine the GIL as a single, exclusive "speaking stick" for Python bytecode execution. Even if you have multiple threads, only one thread can hold the stick and "speak" (execute Python bytecode) at any given moment, making CPU-bound tasks effectively sequential.

Rule: Never use threads for CPU-bound work expecting parallelism. The GIL serialises them. Use multiprocessing for CPU-bound work, and keep threads or async I/O for I/O-bound work.

The Setup

A developer builds a multi-threaded web scraper that processes high-resolution image thumbnails in background threads. They find that adding more threads degrades performance instead of improving it.

What Does This Print?

⚠ Broken code

Python

import time
import threading

def compute_factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

start_time = time.perf_counter()
threads = []
for _ in range(4):
    t = threading.Thread(target=compute_factorial, args=(100000,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print(f"Time taken: {time.perf_counter() - start_time:.2f} seconds")

Predict whether these four CPU-heavy tasks, each run in its own thread on a 4-core processor, execute in parallel or sequentially.

The Output

What actually happens

Time taken: 1.25 seconds (illustrative; the exact figure depends on the machine, but it is comparable to running the four calls sequentially in a single thread)

The code does not execute in parallel across your CPU cores. Despite spawning four OS threads on a multi-core machine, total CPU usage is capped at roughly one core's capacity. On a standard (GIL-enabled) CPython build, the GIL prevents multiple OS threads from executing Python bytecode simultaneously. The threads take turns acquiring the lock, and that hand-off, together with the operating system's context switching, adds overhead that can make the multi-threaded version even slower than running the computations sequentially in a single thread.
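To check this claim on your own machine, a minimal sketch like the one below times the same work run back to back and run in threads. The helper names `time_sequential` and `time_threaded` are illustrative, not part of the original example, and the smaller `n` is an assumption chosen only to keep the run short. It also assumes a standard GIL-enabled CPython build; a free-threaded build would behave differently.

```python
import threading
import time


def compute_factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result


def time_sequential(n, repeats):
    """Run compute_factorial(n) `repeats` times in the calling thread."""
    start = time.perf_counter()
    for _ in range(repeats):
        compute_factorial(n)
    return time.perf_counter() - start


def time_threaded(n, repeats):
    """Run compute_factorial(n) once in each of `repeats` threads."""
    start = time.perf_counter()
    threads = [
        threading.Thread(target=compute_factorial, args=(n,))
        for _ in range(repeats)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, repeats = 20_000, 4  # smaller than the article's 100000, for speed
    print(f"Sequential: {time_sequential(n, repeats):.2f} seconds")
    print(f"Threaded:   {time_threaded(n, repeats):.2f} seconds")
```

If the GIL is serialising the threads, the two timings should land close to each other rather than the threaded one being about four times faster.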

Why Python Does This

CPython's memory management is not thread-safe. Reference counting, which CPython uses to track object lifetimes, is prone to race conditions if multiple threads modify reference counts at the same time. To prevent the resulting memory leaks and premature deallocations, the Global Interpreter Lock (GIL) was introduced. The GIL is a mutual exclusion lock that a thread must acquire before it can execute Python bytecode. For CPU-bound tasks, a running thread is asked to release the GIL once the switch interval has elapsed and another thread is waiting. The interval defaults to 5 milliseconds; it is read with sys.getswitchinterval() and changed with sys.setswitchinterval(). This gives other threads a chance to run, but only one thread holds the lock and runs bytecode at any given instant, which turns what looks like parallel execution into sequential time-slicing.
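The switch interval can be inspected and adjusted from Python itself. The sketch below reads the current value, changes it, and restores it; the 0.01 value is an arbitrary choice for illustration, not a recommendation.

```python
import sys

# Read the current interval, in seconds.
original = sys.getswitchinterval()
print(f"Current switch interval: {original} seconds")

# Ask for a longer interval (arbitrary value, for illustration only).
sys.setswitchinterval(0.01)
changed = sys.getswitchinterval()
print(f"After setswitchinterval(0.01): {changed} seconds")

# Restore the previous setting so the rest of the program is unaffected.
sys.setswitchinterval(original)
restored = sys.getswitchinterval()
```

Changing the interval only alters how often threads trade the lock. It does not let two threads run bytecode at the same time.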

The Fix

✓ Corrected pattern

Python

import time
from concurrent.futures import ProcessPoolExecutor

def compute_factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

if __name__ == "__main__":
    start_time = time.perf_counter()
    # Use ProcessPoolExecutor to bypass the GIL by spawning separate OS processes
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(compute_factorial, 100000) for _ in range(4)]
        results = [f.result() for f in futures]

    print(f"Time taken: {time.perf_counter() - start_time:.2f} seconds")

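To achieve true CPU parallelism for pure-Python code on a standard CPython build, use multiprocessing. Each process gets its own Python interpreter, its own memory space, and its own GIL, so multiple CPU-bound tasks can run simultaneously on different cores. The trade-off is that arguments and results are pickled and copied between processes, so the approach pays off when the computation outweighs the cost of starting processes and moving data.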
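One way to see that the work really runs in separate processes is to have each task report the process ID it ran in. The following is a minimal sketch; `factorial_bits_with_pid` is a hypothetical helper introduced here, and it returns only the bit length of the result (an assumption made to keep the data sent back to the parent small).

```python
import os
from concurrent.futures import ProcessPoolExecutor


def compute_factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result


def factorial_bits_with_pid(n):
    """Return the PID of the process doing the work and the result's bit length."""
    return os.getpid(), compute_factorial(n).bit_length()


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(factorial_bits_with_pid, [20_000] * 4))

    worker_pids = {pid for pid, _ in results}
    print(f"Parent PID:  {os.getpid()}")
    print(f"Worker PIDs: {sorted(worker_pids)}")
```

The worker PIDs differ from the parent's. How many distinct workers appear depends on how the pool happens to distribute the tasks, so the exact set is not guaranteed.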

How This Fails in Real Systems

An API service designed to dynamically generate PDF invoices used standard Python threading to scale. Under load, response times spiked from 200ms to over 8 seconds. Profiling showed that CPU-bound PDF rendering held the GIL, starving the other threads that handled incoming requests and I/O. The team identified the bottleneck during a traffic spike on Black Friday and resolved it within two hours by shifting PDF generation to a Celery task queue running on separate worker processes.

Key Takeaway

Never use threads for CPU-bound work expecting parallelism. The GIL serialises them. Use multiprocessing for CPU-bound work, and keep threads or async I/O for I/O-bound work.

Common mistake: Developers assume that Python's threading module allows true parallel execution across multiple CPU cores for CPU-bound tasks, similar to threading in languages without a Global Interpreter Lock.
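The other half of the title, what the GIL does not block, is waiting. A thread releases the GIL while it sits in a blocking call, so threads remain a reasonable fit for I/O-bound work. The sketch below uses `time.sleep` as a stand-in for a network or disk wait; `fake_download` and `time_threaded_waits` are hypothetical names used only for this illustration.

```python
import threading
import time


def fake_download(delay):
    # time.sleep releases the GIL while waiting, as blocking I/O calls do.
    time.sleep(delay)


def time_threaded_waits(delay, count):
    """Start `count` threads that each wait `delay` seconds; return total elapsed time."""
    start = time.perf_counter()
    threads = [
        threading.Thread(target=fake_download, args=(delay,))
        for _ in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = time_threaded_waits(0.2, 4)
    print(f"4 waits of 0.2 seconds took {elapsed:.2f} seconds in threads")
```

Because the waits overlap, the total should be close to one wait rather than the sum of four, which is the opposite of what the CPU-bound factorial example shows.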