← Python Code Memory & Data Structures
Browse Python Concepts

Lists, Tuples, Sets, Dicts — Internal Implementation and Big O

Mental Model

Imagine searching for a book in a library. With a list, you have to walk down every aisle and check every shelf (linear scan). With a set or dict, it's like having a perfect index or a magically organized system where you instantly know if the book is there.

Rule: When verifying container membership on larger data sequences, always use sets or dictionaries to guarantee O(1) search performance.

The Setup

You are designing a real-time event validator filtering out blocked IP addresses from incoming server logs. You load a large blacklist of IPs into a collection, checking each arriving address against this list.

What Does This Print?

Broken code
Python
import time

# Simulating 50,000 blacklisted IP addresses
blacklist = [f"192.168.1.{i}" for i in range(50000)]

# Simulated stream of incoming requests
incoming_ips = [f"192.168.1.{i}" for i in range(49900, 50100)]

start = time.perf_counter()
blocked_count = 0
for ip in incoming_ips:
    if ip in blacklist: # Membership check
        blocked_count += 1
duration = time.perf_counter() - start
print(f"Time taken to check: {duration:.5f}s")
Predict how the selection of a Python list impacts lookup performance compared to using a set collection.

The Output

What actually happens
Time taken to check: 0.12512s

While 0.12 seconds seems minor, this block is processing only 200 checks. If processing 200,000 logs per second, this linear scan would bring down the entire processing pipeline, causing a major performance bottleneck.

Why Python Does This

A Python list is a dynamic array (a contiguous sequence of pointers). Membership checking (item in list) requires sequentially inspecting every item from index 0 up to N—an O(N) operation. Conversely, sets and dictionaries are hash tables in CPython. Checking membership (item in set) computes the object's hash value, maps it to a bucket index, and directly retrieves it, resulting in average-case O(1) constant-time lookup.

The Fix

Corrected pattern
Python
import time

# Convert to a set to utilize hashing internals for O(1) lookups
blacklist = {f"192.168.1.{i}" for i in range(50000)}

incoming_ips = [f"192.168.1.{i}" for i in range(49900, 50100)]

start = time.perf_counter()
blocked_count = 0
for ip in incoming_ips:
    if ip in blacklist: # O(1) hash table lookup
        blocked_count += 1
duration = time.perf_counter() - start
print(f"Time taken with set: {duration:.5f}s")

Converting the blacklist to a set (or dict keys) allows membership checks to leverage hash tables. Hash tables provide average-case O(1) lookup time because the hash of the item directly points to its location, avoiding a linear scan.

How This Fails in Real Systems

An authentication gateway checked valid JWT claims against a revocation list stored as a Python list loaded from Redis. As revoked tokens grew to 100,000, login latency spiked from 30ms to 950ms, causing the gateway's horizontal pod autoscalers to trigger excessively.

Key Takeaway

When verifying container membership on larger data sequences, always use sets or dictionaries to guarantee O(1) search performance.
Common mistake: Developers use list for membership testing in performance-critical loops without considering the linear time complexity (O(N)) of scanning through a large list.