List Comprehensions vs Generator Expressions

Published May 26, 2026 · By MortalApps

Mental Model

Imagine a generator as a conveyor belt that delivers items one by one. Once an item is delivered, it's gone from the belt. If you want to process it again, you need a new conveyor belt.
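A minimal sketch of the conveyor belt, using made-up numbers: the second pass over the same generator finds nothing, and only rebuilding the expression gives a new belt.

```python
# Illustrative data only; any iterable behaves the same way.
belt = (n * 10 for n in [1, 2, 3])

first_pass = list(belt)   # drains the generator
second_pass = list(belt)  # nothing left to deliver

print(first_pass)   # [10, 20, 30]
print(second_pass)  # []

# A "new conveyor belt" means building the generator expression again.
belt = (n * 10 for n in [1, 2, 3])
third_pass = list(belt)
print(third_pass)   # [10, 20, 30]
```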

Rule: Never perform multiple passes or conditional checks on a generator expression without realizing that iteration permanently consumes its state.

The Setup

You are processing a large CSV database dump. To keep memory use low on your worker node, you choose a generator expression over a list comprehension, then pass the result to a validator that iterates over it before the real consumer does. The consumer ends up with truncated or empty collections downstream.

What Does This Print?

⚠ Broken code

Python

def process_large_dataset(numbers):
    # Attempt to save memory with a generator expression
    squares = (x * x for x in numbers)
    
    # Check if we have processed anything, then return the values
    if not any(squares):
        return "No valid positive squares found."
    
    # Retrieve the actual items for further execution
    return list(squares)

data = [1, 2, 3, 4]
print(process_large_dataset(data))

Predict what the function returns when executed with the list [1, 2, 3, 4]. Will it return the list of squared values?

The Output

What actually happens

[4, 9, 16]

The code returns [4, 9, 16] instead of the expected [1, 4, 9, 16]: the first square is silently lost. Generators are single-pass, stateful iterators. any(squares) pulls items until it finds a truthy value, which here is the very first one (1 * 1 = 1), and then stops. That consumed item cannot be recovered, so list(squares) only sees what is left. How much is lost depends on the data: if the first truthy square came later, every item up to and including it would be gone, and if no square were truthy, any() would drain the generator completely and any later pass over it would see nothing.
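To see how the loss depends on the input, here is a small sketch with a hypothetical helper, remaining_after_any, that reports both the result of any() and whatever is still left in the generator afterwards.

```python
def remaining_after_any(values):
    """Return (any_result, leftover) to show what any() leaves behind."""
    gen = (x * x for x in values)
    found = any(gen)         # stops at the first truthy square
    return found, list(gen)  # whatever any() did not consume


print(remaining_after_any([1, 2, 3, 4]))  # (True, [4, 9, 16])
print(remaining_after_any([0, 0, 3, 4]))  # (True, [16])
print(remaining_after_any([0, 0, 0]))     # (False, [])
```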

Why Python Does This

CPython implements generator expressions via code objects (<genexpr>) that maintain a stateful stack frame (frame object) containing the instruction pointer (f_lasti) and evaluation stack. When next() is called (either explicitly or via implicit iteration in any()), CPython executes bytecode until it hits a YIELD_VALUE instruction. Once a value is yielded, the frame state is preserved. If another consumer iterates over the same generator, execution resumes from that saved state, not from the beginning. There is no mechanism to rewind or clone this state, and once the generator is exhausted every further next() raises StopIteration. Converting the generator to a list, or passing it to any() or all(), consumes that state permanently. Note that a bare truth test such as if squares: consumes nothing and is always true, because a generator object is truthy regardless of its contents.
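The state transitions can be observed from Python itself. This is a small sketch using the standard library's inspect.getgeneratorstate together with the generator's gi_code and gi_frame attributes; the variable names are only for illustration.

```python
import inspect

gen = (x * x for x in [1, 2])

code_name = gen.gi_code.co_name                  # name of the hidden code object
state_created = inspect.getgeneratorstate(gen)   # not started yet

next(gen)                                        # run up to the first yield
state_suspended = inspect.getgeneratorstate(gen)

list(gen)                                        # drain the remaining items
state_closed = inspect.getgeneratorstate(gen)
frame_after = gen.gi_frame                       # frame is released on exhaustion
after_end = next(gen, "exhausted")               # default returned, no rewind

print(code_name)        # <genexpr>
print(state_created)    # GEN_CREATED
print(state_suspended)  # GEN_SUSPENDED
print(state_closed)     # GEN_CLOSED
print(frame_after)      # None
print(after_end)        # exhausted
```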

The Fix

✓ Corrected pattern

Python

def process_large_dataset(numbers):
    # If we need to perform multiple passes, we cannot use a raw generator.
    # We must evaluate the generator into a collection exactly once.
    squares = [x * x for x in numbers] # Using list comprehension for multi-pass capability
    
    # Now we can safely perform multiple checks over the list
    if not any(squares):
        return "No valid positive squares found."
    
    return squares # Already a list, safe to return and reuse

The fix evaluates the squares into a list exactly once, so any() and the return statement both read the same stored values. The general rule: if data must be iterated more than once, or partially consumed and then fully processed, either create a fresh generator for each pass or have the first pass store its results. The list gives up the generator's memory savings in exchange for multi-pass safety.
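The other option, a fresh generator for each pass, could look roughly like the sketch below. The name process_large_dataset_lazy is hypothetical, and the approach assumes numbers is itself re-iterable (a list, tuple, or range); if numbers were a generator, the second pass would hit the same problem.

```python
def process_large_dataset_lazy(numbers):
    # Assumption: `numbers` can be iterated more than once (list, tuple, range).
    def squares():
        # Each call builds a brand-new generator with its own state.
        return (x * x for x in numbers)

    # First pass: short-circuits on the first truthy square.
    if not any(squares()):
        return "No valid positive squares found."

    # Second pass: starts from the beginning on a fresh generator.
    return list(squares())


print(process_large_dataset_lazy([1, 2, 3, 4]))  # [1, 4, 9, 16]
print(process_large_dataset_lazy([0, 0]))        # No valid positive squares found.
```

The trade-off is that each square may be computed twice, which is usually acceptable when the computation is cheap and the data is large.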

How This Fails in Real Systems

A financial transaction microservice used generator expressions to stream ledger records to validation and logging pipelines. Validation ran first with a check like if any(tx.is_risk for tx in transactions). Because any() drains the whole stream whenever no transaction is flagged, the downstream ledger writer received an already exhausted generator and skipped database writes entirely. This went undetected for 48 hours until accounting flagged a zero-balance anomaly in the daily reconciliation report.
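The failure mode is easy to reproduce in miniature. Everything in this sketch is hypothetical (the Transaction class, stream_transactions, and watch_for_risk are stand-ins, not the service's real code). It shows the broken ordering first, then one possible single-pass alternative in which a pass-through generator records risky records while the writer consumes the stream.

```python
from dataclasses import dataclass


@dataclass
class Transaction:  # hypothetical record type
    tx_id: int
    is_risk: bool


def stream_transactions():  # hypothetical single-pass source
    yield Transaction(1, False)
    yield Transaction(2, False)
    yield Transaction(3, False)


# Broken ordering: the check drains the stream before the writer runs.
transactions = stream_transactions()
flagged = any(tx.is_risk for tx in transactions)    # nothing risky, so all consumed
written_broken = [tx.tx_id for tx in transactions]  # writer sees nothing


def watch_for_risk(transactions, risky_ids):
    """Yield every record unchanged, noting risky ones along the way."""
    for tx in transactions:
        if tx.is_risk:
            risky_ids.append(tx.tx_id)
        yield tx


# Single pass: validation and writing share one traversal of the stream.
risky_ids = []
written_ok = [tx.tx_id for tx in watch_for_risk(stream_transactions(), risky_ids)]

print(flagged, written_broken)  # False []
print(written_ok, risky_ids)    # [1, 2, 3] []
```

With this design the risk result is only complete after the writer has finished, so it suits auditing after the fact rather than blocking a write up front.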

Key Takeaway

Never perform multiple passes or conditional checks on a generator expression without realizing that iteration permanently consumes its state.

Common mistake: Developers often treat generator expressions like lists that can be iterated over multiple times, forgetting their stateful, single-pass nature.
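One way to catch this mistake early is to check whether an argument is a single-pass iterator before making several passes over it. The sketch below relies on the iterator protocol, under which calling iter() on an iterator returns the same object; the helper name is_single_pass is hypothetical.

```python
def is_single_pass(obj):
    """Return True if obj is an iterator, which can only be traversed once."""
    # Iterators return themselves from iter(); containers return a new iterator.
    return iter(obj) is obj


print(is_single_pass(x * x for x in [1, 2]))  # True
print(is_single_pass([1, 4]))                 # False
print(is_single_pass(range(3)))               # False
print(is_single_pass(iter([1, 4])))           # True
```

A function that needs multiple passes could use such a check to convert the input to a list, or to raise an error, instead of silently losing data.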