yield and Generators — Lazy Evaluation in Python
Think of calling a generator function like ordering a custom-made meal. You place the order, and the chef (the generator function) acknowledges it by handing you a ticket (the generator object). The actual cooking (the code inside the generator) only begins when you try to eat the meal (iterate the generator).
The Setup
You write a log-parsing service that opens a large file, filters lines with a generator, and measures execution times. You expect the file reading and validation exceptions to raise immediately when the generator function is called.
What Does This Print?
import os
def log_reader(filepath):
if not os.path.exists(filepath):
raise FileNotFoundError(f"Missing log file: {filepath}")
with open(filepath, 'r') as f:
for line in f:
yield line.strip()
# We pass a non-existent file path
try:
stream = log_reader("non_existent_system_log.log")
print("Generator object successfully created.")
except FileNotFoundError:
print("Caught file not found error immediately!")
The Output
The code prints "Generator object successfully created." and does not catch the exception. This is because calling a generator function does not execute any code inside the function body. Instead, it only returns a generator object. The code inside log_reader, including the file existence check and the FileNotFoundError raise, will not execute until the generator is iterated over using next() or a loop.
Why Python Does This
In CPython, a function definition containing the yield keyword is compiled with a specific flag in its code object: CO_GENERATOR. When called, the interpreter checks for this flag and bypasses standard frame execution. Instead of executing the bytecode, it instantiates a PyGenObject which wraps the frame, arguments, and local variables, then returns it. The frame remains suspended in memory at bytecode offset 0. Execution only begins when the frame's generator is advanced via tp_iternext (triggered by next()), making error-checking lazy and temporally uncoupled from instantiation.
The Fix
import os
def log_reader(filepath):
# Eagerly validate prerequisites before entering the generator frame
if not os.path.exists(filepath):
raise FileNotFoundError(f"Missing log file: {filepath}")
# Inner generator function to stream values
def _generator():
with open(filepath, 'r') as f:
for line in f:
yield line.strip()
return _generator() # Return the generator after eager checks pass
Splitting the generator function into an eager setup phase and a lazy yield phase ensures that critical preconditions, like file existence, are verified at the moment the generator object is created, allowing immediate feedback on invalid inputs, rather than deferring the error until the first iteration.
How This Fails in Real Systems
An asynchronous backup worker initialized file generators for critical configuration directories. Because exceptions were deferred until downstream iteration, the microservice started successfully, but silently failed during scheduled backup execution hours later when the missing directories caused uncaught runtime exceptions.