RAG — The Full Pipeline: Chunk, Embed, Store, Retrieve, Prompt
A Python generator is like a single-use concert ticket. Once you've entered the venue (iterated through it), the ticket is consumed, and neither you nor anyone else can use it to get in again. To go through the process a second time, or to count attendees before letting them in, you need either a new ticket (call the generator function again to get a fresh generator) or a guest list (materialize the items into a list).
The Setup
You are implementing a document ingestion pipeline for a RAG system. You write a generator that reads a document and yields chunks, count those chunks for logging, and then iterate over the same generator to embed the chunks and store them in your vector database.
What Does This Print?
```python
from typing import Generator

def chunk_document(text: str) -> Generator[str, None, None]:
    # Simulating simple sentence-based text chunking
    for segment in text.split(". "):
        if segment:
            yield segment.strip()

def index_documents(text: str):
    chunks = chunk_document(text)

    # Count chunks to log ingestion progress
    chunk_count = sum(1 for _ in chunks)
    print(f"Processing {chunk_count} chunks...")

    # Generate embeddings and save to vector store
    for index, chunk in enumerate(chunks):
        print(f"Embedding chunk {index}: {chunk}")  # Simulating embedding creation

index_documents("Enterprise security policies. Access control guidelines. Network architecture document.")
```
The Output
The only line printed is "Processing 3 chunks...". The function then returns normally without embedding or indexing anything. The generator returned by chunk_document() was fully exhausted by the sum() call, so when the second loop asks chunks for its first item, the generator immediately raises StopIteration. A for loop treats StopIteration as the normal end of iteration, so the loop body never runs, no error is reported, and in a real pipeline no records would be written to your vector database.
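The difference between "the loop ends quietly" and "an exception is raised" can be seen by calling next() yourself. A minimal sketch, reusing a simplified copy of chunk_document (type hints dropped) on a made-up three-sentence input:

```python
def chunk_document(text):
    # Simplified copy of the article's chunker (type hints dropped)
    for segment in text.split(". "):
        if segment:
            yield segment.strip()

chunks = chunk_document("A. B. C.")
count = sum(1 for _ in chunks)  # consumes every chunk, just like the logging step
print(count)

# A for loop (here, inside a list comprehension) treats StopIteration as "no more items"
looped = [chunk for chunk in chunks]
print(looped)

# Calling next() directly exposes the StopIteration that the loop handled silently
try:
    next(chunks)
    exhausted = False
except StopIteration:
    exhausted = True
print("exhausted:", exhausted)
```

The count is 3, the comprehension produces an empty list, and only the explicit next() call surfaces StopIteration.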
Why Python Does This
In Python, a generator is a single-use iterator. In CPython, its execution state lives in a frame object exposed as the generator's gi_frame attribute. Each time the generator is advanced, Python runs its bytecode until the next yield expression, then pauses and hands back the yielded value. Once execution reaches the end of the function (or a return statement), the generator is exhausted and gi_frame becomes None.
Generators do not cache the values they produce, so iterating over an exhausted generator a second time yields nothing. Here, the sum() call consumes every element to compute the count, and the subsequent for loop's first request for an item raises StopIteration immediately.
To prevent this, either materialize the generator into a list when you need to traverse it more than once, or restructure the pipeline to process and log elements in a single pass.
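The standard library's inspect.getgeneratorstate() makes this lifecycle visible. A minimal sketch, assuming CPython (the gi_frame check relies on a CPython implementation detail) and a made-up input string:

```python
import inspect

def chunk_document(text):
    # Simplified copy of the article's chunker (type hints dropped)
    for segment in text.split(". "):
        if segment:
            yield segment.strip()

gen = chunk_document("A. B. C.")
states = [inspect.getgeneratorstate(gen)]      # GEN_CREATED: nothing has run yet

first = next(gen)                              # runs the body up to the first yield
states.append(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED: paused at the yield

rest = list(gen)                               # drains the remaining chunks
states.append(inspect.getgeneratorstate(gen))  # GEN_CLOSED: the body has finished

print(states)
print(gen.gi_frame)  # None in CPython once the generator is exhausted
print(list(gen))     # []: nothing is cached, so nothing is replayed

# The "new ticket": calling the generator function again starts from scratch
fresh = list(chunk_document("A. B. C."))
print(fresh)
```

Note that the exhausted generator and the fresh one come from the same function; only a new call restarts the body.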
The Fix
```python
def index_documents(text: str):
    # Materialize the generator to a list to allow multiple traversals
    chunks = list(chunk_document(text))
    chunk_count = len(chunks)
    print(f"Processing {chunk_count} chunks...")

    # Now chunks can be iterated safely without exhaustion
    for index, chunk in enumerate(chunks):
        print(f"Embedding chunk {index}: {chunk}")
```
Wrapping the generator in list() materializes every chunk in memory. A list can be traversed any number of times, so len() returns the count without consuming anything, and the embedding loop still sees all three chunks. The trade-off is memory: for very large documents, holding every chunk at once can be costly, and counting during a single streaming pass is the better option.
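The single-pass alternative counts chunks while they are being processed and logs the total afterwards, so the generator is consumed exactly once. A minimal sketch; index_documents_streaming is a hypothetical name, and the print call stands in for real embedding and upsert calls:

```python
from typing import Iterator

def chunk_document(text: str) -> Iterator[str]:
    # Simplified copy of the article's chunker
    for segment in text.split(". "):
        if segment:
            yield segment.strip()

def index_documents_streaming(text: str) -> int:
    """Hypothetical variant: embed chunks in one pass and log the count at the end."""
    count = 0
    for index, chunk in enumerate(chunk_document(text)):
        print(f"Embedding chunk {index}: {chunk}")  # stand-in for embedding + upsert
        count += 1
    print(f"Processed {count} chunks")
    return count

index_documents_streaming(
    "Enterprise security policies. Access control guidelines. Network architecture document."
)
```

The cost of this design is that the total is only known after processing, so progress logs report what was done rather than what is planned.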
How This Fails in Real Systems
An automated knowledge-base sync service ran a daily cron job to update vector embeddings. Because of a generator exhaustion bug in the logging system, the script ran with exit code 0 every day but silently uploaded empty document lists to Pinecone, destroying the application's search recall over a two-week period.
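One way to keep this kind of bug from hiding behind exit code 0 is to compare the number of chunks actually indexed against the number counted, and raise when they disagree or when nothing was indexed. A minimal sketch; index_with_guard is a hypothetical helper, and the loop body is a placeholder where a real pipeline would call its embedding and vector-store client:

```python
from typing import Iterable, Iterator

def chunk_document(text: str) -> Iterator[str]:
    # Simplified copy of the article's chunker
    for segment in text.split(". "):
        if segment:
            yield segment.strip()

def index_with_guard(chunks: Iterable[str], expected: int) -> int:
    """Hypothetical helper: fail loudly if nothing was indexed or the count does not match."""
    processed = 0
    for chunk in chunks:
        # A real pipeline would embed and upsert `chunk` here
        processed += 1
    if processed == 0 or processed != expected:
        # Left uncaught, this exception makes the Python process exit with status 1,
        # so the scheduler sees a failure instead of exit code 0
        raise RuntimeError(f"Expected {expected} chunks but indexed {processed}")
    return processed

DOC = "Enterprise security policies. Access control guidelines. Network architecture document."

# Buggy pattern: count with sum(), then reuse the exhausted generator
gen = chunk_document(DOC)
try:
    index_with_guard(gen, expected=sum(1 for _ in gen))
except RuntimeError as err:
    print(f"Sync failed: {err}")
```

With the exhausted generator, the guard reports "Expected 3 chunks but indexed 0" instead of letting the run succeed silently.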