RAG — The Full Pipeline: Chunk, Embed, Store, Retrieve, Prompt

Published May 26, 2026 · By MortalApps

Mental Model

A Python generator is like a single-use ticket to a concert. Once you've entered the venue (iterated through it), the ticket is consumed. You can't use the same ticket to re-enter, nor can anyone else. If you want to go through the process again or count attendees first, you need a new ticket (a new generator) or to save the attendees (materialize to a list).

Rule: When working with generator-based pipelines, never double-iterate; materialize to a list if you need to calculate length or run multiple iterations.

The Setup

You are implementing a document ingestion pipeline for a RAG system. You write a generator that yields chunks from a document, count those chunks for logging, and then pass the same generator object to your embedding and database storage step.

What Does This Print?

⚠ Broken code

Python

from typing import Generator

def chunk_document(text: str) -> Generator[str, None, None]:
    # Naive sentence splitting stands in for a real chunking strategy
    for segment in text.split(". "):
        segment = segment.strip()
        if segment:  # skip segments that are empty after stripping
            yield segment

def index_documents(text: str):
    chunks = chunk_document(text)
    
    # Count chunks to log ingestion progress
    chunk_count = sum(1 for _ in chunks)
    print(f"Processing {chunk_count} chunks...")
    
    # Generate embeddings and save to vector store
    for index, chunk in enumerate(chunks):
        print(f"Embedding chunk {index}: {chunk}") # Simulating embedding creation

index_documents("Enterprise security policies. Access control guidelines. Network architecture document.")

Predict how many chunks are successfully printed and embedded by the index_documents function.

The Output

What actually happens

Processing 3 chunks...

The function prints the initial count of 3 chunks, then finishes without embedding or indexing anything. The generator returned by chunk_document() was completely exhausted by sum(). When the second loop asks that generator for its next item, the generator raises StopIteration on the very first request. A for loop treats StopIteration as the normal end of iteration, so the loop body never runs, no exception reaches your code, and no records are written to your vector database.

Why Python Does This

In Python, a generator is a single-use iterator. Its execution state is tracked by a frame object (gi_frame in CPython). When a generator is iterated, Python runs the function body until it reaches a yield, pauses there, and hands the yielded value to the caller. Once execution reaches the end of the function or a return, the generator is marked as exhausted (gi_frame becomes None). Because generators do not cache the values they produce, iterating over one a second time yields nothing. The sum() call consumes every element in the generator to compute the count. The subsequent for loop then asks the same iterator for its next item, and the iterator immediately raises StopIteration, which the loop interprets as "no more items" rather than as an error. To prevent this, convert the generator to a list first if you need to iterate more than once, or structure the pipeline so that it processes and logs elements in a single stream.
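The exhaustion described above can be observed directly. The following is a minimal sketch, assuming CPython (gi_frame is an implementation detail) and using a hypothetical two_chunks generator that is not part of the pipeline:

```python
def two_chunks():
    # Hypothetical stand-in for chunk_document(), used only for this demo
    yield "first"
    yield "second"

gen = two_chunks()
frame_before = gen.gi_frame is not None  # a live generator still owns a frame

first_pass = list(gen)                   # consumes every value
frame_after = gen.gi_frame is None       # CPython drops the frame once exhausted

second_pass = list(gen)                  # nothing is cached, so nothing comes back

try:
    next(gen)                            # asking directly exposes the signal
    raised = False
except StopIteration:
    raised = True                        # a for loop would absorb this silently

print(frame_before, first_pass, frame_after, second_pass, raised)
# True ['first', 'second'] True [] True
```

The second list() call returning [] without any error is the same silent behavior that the second loop in index_documents exhibits.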

The Fix

✓ Corrected pattern

Python

def index_documents(text: str):
    # Materialize the generator to a list to allow multiple traversals
    chunks = list(chunk_document(text))
    
    chunk_count = len(chunks)
    print(f"Processing {chunk_count} chunks...")
    
    # Now chunks can be iterated safely without exhaustion
    for index, chunk in enumerate(chunks):
        print(f"Embedding chunk {index}: {chunk}")

Wrapping the call in list() materializes every chunk into memory. len() then gives the count directly, and the loop that follows still sees the full data, because a list can be traversed any number of times. The trade-off is that all chunks are held in memory at once, which is usually fine for a single document but worth reconsidering for very large inputs.
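If holding every chunk in memory is not acceptable, the single-stream alternative is to count while processing instead of before. A minimal sketch, assuming it is fine to log the total after the loop rather than up front; index_documents_streaming is a hypothetical name, and chunk_document is repeated here only so the example runs on its own:

```python
from typing import Iterator

def chunk_document(text: str) -> Iterator[str]:
    # Same naive sentence splitting as in the article's example
    for segment in text.split(". "):
        segment = segment.strip()
        if segment:
            yield segment

def index_documents_streaming(text: str) -> int:
    """Embed chunks in one pass and report the count at the end."""
    processed = 0
    for index, chunk in enumerate(chunk_document(text)):
        print(f"Embedding chunk {index}: {chunk}")  # simulated embedding step
        processed = index + 1
    print(f"Processed {processed} chunks.")
    return processed

total = index_documents_streaming(
    "Enterprise security policies. Access control guidelines. Network architecture document."
)
```

The generator is traversed exactly once, so there is nothing left to exhaust. The cost is that the count is only known after the work is done, so a "Processing N chunks..." message before the loop is no longer possible.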

How This Fails in Real Systems

An automated knowledge-base sync service ran a daily cron job to update vector embeddings. Because of a generator exhaustion bug in the logging system, the script ran with exit code 0 every day but silently uploaded empty document lists to Pinecone, destroying the application's search recall over a two-week period.
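One way to make this kind of failure loud is to refuse to upload an empty batch. The sketch below is an assumption about how such a guard might look, not a description of the service in the incident. The upsert parameter is a hypothetical callable standing in for whatever vector-store client call is used, and it is not Pinecone's API:

```python
from typing import Callable, Iterable, List

def sync_chunks(chunks: Iterable[str], upsert: Callable[[List[str]], None]) -> int:
    """Upload chunks through `upsert` (hypothetical) and fail on an empty batch."""
    batch = list(chunks)  # materialize once so emptiness can be checked
    if not batch:
        # An exception gives the cron job a non-zero exit instead of a silent no-op
        raise RuntimeError("No chunks to upload; refusing to sync an empty batch")
    upsert(batch)
    return len(batch)

uploaded: List[str] = []
source = iter(["a", "b"])                          # single-use, like a generator
count_first = sync_chunks(source, uploaded.extend)  # uploads both chunks

try:
    sync_chunks(source, uploaded.extend)            # source is exhausted now
    failed_loudly = False
except RuntimeError:
    failed_loudly = True
```

Whether an empty batch should be an error depends on the job. For a daily sync that always expects documents, treating it as one is a reasonable default.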

Key Takeaway

When working with generator-based pipelines, never double-iterate; materialize to a list if you need to calculate length or run multiple iterations.

Common mistake: Developers often iterate over a generator more than once, unaware that a generator is a single-use iterator that is exhausted after its first complete traversal.
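When a function receives an iterable from elsewhere, it can help to check whether that object is single-use before iterating it twice. Under the iterator protocol, an iterator returns itself from iter(), while a container such as a list returns a fresh iterator each time. The helper below, with the hypothetical name is_single_use, is a heuristic sketch built on that fact and not a guarantee for every custom class:

```python
from typing import Iterable

def is_single_use(obj: Iterable) -> bool:
    """Return True if obj is its own iterator, so it can be traversed only once."""
    return iter(obj) is obj

generator_result = is_single_use(x for x in "abc")  # generator expression
list_result = is_single_use(["a", "b", "c"])        # list

print(generator_result, list_result)
# True False
```

Calling iter() on a generator does not consume any values, so the check itself is safe to run before the first traversal.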