Streaming LLM Responses — async for chunk Patterns
The asyncio event loop is like a single, hyper-efficient chef trying to juggle many tasks. If you ask the chef to chop vegetables for 10 minutes (a blocking synchronous I/O task), they can't do anything else during that time, even if other tasks are ready. To keep things moving, the chef needs to delegate the chopping to an assistant (a thread pool) and just check back when it's done.
The Setup
You are building a low-latency chat interface. To keep the UI responsive, you use asynchronous streaming to yield tokens as they arrive from the LLM. Inside the async for loop that consumes the stream, you append each token to a local diagnostic log file using plain Python file I/O.
What Does This Print?
import asyncio
import time

async def mock_llm_stream():
    # Simulating LLM yielding word chunks
    for chunk in ["AI ", "agents ", "are ", "active."]:
        await asyncio.sleep(0.05)
        yield chunk

async def stream_and_log_response():
    async for token in mock_llm_stream():
        # Standard blocking file write inside the hot event loop
        with open("session_log.txt", "a") as log_file:
            log_file.write(token)
            # Simulating OS disk write overhead
            time.sleep(0.1)

async def main():
    # Running the streaming task alongside another task
    start = time.time()
    await asyncio.gather(stream_and_log_response(), asyncio.sleep(0.1))
    print(f"Completed in {time.time() - start:.2f}s")

asyncio.run(main())
The Output
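The script prints roughly Completed in 0.60s: four iterations of a 0.05 s awaited sleep plus a 0.1 s blocking time.sleep(). Despite the "async" stream, the auxiliary asyncio.sleep(0.1) task cannot resume on time. The synchronous calls to time.sleep() and open().write() run on the event loop's single thread, so while they execute the loop cannot resume any other scheduled task. The auxiliary task's wake-up, due at 0.1 s, is postponed until the first blocking call returns at about 0.15 s, and every other coroutine sharing the loop is frozen in the same way.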
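The total runtime alone does not expose the starvation, because the stream dominates it either way. A minimal sketch that makes it visible by measuring how late the concurrent sleep actually wakes up; blocking_stream and timed_sleep are hypothetical helpers that mimic the example above without the file write.

```python
import asyncio
import time


async def blocking_stream(n_chunks: int = 4) -> None:
    """Mimics the logging loop: await briefly, then block the thread."""
    for _ in range(n_chunks):
        await asyncio.sleep(0.05)  # yields control to the event loop
        time.sleep(0.1)            # blocks the event loop's only thread


async def timed_sleep(delay: float) -> float:
    """Sleep for `delay` seconds and return how late the wake-up was."""
    start = time.perf_counter()
    await asyncio.sleep(delay)
    return time.perf_counter() - start - delay


async def main() -> float:
    _, lateness = await asyncio.gather(blocking_stream(), timed_sleep(0.1))
    print(f"asyncio.sleep(0.1) woke up {lateness * 1000:.0f} ms late")
    return lateness


lateness = asyncio.run(main())
```

With these timings it should report a delay of roughly 50 ms: the timer is due at 0.1 s, but the loop only regains control when the first blocking time.sleep(0.1) returns, at about 0.15 s.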
Why Python Does This
Under asyncio, all coroutines share one event loop running on a single thread. Inside an async for loop, control returns to the event loop only at an await, for example when the generator awaits asyncio.sleep() before yielding the next chunk. Code between awaits that makes synchronous, non-awaitable calls (standard file I/O, time.sleep(), or blocking HTTP requests) gives the loop no chance to switch tasks: the thread sits inside the OS-level system call, the loop cannot start its next iteration, and every other pending task is starved until the call returns. To preserve concurrency, any I/O inside a hot async stream must be non-blocking: offload it to a thread pool with loop.run_in_executor(), or use an async-compatible I/O library such as aiofiles.
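Blocking calls hidden inside a coroutine are easy to miss in review. asyncio's debug mode can flag them: with debug=True, the loop logs a warning for any single callback or task step that runs longer than loop.slow_callback_duration (0.1 s by default). A minimal sketch, assuming you capture the "asyncio" logger's output; stream_with_blocking_write and the _Collector handler are hypothetical names used only for illustration.

```python
import asyncio
import logging
import time


class _Collector(logging.Handler):
    """Keeps every record the 'asyncio' logger emits so we can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


async def stream_with_blocking_write() -> None:
    for _ in range(2):
        await asyncio.sleep(0.01)
        time.sleep(0.15)  # stand-in for a blocking file or database write


collector = _Collector()
logging.getLogger("asyncio").addHandler(collector)

# debug=True enables the slow-callback check; any task step that runs longer
# than loop.slow_callback_duration is reported as "Executing ... took N seconds".
asyncio.run(stream_with_blocking_write(), debug=True)

slow = [m for m in collector.messages if "took" in m]
for message in slow:
    print(message)
```

Debug mode adds overhead, so it is best suited to development and tests; it can also be switched on with the PYTHONASYNCIODEBUG=1 environment variable.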
The Fix
import asyncio
import time

import anyio.to_thread  # third-party: pip install anyio

async def mock_llm_stream():
    for chunk in ["AI ", "agents ", "are ", "active."]:
        await asyncio.sleep(0.05)
        yield chunk

def write_log(token: str):
    # Plain blocking file write; safe because it runs on a worker thread
    with open("session_log.txt", "a") as f:
        f.write(token)

async def stream_and_log_response_safe():
    async for token in mock_llm_stream():
        # Delegate blocking I/O to a worker thread, freeing the event loop
        await anyio.to_thread.run_sync(write_log, token)

async def main():
    start = time.time()
    await asyncio.gather(stream_and_log_response_safe(), asyncio.sleep(0.1))
    print(f"Completed in {time.time() - start:.2f}s")

asyncio.run(main())
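By offloading the synchronous file write to a worker thread with anyio.to_thread.run_sync() (the standard-library equivalents are asyncio.to_thread() and loop.run_in_executor()), the event loop stays free to run other coroutines while the write is in progress. The stream still awaits each write before requesting the next token, but other tasks, such as the concurrent asyncio.sleep(0.1), are no longer frozen. Note that write_log also drops the simulated 0.1 s disk delay, so part of the shorter total runtime (roughly 0.2 s) comes from that change rather than from the offloading itself.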
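The standard library can do the same offloading through loop.run_in_executor(), with no third-party dependency. This sketch uses it and, unlike the fix, keeps the simulated 0.1 s disk delay so the comparison with the original is like-for-like; write_log_slow and timed_sleep are hypothetical helpers. The stream itself still takes about 0.6 s, because each write is awaited before the next token, but the concurrent asyncio.sleep(0.1) now wakes up on time.

```python
import asyncio
import time


async def mock_llm_stream():
    for chunk in ["AI ", "agents ", "are ", "active."]:
        await asyncio.sleep(0.05)
        yield chunk


def write_log_slow(token: str, path: str = "session_log.txt") -> None:
    """Blocking write; time.sleep keeps the original 0.1 s disk delay."""
    with open(path, "a") as f:
        f.write(token)
    time.sleep(0.1)


async def timed_sleep(delay: float) -> float:
    """Sleep for `delay` seconds and return how late the wake-up was."""
    start = time.perf_counter()
    await asyncio.sleep(delay)
    return time.perf_counter() - start - delay


async def stream_and_log_executor() -> None:
    loop = asyncio.get_running_loop()
    async for token in mock_llm_stream():
        # None selects the loop's default ThreadPoolExecutor
        await loop.run_in_executor(None, write_log_slow, token)


async def main() -> float:
    _, lateness = await asyncio.gather(stream_and_log_executor(), timed_sleep(0.1))
    print(f"asyncio.sleep(0.1) woke up {lateness * 1000:.0f} ms late")
    return lateness


lateness = asyncio.run(main())
```

asyncio.to_thread(write_log_slow, token), available since Python 3.9, is a shorter spelling of the same idea.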
How This Fails in Real Systems
A real-time customer service agent system ran on FastAPI, serving 100 concurrent chat streams. A junior engineer added standard synchronous database logging inside the stream generator loop to log daily token usage metrics. The production server CPU usage immediately spiked to 100%, causing overall system latencies to jump from 150ms to over 12 seconds, dropping 90% of user connections.
Key Takeaway
Never run blocking synchronous I/O inside an async for streaming loop; delegate it to thread pools. A common mistake is assuming that once a function is declared async, any operations within it will automatically be non-blocking, without recognizing that standard synchronous I/O calls still block the single-threaded asyncio event loop.