LangChain vs LiteLLM vs Direct SDK — Trade-offs
Imagine a shared whiteboard that all "agents" use to remember things. If you write something on that whiteboard, every other agent also sees it and can modify it. In a multi-user system, this means one user's conversation history might inadvertently get mixed with another's, leading to confusion and data leakage.
The Setup
You are building a conversational backend around a lightweight model-routing layer. To reduce connection overhead, you keep session state in a global or share one agent configuration class across requests.
What Does This Print?
```python
import asyncio


class SharedAgentConfig:
    # Hypothetical stand-in for state an orchestration layer keeps on the class
    memory: list[str] = []

    def add_message(self, text: str):
        self.memory.append(text)


async def handle_user_request(user_id: int, user_message: str):
    # A new configuration object for each request
    agent = SharedAgentConfig()
    agent.add_message(user_message)
    await asyncio.sleep(0.01)  # Simulating API round-trip time
    return f"User {user_id} history: {agent.memory}"


async def test_concurrency():
    # Simulate user 101 and user 102 sending messages near-simultaneously
    t1 = handle_user_request(101, "Hello, I am user 101.")
    t2 = handle_user_request(102, "Hey, secret key is 'x99'.")
    print(await asyncio.gather(t1, t2))


asyncio.run(test_concurrency())
```
The Output
Both returned strings contain both messages, so user 101's "history" includes user 102's secret key. Because memory is defined directly in the class body, it is a single mutable list shared by every instance of SharedAgentConfig: creating a fresh instance per request does not create a fresh list, and the list keeps growing for as long as the process runs. High-level frameworks that wrap execution in abstract state trackers can hide the same pattern behind global registries, module-level variables, or implicit singletons, which makes state cross-contamination a real risk in async web servers.
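Class attributes are only one route to shared state. A cached factory is another: the sketch below assumes a hypothetical `get_agent()` helper memoized with `functools.lru_cache`, which hands every caller the same object, so even a properly initialized instance attribute ends up shared. It illustrates the implicit-singleton pattern in general, not the internals of any specific framework.

```python
import asyncio
from functools import lru_cache


class Agent:
    def __init__(self) -> None:
        # Instance attribute: would be isolated per object...
        self.memory: list[str] = []


@lru_cache(maxsize=None)
def get_agent() -> Agent:
    # ...but this hypothetical factory caches its result, so every call
    # returns the same Agent object (an implicit singleton).
    return Agent()


async def handle(user_id: int, message: str) -> list[str]:
    agent = get_agent()
    agent.memory.append(f"{user_id}: {message}")
    await asyncio.sleep(0.01)  # yield so the other request can run
    return list(agent.memory)  # snapshot of what this request "sees"


async def main() -> list[list[str]]:
    return await asyncio.gather(handle(101, "hello"), handle(102, "secret x99"))


results = asyncio.run(main())
print(results)  # both snapshots contain both users' messages
```

The lesson carries over to any factory, registry, or dependency-injection container: check whether it returns a new object or a cached one before storing per-user data on it.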
Why Python Does This
When Python executes a class statement, it runs the class body immediately, and the names bound there (such as memory = []) become entries in the class's __dict__. Because add_message never assigns to self.memory, the instance has no memory attribute of its own; the lookup self.memory falls through to the class and returns that one shared list, and append() mutates it in place. Thin wrappers such as LiteLLM, whose completion() is called as a plain function with the caller supplying the messages, mostly leave conversation state to the caller, whereas higher-level frameworks often carry state machines and context pooling inside their class structures. In an async server such as FastAPI, all requests handled by one event loop run on the same thread and see the same class- and module-level objects, and every await lets another request's task run and mutate that shared state before the first task reads it back.
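The lookup order described above can be checked directly with `vars()`, which returns an object's `__dict__`. The class and variable names here are illustrative only.

```python
class Shared:
    memory: list[str] = []  # class attribute: one list stored on the class


a, b = Shared(), Shared()
a.memory.append("from a")   # mutates the class-level list in place

print("memory" in vars(a))  # False: a has no instance-level attribute
print(a.memory is b.memory) # True: both resolve to Shared.__dict__["memory"]
print(b.memory)             # ['from a']

# Assignment (unlike mutation) creates an instance attribute that shadows the class one.
a.memory = ["only a"]
print("memory" in vars(a))  # True
print(b.memory)             # ['from a']: b still sees the class-level list
print(Shared.memory)        # ['from a']
```

Mutation goes through to the shared object; only assignment creates a per-instance copy, which is exactly what the fix below does inside `__init__`.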
The Fix
```python
import asyncio


class IsolatedAgentConfig:
    def __init__(self):
        # Initialize state inside __init__ so each instance gets its own list
        self.memory: list[str] = []

    def add_message(self, text: str):
        self.memory.append(text)


async def handle_user_request_safe(user_id: int, user_message: str):
    # Each request now gets a fully isolated configuration instance
    agent = IsolatedAgentConfig()
    agent.add_message(user_message)
    await asyncio.sleep(0.01)  # Simulating API round-trip time
    return f"User {user_id} history: {agent.memory}"
```
By making memory an instance attribute (self.memory = [] in __init__) instead of a class attribute, each IsolatedAgentConfig object gets its own list. Attribute lookup finds memory in the instance __dict__ before it checks the class, so self.memory.append() never touches a shared object. Because every request creates its own instance, concurrent requests can no longer see each other's messages.
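A per-request instance isolates concurrent users, but it also discards history between turns. One middle ground is to share the expensive, stateless part (the API client and its connections) and keep conversation history in a store keyed by session. The sketch below assumes a hypothetical `StubLLMClient` standing in for a real SDK client and a hypothetical in-memory `ConversationStore`; a production system might use Redis or a database, and a proper session ID rather than the raw user ID.

```python
import asyncio
from collections import defaultdict


class StubLLMClient:
    """Hypothetical stand-in for an SDK client: shared, holds no conversation state."""

    async def complete(self, history: list[str]) -> str:
        await asyncio.sleep(0.01)  # simulate network latency
        return f"(reply to {len(history)} message(s))"


class ConversationStore:
    """Hypothetical in-memory history store keyed by session ID."""

    def __init__(self) -> None:
        self._histories: defaultdict[int, list[str]] = defaultdict(list)

    def history(self, session_id: int) -> list[str]:
        # Each session ID maps to its own list, created on first access
        return self._histories[session_id]


CLIENT = StubLLMClient()     # safe to share: no per-user state
STORE = ConversationStore()  # shared container, one list per session


async def handle_turn(user_id: int, message: str) -> list[str]:
    history = STORE.history(user_id)
    history.append(message)
    reply = await CLIENT.complete(list(history))  # pass a copy to the client
    history.append(reply)
    return list(history)


async def main() -> tuple[list[str], list[str], list[str]]:
    # Two users talk concurrently...
    first_101, first_102 = await asyncio.gather(
        handle_turn(101, "Hello, I am user 101."),
        handle_turn(102, "Hey, secret key is 'x99'."),
    )
    # ...then user 101 sends a follow-up and still has its own earlier turn.
    second_101 = await handle_turn(101, "What did I say earlier?")
    return first_101, first_102, second_101


first_101, first_102, second_101 = asyncio.run(main())
print(second_101)  # user 101's own turns only; nothing from user 102
```

Each await still yields control, so two overlapping turns from the *same* session could interleave their appends; if ordering within a session matters, a per-session `asyncio.Lock` is one option.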
How This Fails in Real Systems
A financial advisory API used an enterprise LLM framework that kept conversation memory in hidden thread-local caches. Thread-local storage is scoped to a thread, not to a request, and an async server multiplexes many requests onto one event-loop thread, so under high load different async requests read and wrote the same cache. As a result, User A's sensitive account metrics and portfolio valuations were served directly to User B.
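This failure mode is easy to reproduce with the standard library. `threading.local()` gives each thread its own attributes, but asyncio runs every task on the event-loop thread, so concurrent requests share one "local" slot. `contextvars.ContextVar` is scoped to tasks instead: each asyncio task runs in a copy of the context that was current when it was created. The sketch below is illustrative; the variable names are hypothetical and it does not model any particular framework's internals.

```python
import asyncio
import contextvars
import threading

thread_cache = threading.local()  # per-thread storage, NOT per-task
request_cache: contextvars.ContextVar[str] = contextvars.ContextVar("request_cache")


async def handle(user: str) -> tuple[str, str]:
    thread_cache.value = f"portfolio of {user}"
    request_cache.set(f"portfolio of {user}")
    await asyncio.sleep(0.01)  # the other request runs here, on the same thread
    # Read back what this request believes it stored in each place
    return thread_cache.value, request_cache.get()


async def main() -> list[tuple[str, str]]:
    return await asyncio.gather(handle("A"), handle("B"))


results = asyncio.run(main())
for thread_view, task_view in results:
    print("threading.local:", thread_view, "| ContextVar:", task_view)
```

Request A reads back "portfolio of B" from the thread-local slot but its own value from the `ContextVar`, which is why per-request state in async code belongs in explicit parameters or context variables rather than thread-locals.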