LangChain vs LiteLLM vs Direct SDK — Trade-offs

Published May 26, 2026 · By MortalApps · ·

Mental Model

Imagine a shared whiteboard that all "agents" use to remember things. If you write something on that whiteboard, every other agent also sees it and can modify it. In a multi-user system, this means one user's conversation history might inadvertently get mixed with another's, leading to confusion and data leakage.

Rule: When choosing LLM frameworks, prefer stateless, pure-functional libraries like LiteLLM over stateful abstractions to ensure deterministic context boundaries.

The Setup

You are building a conversational backend using a lightweight routing model pattern. You try to manage session state globally or instantiate a shared agent configuration class to reduce connection overhead.

What Does This Print?

⚠ Broken code

Python

import asyncio

class SharedAgentConfig:
    # Under the hood, many orchestrators reuse a global memory list
    memory: list[str] = [] 
    
    def add_message(self, text: str):
        self.memory.append(text)

async def handle_user_request(user_id: int, user_message: str):
    # Reusing a dynamic configuration class 
    agent = SharedAgentConfig()
    agent.add_message(user_message)
    await asyncio.sleep(0.01) # Simulating API trip time
    return f"User {user_id} history: {agent.memory}"

async def test_concurrency():
    # Simulate user 1 and user 2 sending messages near-simultaneously
    t1 = handle_user_request(101, "Hello, I am user 101.")
    t2 = handle_user_request(102, "Hey, secret key is 'x99'.")
    print(await asyncio.gather(t1, t2))

asyncio.run(test_concurrency())

Predict whether the histories of user 101 and user 102 are kept isolated from one another.

The Output

What actually happens

[ "User 101 history: ['Hello, I am user 101.', \"Hey, secret key is 'x99'.\"]", "User 102 history: ['Hello, I am user 101.', \"Hey, secret key is 'x99'.\"]" ]

Both user instances print history containing information from the other user. Because memory is defined directly in the class namespace, it acts as a global mutable variable shared by every instantiation of SharedAgentConfig. High-level frameworks that wrap execution engines in abstract state trackers often utilize global registries, hidden module variables, or implicit singletons, creating high risk for state cross-contamination in async web environments.

Why Python Does This

When Python processes a class definition, it executes the body code block immediately. Any variable defined within this block (such as memory = []) is bound to the class's dictionary (__dict__). When you call self.memory.append(), Python resolves the reference to the class-level list object because there is no instance-level memory shadow variable. Unlike lightweight wrappers (e.g., LiteLLM, which simply wraps standard API calls in pure functions), high-level frameworks often carry complex state machines and implicit context pooling within their class structures. In concurrent async environments like FastAPI, these global state mechanisms overlap, leading to context contamination across separate event loop task frames.

The Fix

✓ Corrected pattern

Python

class IsolatedAgentConfig:
    def __init__(self):
        # Initialize state inside __init__ to guarantee instance-level memory isolation
        self.memory: list[str] = []

    def add_message(self, text: str):
        self.memory.append(text)

async def handle_user_request_safe(user_id: int, user_message: str):
    # Now, each request has a fully isolated configuration instance
    agent = IsolatedAgentConfig()
    agent.add_message(user_message)
    await asyncio.sleep(0.01)
    return f"User {user_id} history: {agent.memory}"

By making memory an instance attribute (self.memory = [] in __init__) instead of a class attribute, each IsolatedAgentConfig object gets its own distinct list. Python's attribute resolution finds self.memory in the instance __dict__ first, so mutations via self.memory.append() never touch the shared class-level object. This ensures that each user's conversation history is completely isolated and independent, preventing state leakage between concurrent requests.

How This Fails in Real Systems

A financial advisory API utilized an enterprise LLM framework with hidden thread-local memory caches. Under high load, the framework started assigning identical memory caches to different async request workers. As a result, sensitive account metrics and portfolio valuations of User A were served directly to User B.

Key Takeaway

When choosing LLM frameworks, prefer stateless, pure-functional libraries like LiteLLM over stateful abstractions to ensure deterministic context boundaries.

Common mistake: Developers unknowingly create shared mutable state in agent configurations, leading to cross-talk between concurrent user sessions when instances of the configuration class are reused or dynamically instantiated.