
ReAct Framework: Reasoning Patterns

  • ReAct (Reason + Act) is a prompting framework that enables Large Language Models (LLMs) to interleave verbal reasoning traces with dynamic tool execution.
  • By forcing the model to articulate its thought process before taking an action, ReAct significantly reduces hallucinations and improves task grounding.
  • The framework relies on a loop of Thought, Action, and Observation, allowing the agent to dynamically adjust its strategy based on real-world feedback.
  • ReAct bridges the gap between static language generation and autonomous problem-solving in external environments like APIs, databases, or search engines.

Why It Matters

01
Financial Analysis

Investment firms use ReAct agents to aggregate data from multiple financial news APIs and internal databases. The agent reasons through the volatility of a stock by first querying historical price data, then identifying relevant news events, and finally synthesizing a risk report. This reduces the time analysts spend manually cross-referencing disparate data sources.

02
Technical Support Automation

Large software companies deploy ReAct agents to handle complex customer tickets. When a user reports a bug, the agent queries the internal Jira database for similar issues, checks the system logs via an API, and suggests a resolution based on the findings. If the first suggested fix fails, the agent observes the user's feedback and attempts a different troubleshooting step.

03
Supply Chain Management

Logistics providers use ReAct agents to optimize delivery routes in real time. The agent monitors weather APIs, traffic data, and warehouse inventory levels, reasoning about the best path for a fleet of vehicles. By interleaving these observations with its planning, the agent can dynamically reroute trucks when a sudden road closure is reported.

How It Works

The Intuition: Thinking Before Acting

In human cognition, we rarely perform complex tasks in a single, unthinking burst. If you are asked to find the current stock price of a company and calculate its growth over the last year, you would first formulate a plan: "I need to search for the price, then find the price from last year, then perform the subtraction." The ReAct framework brings this human-like "deliberative" loop to AI agents.

Standard LLMs are essentially "next-token predictors." When you ask them a question, they try to predict the answer immediately. If the answer requires external data, the model might guess based on its training data, leading to errors. ReAct forces the model to pause. It mandates a structure where the model must output a Thought, then an Action, and then process an Observation. This loop continues until the model determines it has enough information to provide a final answer.
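
To make this concrete, here is a minimal sketch of what that mandated structure can look like as a prompt scaffold. The action names and exact wording are illustrative assumptions; real implementations vary in their format.

Python
# A minimal ReAct-style prompt scaffold. The action names and wording are
# illustrative; real implementations differ in their exact format.
REACT_PROMPT = """Answer the question by interleaving Thought, Action, and Observation steps.

Available actions:
  Search(query)   - look up a fact
  Finish(answer)  - return the final answer

Use this format:
Thought: reason about what to do next
Action: one of the actions above
Observation: the result of the action (filled in by the system, not the model)
(repeat Thought / Action / Observation as needed)

Question: {question}
"""

print(REACT_PROMPT.format(question="Who is the CEO of GlobalCorp?"))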


The Theory: Interleaving Reasoning and Action

The ReAct framework, introduced by Yao et al. (2022), posits that reasoning and acting are synergistic. By interleaving these two, the agent creates a "reasoning trace" that serves as a working memory.

Consider a scenario where an agent must answer: "Who is the CEO of the company that acquired the startup founded by John Doe?"

1. Thought: I need to find out who John Doe is and what startup he founded.
2. Action: Search(John Doe startup)
3. Observation: John Doe founded "TechNova."
4. Thought: Now I need to find who acquired TechNova.
5. Action: Search(TechNova acquisition)
6. Observation: TechNova was acquired by "GlobalCorp."
7. Thought: Now I need to find the CEO of GlobalCorp.
8. Action: Search(CEO of GlobalCorp)
9. Observation: The CEO is Jane Smith.
10. Final Answer: The CEO is Jane Smith.

This structure prevents the model from getting lost. If an action fails (e.g., the search returns no results), the model sees the error in the Observation and can adjust its Thought process accordingly.
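
In practice, the program driving this loop must extract the Action from the model's raw text before it can execute the corresponding tool. A minimal sketch, assuming the model emits the Thought/Action format shown in the trace above:

Python
import re

# Extract a tool call like "Action: Search(TechNova acquisition)" from the
# model's raw output. The pattern assumes the Action format shown in the
# trace above; real frameworks each define their own.
ACTION_RE = re.compile(r"Action:\s*(\w+)\((.*?)\)")

def parse_action(model_output: str) -> tuple[str, str] | None:
    match = ACTION_RE.search(model_output)
    if match is None:
        return None  # no parsable action; treat as a final answer or an error
    return match.group(1), match.group(2)  # (tool name, argument)

output = "Thought: I need to find who acquired TechNova.\nAction: Search(TechNova acquisition)"
print(parse_action(output))  # ('Search', 'TechNova acquisition')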


Edge Cases and Robustness

While ReAct is powerful, it is not infallible. One major edge case is "infinite loops," where the agent gets stuck in a cycle of incorrect actions. For example, if an API call returns a generic error, the agent might repeatedly try the same action, hoping for a different result. To mitigate this, practitioners implement "max-steps" constraints or "reflection" layers where the agent is asked to critique its own reasoning if it fails to make progress after three attempts.
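
A minimal sketch of both guards in one loop; the three-attempt threshold, the critique prompt, and the call_llm/run_tool callables are hypothetical stand-ins, not a specific library's API:

Python
# Sketch of two loop guards: a hard max-steps cap plus a reflection trigger
# when the agent repeats the same action. `call_llm` and `run_tool` are
# hypothetical stand-ins for a real model and tool layer.
def run_agent(task: str, call_llm, run_tool, max_steps: int = 10) -> str:
    trace, recent_actions = [], []
    for _ in range(max_steps):
        action = call_llm(task, trace)
        if action == "FINISH":
            return trace[-1] if trace else "No answer."
        recent_actions.append(action)
        # Reflection: after three identical actions in a row, ask the model
        # to critique its own reasoning instead of retrying blindly.
        if recent_actions[-3:] == [action] * 3:
            trace.append(call_llm("Critique your approach; it is not working.", trace))
            recent_actions.clear()
            continue
        trace.append(run_tool(action))
    return "Stopped: max steps reached without an answer."

# Tiny stubs to exercise the guards: the "model" repeats one action forever.
print(run_agent("demo", call_llm=lambda task, trace: "Search(x)",
                run_tool=lambda action: "generic error", max_steps=5))
# Stopped: max steps reached without an answer.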

Another challenge is "prompt drift." As the reasoning trace grows, the context window fills up. If the trace becomes too long, the model may lose track of the original goal. Advanced implementations use "summarization" of the reasoning trace, where the agent periodically condenses its past thoughts and observations to keep the context window manageable.
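
A minimal sketch of that summarization step, assuming a summarize callable that condenses older steps (the thresholds are arbitrary illustration values):

Python
# Sketch of periodic trace summarization to fight prompt drift. `summarize`
# stands in for an LLM call that condenses older steps.
def compact_trace(trace: list[str], summarize, limit: int = 12, keep_last: int = 4) -> list[str]:
    if len(trace) <= limit:
        return trace
    old, recent = trace[:-keep_last], trace[-keep_last:]
    # Condense the old steps into one summary entry; keep the most recent
    # steps verbatim so short-term context stays precise.
    return [f"Summary of earlier steps: {summarize(old)}"] + recent

trace = [f"Step {i}: thought/observation" for i in range(1, 16)]
print(compact_trace(trace, summarize=lambda steps: f"{len(steps)} earlier steps condensed"))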

Common Pitfalls

  • "ReAct is just Chain-of-Thought": While both involve reasoning, CoT is static and purely internal. ReAct is dynamic and external, requiring the agent to interact with tools to validate its thoughts.
  • "The agent learns from ReAct": ReAct does not update the model's weights or long-term memory. It is a prompting strategy that improves performance within a single session, not a training method that improves the model's intelligence over time.
  • "More steps are always better": Adding more reasoning steps can increase latency and cost without improving accuracy. Agents can get trapped in "reasoning loops" where they over-analyze simple tasks, so step-limiting is essential.
  • "ReAct solves all hallucinations": ReAct mitigates grounding issues, but it cannot fix a model that lacks the fundamental capability to parse a tool's output correctly. If the tool returns a complex JSON response, the model must still be able to interpret that specific format; see the sketch below.
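
One common mitigation for that last pitfall is to flatten a tool's raw JSON into a compact observation string before it re-enters the prompt. A minimal sketch, with an invented response shape:

Python
import json

# Sketch: flatten a tool's raw JSON response into a compact observation
# string before it re-enters the prompt. The response shape is invented
# for illustration.
raw_response = '{"ticker": "GLBC", "price": {"current": 142.5, "year_ago": 118.0}}'

def to_observation(response: str) -> str:
    data = json.loads(response)
    current, year_ago = data["price"]["current"], data["price"]["year_ago"]
    growth = (current - year_ago) / year_ago * 100
    return f"{data['ticker']}: current={current}, year_ago={year_ago}, growth={growth:.1f}%"

print(to_observation(raw_response))
# GLBC: current=142.5, year_ago=118.0, growth=20.8%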

Sample Code

Python
# Toy knowledge base: each entity maps to the next fact in the chain
# (founder -> startup -> acquirer -> CEO).
KNOWLEDGE = {"John Doe": "TechNova", "TechNova": "GlobalCorp", "GlobalCorp": "Jane Smith"}

def search(query: str) -> str:
    return KNOWLEDGE.get(query, "No information found.")

def react_agent(initial_query: str, max_steps: int = 4) -> str:
    query = initial_query
    for step in range(1, max_steps + 1):
        print(f"Step {step}  Thought: look up '{query}'")
        print(f"          Action:  Search('{query}')")
        observation = search(query)
        print(f"          Obs:     {observation}")
        if observation == "No information found.":
            return f"Cannot resolve '{query}'"
        # Multi-hop: the observation becomes the next query in the chain.
        query = observation
        # Terminate when the chain reaches a leaf entity with no further hops.
        if query not in KNOWLEDGE:
            return f"Final Answer: {query}"
    # Max-steps guard against infinite loops (see Edge Cases above).
    return f"Final Answer: {query}"

result = react_agent("John Doe")
print(result)
# Step 1  Thought: look up 'John Doe'   -> TechNova
# Step 2  Thought: look up 'TechNova'   -> GlobalCorp
# Step 3  Thought: look up 'GlobalCorp' -> Jane Smith
# Final Answer: Jane Smith

Key Terms

Agent
An autonomous system powered by an LLM that can perceive its environment, reason about goals, and execute actions to achieve them. Agents differ from standard chatbots by their ability to interact with external tools rather than just generating static text.
Chain-of-Thought (CoT)
A prompting technique where the model is encouraged to generate intermediate reasoning steps before arriving at a final answer. While CoT improves logic, it remains "closed" because it does not interact with external data sources during the reasoning process.
Grounding
The process of linking an LLM’s internal knowledge to verifiable external facts or real-time data. Without grounding, models are prone to "hallucinations," where they generate plausible-sounding but factually incorrect information.
Reasoning Trace
A structured sequence of thoughts generated by an agent that explains the "why" behind a specific action. These traces provide transparency, allowing developers to debug the agent's decision-making process.
Tool-Use (Function Calling)
The capability of an LLM to output structured data (often JSON) that triggers a specific software function or API call. This allows the model to perform tasks like calculating math, querying a database, or searching the web.
Observation
The output returned by an external tool or environment after an agent executes an action. This feedback is fed back into the LLM's context window, allowing the model to update its internal state and plan the next step.