State Representation and Transitions
- State representation is the process of mapping raw environmental observations into a compact, feature-rich format that an agent can interpret.
- Transitions represent the dynamics of the environment, defining how an agent's current state and action result in a subsequent state.
- The Markov Property is the foundational assumption that the current state contains all necessary information to predict future outcomes.
- Effective representation learning is critical for overcoming the "curse of dimensionality" in complex, high-dimensional environments.
- The interplay between state abstraction and transition modeling determines the agent's ability to generalize across unseen scenarios.
Why It Matters
In robotics, companies like Boston Dynamics use state representation to process LiDAR and depth camera data into a manageable format for bipedal locomotion. By mapping raw sensor noise into a latent representation of the robot's center of mass and joint angles, the controller can maintain balance on uneven terrain. This allows the robot to generalize its walking gait to environments it has never encountered during training.
In the financial sector, high-frequency trading firms utilize state representations to interpret order book dynamics. Raw data from market feeds is transformed into latent vectors that capture liquidity trends and volatility patterns. These representations allow RL agents to execute large orders with minimal market impact by predicting how the market state will transition in response to their own trading activity.
In healthcare, RL is applied to personalized treatment planning, such as managing insulin delivery for diabetic patients. The "state" is a combination of blood glucose levels, heart rate, and caloric intake, which are highly noisy and irregular. By learning a robust representation of these physiological states, the model can predict the transition effect of a specific insulin dose, optimizing the dosage to keep the patient within a healthy range while minimizing the risk of hypoglycemia.
How It Works
The Intuition of States and Transitions
At its heart, Reinforcement Learning (RL) is about an agent navigating a world. To make a decision, the agent must first "understand" where it is. This is the role of the state. Imagine you are playing a game of chess. The state is the exact configuration of every piece on the board. Setting aside details like castling rights and repetition rules, you do not need to know how you arrived at this position—which moves were made ten turns ago—to decide your next move. This is the essence of the Markov Property.
Transitions are the "what happens next" component. If you move your knight, the state of the board changes. The transition function is the set of rules that dictates that change. In a deterministic world, the transition is simple: move X leads to state Y. In a stochastic world, the transition is probabilistic: move X leads to state Y with 70% probability and state Z with 30% probability. Understanding these two concepts allows an agent to map its current reality to a future goal.
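The 70/30 example above can be sketched as a small stochastic transition function. The state names, action name, and probabilities here are purely illustrative:

```python
import random

# Illustrative stochastic transition table: (state, action) -> list of
# (next_state, probability) pairs. All entries are made up for this sketch.
TRANSITIONS = {
    ("s0", "move_knight"): [("s_good", 0.7), ("s_bad", 0.3)],
}

def sample_transition(state, action, rng=random):
    """Sample the next state according to the transition probabilities."""
    outcomes = TRANSITIONS[(state, action)]
    states = [s for s, _ in outcomes]
    probs = [p for _, p in outcomes]
    return rng.choices(states, weights=probs, k=1)[0]

# Over many samples, roughly 70% of outcomes land in "s_good".
counts = {"s_good": 0, "s_bad": 0}
for _ in range(10_000):
    counts[sample_transition("s0", "move_knight")] += 1
print(counts)
```

A deterministic environment is the special case where every (state, action) pair maps to a single next state with probability 1.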
From Raw Data to State Representations
In modern RL, we rarely receive a perfect "state" like a chess board. Instead, we receive raw data, such as a stream of pixels from a camera or a high-frequency vibration sensor. Feeding raw pixels directly into a decision-making algorithm is computationally expensive and often leads to overfitting. This is where state representation becomes vital. We use neural networks—specifically Convolutional Neural Networks (CNNs) for images or Recurrent Neural Networks (RNNs) for time-series data—to compress these inputs into a "latent vector."
A good representation is one that captures the "task-relevant" features. For example, if an autonomous car is driving, the exact shade of the sky is irrelevant, but the distance to the car in front is critical. A robust representation encoder will learn to ignore the sky and emphasize the distance. By mapping raw observations to a structured latent space, we allow the agent to learn policies that are invariant to irrelevant environmental noise.
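As a sketch of such an encoder (the layer sizes, input resolution, and latent dimension below are illustrative assumptions, not a prescribed design), a small CNN can compress raw image observations into a compact latent vector:

```python
import torch
import torch.nn as nn

class ObservationEncoder(nn.Module):
    """Maps raw image observations to a compact latent vector.

    The architecture here is a minimal illustration; real encoders
    are tuned per task.
    """
    def __init__(self, latent_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=2),   # 64x64 -> 31x31
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 31x31 -> 14x14
            nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(32 * 14 * 14, latent_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs))

# A batch of 8 RGB observations at 64x64 resolution.
obs = torch.randn(8, 3, 64, 64)
latent = ObservationEncoder(latent_dim=32)(obs)
print(latent.shape)  # torch.Size([8, 32])
```

The downstream policy and dynamics model then operate on this 32-dimensional latent vector instead of the 12,288 raw pixel values.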
Modeling Transitions in Latent Space
When the environment is complex, learning the transition function directly in the raw input space is nearly impossible. Instead, we learn a "world model": a neural network trained to predict the next latent state z_{t+1} given the current latent state z_t and action a_t. This is known as latent dynamics learning.
This approach is powerful because it allows for "imagination." An agent can perform "rollouts" inside its own head, simulating hundreds of potential future trajectories without ever taking a physical action. This is the cornerstone of model-based RL architectures like MuZero or Dreamer. The challenge, however, is the "compounding error" problem: if your transition model is slightly inaccurate, simulating too many steps into the future will lead the agent to believe in a fantasy world that does not exist, causing the policy to collapse.
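A minimal version of such an imagined rollout can be sketched as follows; the dynamics network, the random "policy," and the dimensions are all illustrative stand-ins:

```python
import torch
import torch.nn as nn

# A generic latent dynamics model; the sizes here are illustrative.
dynamics = nn.Sequential(nn.Linear(4 + 2, 64), nn.ReLU(), nn.Linear(64, 4))

def imagine_rollout(dynamics, state, policy, horizon):
    """Roll the dynamics model forward without touching the real environment.

    Each step feeds the model's own prediction back in as input, which is
    exactly why small model errors compound as the horizon grows.
    """
    trajectory = [state]
    for _ in range(horizon):
        action = policy(state)
        state = dynamics(torch.cat([state, action], dim=-1))
        trajectory.append(state)
    return trajectory

# A random policy over a 2-dim action space, purely for illustration.
random_policy = lambda s: torch.randn(s.shape[0], 2)

start = torch.randn(1, 4)  # one 4-dim latent state
traj = imagine_rollout(dynamics, start, random_policy, horizon=5)
print(len(traj), traj[-1].shape)  # 6 torch.Size([1, 4])
```

In practice, architectures like Dreamer keep the horizon short for exactly the compounding-error reason described above.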
Common Pitfalls
- Assuming the state must be the entire history: Learners often think they need to feed the entire sequence of past observations into the model. However, if the state is well-defined, the Markov Property allows us to discard the history, which prevents the model from becoming overly complex and slow.
- Confusing observations with states: Many beginners treat raw pixels as the "state." In reality, pixels are observations; the true state is the underlying physical configuration, and failing to distinguish between the two leads to poor generalization.
- Ignoring the noise in transitions: Students often assume transitions are deterministic. In reality, most real-world transitions are stochastic, and ignoring this uncertainty leads to fragile agents that fail when the environment behaves unexpectedly.
- Over-relying on representation size: Increasing the dimensionality of the latent space does not automatically improve performance. A latent space that is too large can lead to overfitting, where the agent memorizes noise rather than learning the underlying dynamics.
Sample Code
import torch
import torch.nn as nn

# Simple Latent Dynamics Model
class LatentDynamics(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        # Predicts next state based on current state and action
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64),
            nn.ReLU(),
            nn.Linear(64, state_dim)
        )

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        return self.net(x)

# Mock data: 10 samples, 4-dim state, 2-dim action
state = torch.randn(10, 4)
action = torch.randn(10, 2)
model = LatentDynamics(4, 2)

# Predict next state
next_state_pred = model(state, action)
print(f"Predicted next state shape: {next_state_pred.shape}")
# Output: Predicted next state shape: torch.Size([10, 4])