Markov Chains and Processes
- A Markov process is a stochastic model describing a sequence of events where the probability of the next state depends solely on the current state, not the history.
- The "Markov Property" (memorylessness) simplifies complex systems by allowing us to predict future states using only local transition information.
- Markov Chains are the discrete-time subset of these processes, represented by transition matrices that dictate the movement between defined states.
- Stationary distributions represent the long-term behavior of a system, where the probability of being in any given state remains constant over time.
- These models form the theoretical foundation for Reinforcement Learning, Natural Language Processing, and Bayesian inference algorithms.
Why It Matters
In the field of finance, Markov chains are used to model credit risk and credit rating migrations. Agencies like Moody’s or S&P track the probability of a company moving from an "AAA" rating to a "BBB" or "Default" state over a one-year period. By constructing a transition matrix, banks can estimate the expected loss of their loan portfolios over multi-year horizons, which is critical for regulatory capital requirements.
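As a rough illustration, here is a minimal sketch of that kind of rating-migration calculation in Python. The three-state matrix and its probabilities are entirely made up for demonstration, not actual agency data:

import numpy as np

# Illustrative one-year rating migration matrix (numbers invented,
# not real Moody's/S&P data): states are AAA, BBB, Default
P = np.array([[0.95, 0.04, 0.01],
              [0.03, 0.92, 0.05],
              [0.00, 0.00, 1.00]])   # Default is an absorbing state

# A 5-year horizon is the one-year matrix raised to the 5th power
P5 = np.linalg.matrix_power(P, 5)
print(P5[0, 2])  # probability an AAA credit defaults within 5 years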
In Natural Language Processing (NLP), the N-gram model is a classic application of Markov chains. By assuming that the next word in a sentence depends only on the previous N-1 words, researchers can generate coherent-looking text or predict the next word in a sequence. While modern Large Language Models use more complex architectures, the underlying concept of predicting the next token based on a limited local context remains a core principle of sequence modeling.
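For a concrete sense of the idea, here is a toy bigram (order-1 Markov) model built from a made-up ten-word corpus; real N-gram systems are trained on millions of sentences:

import collections
import random

# Count which words follow which in a tiny invented corpus
corpus = "the cat sat on the mat and the cat sat".split()
next_words = collections.defaultdict(list)
for word, nxt in zip(corpus, corpus[1:]):
    next_words[word].append(nxt)

# Generate text: each word depends only on the one before it
random.seed(0)
word, generated = "the", ["the"]
for _ in range(6):
    word = random.choice(next_words[word])
    generated.append(word)
print(" ".join(generated))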
In healthcare, Markov models are frequently used for cost-effectiveness analysis of medical treatments. Researchers define states such as "Healthy," "Diseased," and "Deceased," and assign transition probabilities based on clinical trial data. This allows health economists to simulate the long-term survival rates and costs associated with different drug therapies, helping hospitals decide which treatments provide the best value for patients.
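A minimal sketch of such a cohort simulation might look like the following; the transition probabilities are invented for illustration, not drawn from any clinical trial:

import numpy as np

# Hypothetical 3-state cohort model with yearly cycles:
# states are Healthy, Diseased, Deceased (probabilities made up)
P = np.array([[0.90, 0.08, 0.02],   # Healthy
              [0.10, 0.70, 0.20],   # Diseased
              [0.00, 0.00, 1.00]])  # Deceased is absorbing

cohort = np.array([1000.0, 0.0, 0.0])   # start with 1,000 healthy patients
for year in range(10):
    cohort = cohort @ P                 # advance one yearly cycle
print(cohort.round(1))  # expected patients per state after 10 years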
How It Works
The Intuition of Memorylessness
At its heart, a Markov process is a way of modeling systems that evolve over time. Imagine you are playing a board game where your next move depends only on your current square, not on the sequence of squares you visited to get there. This is the essence of the Markov property. In many real-world scenarios, we lack the computational power or the data to track the entire history of a system. By assuming the system is "memoryless," we reduce the complexity of our calculations significantly. If we know where we are right now, we have all the information necessary to estimate where we might be in the next moment.
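Formally, the Markov property says that P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_0 = i_0) = P(X_{n+1} = j | X_n = i): conditioning on the entire history yields exactly the same prediction as conditioning on the current state alone.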
Discrete-Time Markov Chains (DTMC)
A Discrete-Time Markov Chain is a sequence of random variables where the transition from X_n to X_{n+1} is governed by a fixed set of probabilities. These systems are often visualized as directed graphs, where nodes represent states and edges represent the probability of transitioning between them. For example, consider a simple weather model with two states: "Sunny" and "Rainy." If it is sunny today, there might be a 90% chance it is sunny tomorrow and a 10% chance it rains. If it is rainy today, there might be a 40% chance it stays rainy and a 60% chance it clears up. This structure allows us to predict the weather days or weeks into the future simply by multiplying the transition matrix by itself.
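Here is a minimal sketch of that two-state weather chain in Python, using matrix powers to forecast a week ahead:

import numpy as np

# Two-state weather chain: index 0 = Sunny, 1 = Rainy
# Row i gives the distribution of tomorrow's weather given state i today
P = np.array([[0.9, 0.1],   # Sunny: 90% stays sunny, 10% turns rainy
              [0.6, 0.4]])  # Rainy: 60% clears up, 40% stays rainy

today = np.array([1.0, 0.0])  # it is sunny today
week_ahead = today @ np.linalg.matrix_power(P, 7)
print(week_ahead)  # [P(Sunny), P(Rainy)] seven days out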
Continuous-Time Markov Processes
While DTMCs move in discrete steps (like ticks of a clock), Continuous-Time Markov Processes (CTMPs) allow for transitions at any point in time. Instead of a transition matrix, we use a "generator matrix" (or intensity matrix). This matrix describes the rate at which transitions occur. Think of radioactive decay: we don't know exactly when an atom will decay, but we know the probability rate at which it happens. CTMPs are essential in queuing theory, where we model how many customers arrive at a service desk per hour. The transition rates are constant, but the time spent in each state follows an exponential distribution, which is the only continuous distribution that satisfies the memoryless property.
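As a sketch, the snippet below builds a generator matrix for a hypothetical two-state machine (Working/Broken) with assumed failure and repair rates, and uses the matrix exponential (via SciPy) to obtain transition probabilities over a time window:

import numpy as np
from scipy.linalg import expm

# Hypothetical two-state CTMP: 0 = Working, 1 = Broken
lam, mu = 0.5, 2.0             # assumed failure and repair rates (per hour)
Q = np.array([[-lam,  lam],    # rows of a generator matrix sum to 0
              [  mu,  -mu]])

# Transition probabilities over time t come from the matrix exponential
t = 3.0
P_t = expm(Q * t)
print(P_t)  # P_t[i, j] = probability of being in j after t hours, from i

# The holding time in the Working state is exponential with rate lam
rng = np.random.default_rng(0)
print(rng.exponential(1.0 / lam))  # one sampled time-to-failure in hours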
Convergence and Long-Term Behavior
One of the most powerful aspects of Markov theory is the ability to predict the "steady state." If you run a Markov chain for a very long time, does it settle into a predictable pattern? For many systems, the answer is yes. This is known as the stationary distribution. Even if we don't know the starting state, we can often calculate the long-term probability of the system being in any given state. This is the mathematical engine behind the PageRank algorithm, which Google originally used to rank websites by treating the entire web as a massive Markov chain where users "transition" from page to page via hyperlinks.
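The following toy sketch illustrates the idea: a four-page link graph (invented for illustration) is turned into a "random surfer" transition matrix with the standard 0.85 damping factor, then solved by power iteration:

import numpy as np

# Toy hyperlink graph (assumed): page i links to the pages listed
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n, d = 4, 0.85  # page count and the standard damping factor

# Random surfer: follow a link with probability d,
# otherwise jump to a uniformly random page
P = np.full((n, n), (1 - d) / n)
for i, outs in links.items():
    for j in outs:
        P[i, j] += d / len(outs)

# Power iteration: apply P repeatedly until the distribution settles
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = rank @ P
print(rank)  # stationary distribution = PageRank scores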
Common Pitfalls
- Confusing the Markov Property with Independence: Many students assume that if a process is Markovian, the states are independent. In reality, the states are highly dependent; the Markov property simply restricts the scope of that dependence to the immediate past.
- Assuming Convergence Is Guaranteed: Not all Markov chains converge to a single stationary distribution. If a chain is periodic or has multiple disconnected components, it may oscillate or get trapped, meaning the long-term behavior depends entirely on the starting state.
- Ignoring the Row Summation Rule: A common error is constructing a transition matrix whose rows do not sum to 1.0. Such a matrix is not a valid stochastic matrix, and any probability calculations built on it will be mathematically meaningless (see the sanity-check sketch after this list).
- Overestimating the "Memoryless" Scope: Learners often mistakenly believe that the Markov property applies to all time-series data. Many real-world systems have "long-term memory," and forcing a Markov model onto them can lead to significant predictive errors because the history does matter.
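As a guard against the row-summation pitfall, here is a minimal sanity check one might run before using a matrix; the helper name is_stochastic is hypothetical:

import numpy as np

def is_stochastic(P, tol=1e-9):
    """Check that P is a valid row-stochastic matrix."""
    P = np.asarray(P)
    nonneg = np.all(P >= 0)                            # no negative probabilities
    rows_ok = np.allclose(P.sum(axis=1), 1.0, atol=tol)  # every row sums to 1
    return bool(nonneg and rows_ok)

print(is_stochastic([[0.7, 0.3], [0.5, 0.5]]))  # True
print(is_stochastic([[0.7, 0.2], [0.5, 0.5]]))  # False: first row sums to 0.9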
Sample Code
import numpy as np
# Define a 3-state transition matrix (e.g., Weather: Sunny, Cloudy, Rainy)
# Rows must sum to 1.0
P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
# Initial state distribution: 100% chance of Sunny
pi_0 = np.array([1.0, 0.0, 0.0])
# Calculate state distribution after 10 steps
pi_10 = pi_0 @ np.linalg.matrix_power(P, 10)
print(f"Distribution after 10 steps: {pi_10}")
# Calculate stationary distribution by finding eigenvector for eigenvalue 1
vals, vecs = np.linalg.eig(P.T)
# Take the real part in case eig returns a complex dtype
stationary = np.real(vecs[:, np.isclose(vals, 1)].flatten())
stationary = stationary / stationary.sum()  # normalize so entries sum to 1
print(f"Stationary distribution: {stationary}")
# Expected output (approximately):
# Distribution after 10 steps: [0.45680544 0.28253268 0.26066188]
# Stationary distribution: [0.45652174 0.2826087  0.26086957]