Discrete Probability Distributions
- Discrete probability distributions model the likelihood of outcomes for variables that take on countable, distinct values.
- They form the backbone of classification tasks in machine learning, where models predict categorical labels rather than continuous values.
- Probability Mass Functions (PMFs) define the probability of each specific outcome, ensuring all probabilities sum to exactly one.
- Understanding these distributions is essential for implementing algorithms like Naive Bayes, Logistic Regression, and various generative models.
- Key distributions, such as Bernoulli, Binomial, and Poisson, provide the mathematical framework for modeling real-world uncertainty in data.
Why It Matters
In the telecommunications industry, companies like Verizon or AT&T use the Poisson distribution to model call arrival rates at their switching centers. By predicting the number of calls expected in a specific time window, they can dynamically allocate bandwidth and server resources to prevent network congestion. This ensures that infrastructure is neither over-provisioned (wasting money) nor under-provisioned (causing dropped calls).
In e-commerce, platforms like Amazon utilize discrete distributions to model customer purchase behavior. Specifically, they use Bernoulli trials to predict the probability that a user will click "Buy" on a specific product page based on historical session data. By aggregating these individual probabilities across millions of users, the platform can forecast inventory demand and optimize supply chain logistics for high-velocity items.
In the healthcare sector, hospitals use discrete probability models to manage patient flow in emergency departments. By modeling the number of patient arrivals per hour as a discrete process, administrators can staff doctors and nurses effectively during peak times. This application of probability theory directly impacts patient outcomes by reducing wait times and ensuring that critical care is available when demand spikes.
How It Works
Intuition: The World of Counts
In machine learning, we often deal with data that is inherently categorical. Imagine you are building a spam filter. An email is either "spam" or "not spam." There is no "half-spam" email. When we model these scenarios, we use discrete probability distributions. Unlike continuous distributions, which deal with ranges (like the height of a person), discrete distributions deal with distinct, countable "buckets." If you toss a coin, roll a die, or count the number of customers entering a store in an hour, you are operating in the realm of discrete probability. The core intuition is that we assign a specific "weight" or "mass" to each possible outcome, and these weights must collectively account for the entire universe of possibilities.
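To make the "weights on buckets" idea concrete, here is a minimal sketch of a fair die's PMF written as a plain Python dictionary:
# Each countable outcome gets a probability mass; a fair die assigns 1/6 to each face.
die_pmf = {face: 1 / 6 for face in range(1, 7)}
# The masses must collectively account for the entire universe of possibilities.
assert abs(sum(die_pmf.values()) - 1.0) < 1e-12
print(die_pmf[3])  # probability of rolling a 3: 0.1666...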
The Probability Mass Function (PMF)
The PMF is the fingerprint of a discrete distribution. If you have a variable X representing the result of a fair six-sided die roll, the PMF tells you that P(X = 1) = 1/6, P(X = 2) = 1/6, and so on. The crucial constraint is that the sum of the PMF across all possible values must equal 1. If your model predicts probabilities that sum to 0.8 or 1.2, you have violated the fundamental axioms of probability. In machine learning, when we use a softmax function in a neural network, we are essentially generating a PMF over a set of discrete classes.
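As a quick illustration, here is a minimal softmax sketch (the logit values are made up) showing how a network's raw scores become a valid PMF:
import numpy as np
# Hypothetical raw scores (logits) for three classes.
logits = np.array([2.0, 1.0, 0.1])
# Subtracting the max before exponentiating is a standard numerical-stability trick.
exp_scores = np.exp(logits - logits.max())
pmf = exp_scores / exp_scores.sum()
print(pmf)        # approximately [0.659 0.242 0.099]
print(pmf.sum())  # 1.0 -- a valid PMF by construction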
Common Discrete Distributions
Several "standard" distributions appear constantly in data science. The Bernoulli distribution models a single binary event (0 or 1). The Binomial distribution extends this by counting the number of successes in independent Bernoulli trials (e.g., "What is the probability of getting exactly 3 heads in 10 flips?"). The Poisson distribution is used for counting events occurring within a fixed interval of time or space, provided these events happen with a known constant mean rate and independently of the time since the last event (e.g., "How many emails will I receive in the next hour?"). Finally, the Categorical distribution generalizes the Bernoulli distribution to more than two outcomes, which is the foundation of multi-class classification.
Edge Cases and Constraints
When working with discrete distributions, one must be wary of the "support." Some distributions have finite support (like the Binomial distribution, which cannot exceed the number of trials n), while others have infinite support (like the Poisson distribution, which can theoretically yield any non-negative integer). In high-dimensional machine learning, we often encounter "sparsity," where many outcomes have a probability of zero. Handling these zeros correctly is vital; for instance, in Bayesian inference, we often use "Laplace smoothing" to ensure that we don't assign a probability of zero to an event simply because it didn't appear in our training set. Ignoring these edge cases can lead to models that are brittle and prone to catastrophic failure when encountering unseen data.
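As a brief sketch of the smoothing idea on hypothetical counts, Laplace (add-one) smoothing keeps unseen outcomes from being assigned exactly zero probability:
import numpy as np
# Hypothetical training counts for three classes; the third was never observed.
counts = np.array([8, 2, 0])
# The raw maximum-likelihood estimate assigns zero mass to the unseen class.
mle = counts / counts.sum()  # [0.8, 0.2, 0.0]
# Add 1 to every count before normalizing so no outcome gets exactly zero mass.
alpha = 1
smoothed = (counts + alpha) / (counts.sum() + alpha * len(counts))
print(smoothed)  # approximately [0.692, 0.231, 0.077] -- no zeros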
Common Pitfalls
- Confusing PMF with PDF: Learners often try to integrate a PMF like a Probability Density Function (PDF). Remember that PMFs are for discrete variables and require summation, while PDFs are for continuous variables and require integration.
- Assuming all outcomes are equally likely: Many beginners default to the Uniform distribution assumption. Always verify whether your data has a bias or a specific structure, such as a skewed distribution, before assuming equal probabilities.
- Ignoring the sum-to-one rule: A common coding error is failing to normalize probabilities. Always ensure your output layer (e.g., softmax) correctly forces the sum of all discrete probabilities to equal exactly 1.0.
- Misinterpreting the "Expected Value": Students often think the expected value must be a possible outcome of the experiment. In reality, the expected value is an average and often results in a non-integer value, even if the underlying variable only takes integer values (e.g., the average of a die roll is 3.5, as shown in the snippet after this list).
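A one-line check of the die-roll example, computing the expected value as a probability-weighted average over the outcomes:
# Expected value of a fair six-sided die: sum of (outcome * probability).
expected_value = sum(face * (1 / 6) for face in range(1, 7))
print(expected_value)  # 3.5 -- not itself a possible outcome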
Sample Code
import numpy as np
from scipy.stats import binom
# Define parameters for a Binomial distribution:
# n = 10 trials, p = 0.5 probability of success
n, p = 10, 0.5
dist = binom(n, p)
# Calculate the probability of getting exactly 5 successes
prob_5 = dist.pmf(5)
print(f"Probability of exactly 5 successes: {prob_5:.4f}")
# Calculate the expected value (mean) and variance
mean = dist.mean()
variance = dist.var()
print(f"Mean: {mean}, Variance: {variance}")
# Generate 1000 random samples from this distribution
samples = dist.rvs(1000)
print(f"First 10 samples: {samples[:10]}")
# Output:
# Probability of exactly 5 successes: 0.2461
# Mean: 5.0, Variance: 2.5
# First 10 samples: [6 4 5 5 4 6 5 6 4 4]  (sampled values vary per run)