Random variables and distributions

Probability from zero

Learning objectives

  • Define a random variable as a measurable function X : Ω → ℝ
  • Distinguish discrete (PMF) from continuous (PDF) random variables
  • Read and use the cumulative distribution function F(x) = P(X ≤ x)
  • Recognise that the DISTRIBUTION of X is induced by the probability measure on Ω
  • Apply the empirical-distribution principle: i.i.d. samples reveal the underlying distribution

§0.1 set up the probability axioms on a sample space $\Omega$. Now we collapse $\Omega$ into something we can compute with: a RANDOM VARIABLE $X$, a function that assigns a real number to every outcome $\omega \in \Omega$. The PROBABILITY MEASURE on $\Omega$ then induces a distribution for $X$: the object that governs how its values behave under repeated experiments.

Formal definition

A random variable is a measurable function $X : \Omega \to \mathbb{R}$. "Measurable" means: for every Borel set $B \subset \mathbb{R}$, the pre-image $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\}$ is in the σ-algebra $\mathcal{F}$ on $\Omega$, so we can compute $P(X \in B)$ using the probability measure already defined.

In practice you don't verify measurability by hand. Every variable you'll encounter (sums, products, counts, durations, indicators) is automatically a random variable on the standard σ-algebras.
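
To make the definition concrete, here is a minimal Python sketch on a finite sample space, where every subset is measurable and nothing needs checking. It assumes two fair dice with all 36 outcomes equally likely; the names `omega`, `X`, `preimage`, and `prob` are illustrative and not part of the lesson.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair dice (assumed equally likely).
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Random variable: maps an outcome (d1, d2) to the real number d1 + d2."""
    return w[0] + w[1]

def preimage(B):
    """X^{-1}(B): the set of outcomes whose value under X lands in B."""
    return {w for w in omega if X(w) in B}

def prob(B):
    """P(X in B), computed from the measure on omega via the pre-image."""
    return sum(P[w] for w in preimage(B))

print(sorted(preimage({7})))    # the six outcomes that sum to 7
print(prob({7}))                # 1/6
print(prob(set(range(2, 13))))  # 1
```

The distribution of `X` never had to be specified separately: every probability about `X` is read off the measure on `omega` through the pre-image.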

Discrete vs continuous

  • Discrete: $X$ takes values in a countable set. Characterised by its PROBABILITY MASS FUNCTION (PMF) $p_X(x) = P(X = x)$ with $\sum_x p_X(x) = 1$.
  • Continuous: $X$ takes values in an uncountable set (typically an interval). Characterised by its PROBABILITY DENSITY FUNCTION (PDF) $f_X$ such that $P(a \le X \le b) = \int_a^b f_X(x)\,dx$ and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

The PDF is NOT a probability; it is a density. $f_X(x)\,dx$ is the probability that $X$ falls in an infinitesimal interval of width $dx$ around $x$. $f_X(x)$ can exceed 1 (think of a uniform distribution on $[0, 0.5]$, where $f_X = 2$ on the support).
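
A short numerical check of this point, as a sketch: the density of a uniform distribution on [0, 0.5] equals 2 on its support, yet every probability obtained by integrating it stays between 0 and 1. The helper `integrate` is a plain midpoint rule written for illustration, not a library function.

```python
def f(x):
    """Density of the uniform distribution on [0, 0.5]: 2 on the support, 0 elsewhere."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

print(f(0.25))                 # density value 2.0, larger than 1
print(integrate(f, 0.0, 0.5))  # total mass, approximately 1
print(integrate(f, 0.1, 0.2))  # P(0.1 <= X <= 0.2), approximately 0.2
```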

The cumulative distribution function

For ANY random variable, discrete or continuous, the CDF is:

$$F_X(x) = P(X \le x).$$

It is non-decreasing and right-continuous, with $F_X(-\infty) = 0$ and $F_X(+\infty) = 1$ (understood as limits). For discrete $X$, $F_X$ is a step function; for continuous $X$, $F_X$ is the integral of the density, $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$. The CDF is the UNIVERSAL description: every random variable has one, and many results (quantile, transformation) start from $F_X$ rather than from $f_X$ or $p_X$.
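
As an illustration of the step-function case, the sketch below builds the CDF of the sum of two fair dice from its PMF and reads a quantile off it. The function names `cdf` and `quantile` are hypothetical, and the quantile convention used (smallest support point with F at least q) is one common choice, assumed here.

```python
from fractions import Fraction

# PMF of the sum of two fair dice: p(s) = (6 - |s - 7|) / 36 for s = 2..12.
pmf = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

def cdf(x):
    """F(x) = P(X <= x): add up the point masses at or below x."""
    return sum((p for s, p in pmf.items() if s <= x), Fraction(0))

def quantile(q):
    """Smallest support point s with F(s) >= q (assumed convention), for 0 < q <= 1."""
    for s in sorted(pmf):
        if cdf(s) >= q:
            return s
    raise ValueError("q must lie in (0, 1]")

print(cdf(1.9), cdf(2), cdf(2.5))  # 0 1/36 1/36: flat between jumps, right-continuous
print(cdf(7))                      # 7/12
print(cdf(12))                     # 1
print(quantile(Fraction(1, 2)))    # 7
```

Exact fractions are used so that the jumps add up to 1 without rounding error.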

What the empirical distribution reveals

Given $n$ i.i.d. samples $X_1, \ldots, X_n$ from $X$, the EMPIRICAL distribution $\hat{F}_n(x) = (1/n) \sum_{i=1}^{n} \mathbb{1}\{X_i \le x\}$ converges to $F_X$ as $n \to \infty$ (Glivenko-Cantelli; §0.6 makes this rigorous). Practically: simulate, plot a histogram, and watch the underlying distribution emerge. This is the foundation of Monte Carlo (§0.10).
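
The convergence can be observed directly. The following is a minimal sketch, assuming samples from the uniform distribution on [0, 1] (so the true CDF is F(x) = x on that interval); `ecdf` and `sup_distance` are illustrative names, and the seed is fixed only to make the run reproducible.

```python
import random
from bisect import bisect_right

def ecdf(samples):
    """Return the empirical CDF x -> (1/n) * #{i : X_i <= x} of the samples."""
    xs = sorted(samples)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n

def sup_distance(samples):
    """Largest gap between the empirical CDF and F(x) = x on [0, 1]."""
    xs = sorted(samples)
    n = len(xs)
    # For a continuous F the largest gap occurs at a sample point,
    # either just before the jump or at it.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (20, 200, 2000):
    samples = [rng.random() for _ in range(n)]
    print(n, round(sup_distance(samples), 4))
```

The printed gap should generally shrink as n grows, which is the uniform convergence that Glivenko-Cantelli describes.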

[Interactive figure: Random Variables Explorer]

Try it

  • Switch between "Coin flip" and "Sum of two dice". Both are DISCRETE — both have PMFs (green stems). Coin: uniform on {0, 1}. Dice sum: triangular peak at 7 (sum = 7 has 6 ways to occur out of 36). The CDF jumps at each integer for the dice sum.
  • Switch to "Uniform on [0, 1]". CONTINUOUS: the PDF is a flat line at 1.0 (the density is constant on the support). The CDF is a 45° line from (0, 0) to (1, 1). Notice that the PDF value 1.0 is a density, not a probability: for an interval [a, b] inside [0, 1] the probability is b - a.
  • Switch to "Adult height". CONTINUOUS: bell-shaped PDF centred at 1.70 m. The CDF is the smooth Normal S-curve rising from 0 to 1. Use the cursor to read off $P(X \le 1.80) \approx 0.84$ (i.e., about 84% of adults are shorter than 1.80 m under this model).
  • Set n samples = 0, then crank up to 2000. Watch the blue empirical histogram emerge and converge to the green theoretical curve. This is the EMPIRICAL DISTRIBUTION FUNCTION at work.
  • Compare the sample mean to the theoretical mean: coin = 0.5, dice = 7, uniform = 0.5, height = 1.70. The §0.6 LLN says X̄ → μ as n → ∞. The typical error of the sample mean is about σ/√n, so at n = 200 it is already small: roughly 0.035 for the coin and 0.17 for the dice sum.
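
The height reading and the sample-mean comparison can also be checked outside the widget. This is a sketch under a stated assumption: the text gives only the mean of 1.70 m, so a standard deviation of 0.10 m is assumed here because it places 1.80 m one standard deviation above the mean, which matches the 0.84 reading. The helper `normal_cdf` is written for illustration using the standard library's `math.erf`.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """CDF of a Normal(mu, sigma^2) variable, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Assumption: standard deviation 0.10 m (not stated in the text; chosen so that
# 1.80 m sits one standard deviation above the mean of 1.70 m).
mu, sigma = 1.70, 0.10
print(round(normal_cdf(1.80, mu, sigma), 4))  # 0.8413

# Sample mean of n simulated heights versus the theoretical mean.
rng = random.Random(0)  # fixed seed for reproducibility
for n in (20, 200, 2000):
    xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
    print(n, round(xbar, 4), "standard error:", round(sigma / math.sqrt(n), 4))
```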

For the uniform-on-[0,1] case, the PDF is $f(x) = 1$ for all $x \in [0, 1]$. The PDF value of 1 is NOT a probability. What is the probability that $X$ equals exactly 0.5 (or any other specific point), and why is this OK mathematically?

What you now know

A random variable is a function on the sample space. Its distribution, described by a PMF, PDF, or CDF, is induced by the underlying probability measure. Discrete distributions have point masses; continuous distributions spread mass over intervals. The empirical distribution from i.i.d. samples converges to the true distribution, which is what lets us learn about a distribution from data. §0.3 generalises to MULTIPLE random variables and their joint, conditional, and marginal distributions.

