Random variables and distributions

Probability from zero

Learning objectives

Define a random variable as a measurable function X : Ω → ℝ
Distinguish discrete (PMF) from continuous (PDF) random variables
Read and use the cumulative distribution function F(x) = P(X ≤ x)
Recognise that the DISTRIBUTION of X is induced by the probability measure on Ω
Apply the empirical-distribution principle: i.i.d. samples reveal the underlying distribution

§0.1 set up the probability axioms on a sample space $\Omega$. Now we collapse $\Omega$ into something we can compute with: a RANDOM VARIABLE $X$, a function that assigns a real number to every outcome $\omega \in \Omega$. The PROBABILITY MEASURE on $\Omega$ then induces a distribution for $X$: the object that governs how its values behave under repeated experiments.

Formal definition

A random variable is a measurable function $X : \Omega \to \mathbb{R}$. "Measurable" means: for every Borel set $B \subset \mathbb{R}$, the pre-image $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\}$ is in the σ-algebra $\mathcal{F}$ on $\Omega$, so we can compute $P(X \in B)$ using the probability measure already defined.
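On a finite sample space the pre-image construction can be carried out literally. The sketch below assumes two fair six-sided dice and takes $X$ to be their sum; the names `omega`, `P`, `X` and `prob_X_in` are illustrative choices, not part of the lesson's demo.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair six-sided dice (assumed example).
omega = list(product(range(1, 7), repeat=2))

# Probability measure on Omega: each of the 36 outcomes is equally likely.
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """The random variable: maps an outcome (d1, d2) to the sum d1 + d2."""
    return w[0] + w[1]

def prob_X_in(B):
    """P(X in B), computed as the measure of the pre-image X^{-1}(B)."""
    preimage = [w for w in omega if X(w) in B]
    return sum((P[w] for w in preimage), Fraction(0))

print(prob_X_in({7}))                 # 1/6
print(prob_X_in({2, 12}))             # 1/18
print(prob_X_in(set(range(2, 13))))   # 1
```

Nothing about $X$ itself is random here: it is an ordinary function, and all the probability comes from the measure `P` on the outcomes.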

In practice you don't verify measurability by hand. Every variable you'll encounter (sums, products, counts, durations, indicators) is automatically a random variable on the standard σ-algebras.

Discrete vs continuous

Discrete: $X$ takes values in a countable set. Characterised by its PROBABILITY MASS FUNCTION (PMF) $p_X(x) = P(X = x)$ with $\sum_x p_X(x) = 1$.
Continuous: $X$ takes values in an uncountable set (typically an interval). Characterised by its PROBABILITY DENSITY FUNCTION (PDF) $f_X$ such that $P(a \le X \le b) = \int_a^b f_X(x)\,dx$ and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

The PDF is NOT a probability; it is a density. $f_X(x)\,dx$ is approximately the probability that $X$ falls in a small interval of width $dx$ around $x$. $f_X(x)$ can exceed 1 (think of a uniform distribution on $[0, 0.5]$, where $f_X = 2$ on the support).
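The two normalisation conditions can be checked numerically. This is a minimal sketch: it assumes a fair die for the PMF, uses the uniform-on-$[0, 0.5]$ density from the text for the PDF, and relies on a simple midpoint Riemann sum (the helper `integrate` is an illustrative stand-in, not a library routine).

```python
from fractions import Fraction

# PMF of a fair six-sided die (assumed example): a probability at each value.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}
pmf_total = sum(pmf.values())

def f(x):
    """PDF of Uniform[0, 0.5]: density 2 on the support, 0 elsewhere."""
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint Riemann sum of g over [a, b] with n equal slices."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total_mass = integrate(f, 0.0, 0.5)   # integral of the density over its support
p_interval = integrate(f, 0.1, 0.3)   # P(0.1 <= X <= 0.3)

print(pmf_total)
print(round(total_mass, 6), round(p_interval, 6))
```

The density value is 2 everywhere on the support, yet every probability obtained by integrating it stays between 0 and 1.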

The cumulative distribution function

For ANY random variable, discrete or continuous, the CDF is:

$$F_X(x) = P(X \le x).$$

It is non-decreasing and right-continuous, with $F_X(x) \to 0$ as $x \to -\infty$ and $F_X(x) \to 1$ as $x \to +\infty$. For discrete $X$, $F_X$ is a step function; for continuous $X$, $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$. The CDF is the UNIVERSAL description (every random variable has one), and many results (quantiles, transformations) start from $F_X$ rather than from $f_X$ or $p_X$.
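For a discrete variable the CDF is just a running total of the PMF, and a quantile can be read off it directly. The sketch below assumes the sum of two fair dice; `cdf` and `quantile` are illustrative helper names.

```python
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice (assumed example), built by counting outcomes.
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    pmf[d1 + d2] = pmf.get(d1 + d2, Fraction(0)) + Fraction(1, 36)

def cdf(x):
    """F(x) = P(X <= x): add up the PMF over all values not exceeding x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

def quantile(q):
    """Smallest value v in the support with F(v) >= q."""
    return min(v for v in pmf if cdf(v) >= q)

print(cdf(1), cdf(7), cdf(7.5), cdf(12))   # 0 7/12 7/12 1
print(quantile(Fraction(1, 2)))             # 7
```

`cdf(7)` and `cdf(7.5)` agree because the step function is flat between consecutive integers and only jumps at values the variable can actually take.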

What the empirical distribution reveals

Given $n$ i.i.d. samples $X_1, \ldots, X_n$ from $X$, the EMPIRICAL distribution $\hat{F}_n(x) = (1/n) \sum_i \mathbb{1}\{X_i \le x\}$ converges to $F_X$ as $n \to \infty$ (Glivenko-Cantelli; §0.6 makes this rigorous). Practically: simulate, plot a histogram, and watch the underlying distribution emerge. This is the foundation of Monte Carlo (§0.10).
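The convergence can be watched numerically. This sketch assumes samples from Uniform[0, 1], whose CDF is $F(x) = x$ on the support, and measures the largest gap between the empirical and true CDFs as $n$ grows. The seed and the sample sizes are arbitrary choices.

```python
import bisect
import random

def ecdf(samples):
    """Build the empirical CDF: F_hat(x) = (number of samples <= x) / n."""
    xs = sorted(samples)
    n = len(xs)
    def F_hat(x):
        return bisect.bisect_right(xs, x) / n
    return F_hat

def sup_distance_to_uniform(samples):
    """Largest gap |F_hat(x) - x|, where F(x) = x is the Uniform[0, 1] CDF.

    The largest gap occurs at a jump of F_hat, so it is enough to compare
    the values just before and just after each sorted sample.
    """
    xs = sorted(samples)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(0)  # arbitrary fixed seed for repeatability
distances = {}
for n in (100, 1_000, 10_000, 100_000):
    samples = [rng.random() for _ in range(n)]
    distances[n] = sup_distance_to_uniform(samples)
    print(n, round(distances[n], 4))
```

The printed gap should shrink roughly like $1/\sqrt{n}$, which is the uniform convergence that Glivenko-Cantelli guarantees in the limit.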

Try it

Switch between "Coin flip" and "Sum of two dice". Both are DISCRETE, so both have PMFs (green stems). Coin: uniform on {0, 1}. Dice sum: triangular, with its peak at 7 (sum = 7 has 6 ways to occur out of 36). The CDF of the dice sum jumps at each integer from 2 to 12.
Switch to "Uniform on [0, 1]". CONTINUOUS: the PDF is a flat line at 1.0 (the density is constant on the support) and the CDF is a 45° line. Notice that the value 1.0 is a density, not a probability: for an interval [a, b] inside [0, 1] the probability is b - a.
Switch to "Adult height". CONTINUOUS — bell-shaped PDF centred at 1.70 m. CDF is the smooth Normal S-curve from 0 to 1. Use the cursor to mentally read off $P(X \le 1.80) \approx 0.84$ (i.e., about 84% of adults are shorter than 1.80 m under this model).
Set n samples = 0, then crank up to 2000. Watch the blue empirical histogram emerge and converge to the green theoretical curve. This is the EMPIRICAL DISTRIBUTION FUNCTION at work.
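Compare the sample mean to the theoretical mean: coin = 0.5, dice = 7, uniform = 0.5, height = 1.70. The §0.6 LLN says X̄ → μ as n → ∞. The typical error shrinks like σ/√n, so how close you get at a given n depends on the distribution: for the fair coin the standard error of the sample mean is about 0.035 at n = 200 (7% of the mean) and about 0.011 at n = 2000 (roughly 2%).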
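The sample-mean comparison from the last step can be reproduced outside the widget. The sketch below is a stand-in for the interactive demo, not its actual code. The height model's standard deviation of 0.10 m is an assumption, chosen so that $P(X \le 1.80) \approx 0.84$ as quoted above, and the seed and sample sizes are arbitrary.

```python
import random

rng = random.Random(0)  # arbitrary fixed seed so runs are repeatable

# (sampler, theoretical mean) for the four demo distributions.
# The height standard deviation of 0.10 m is an assumption (see lead-in).
samplers = {
    "coin": (lambda: rng.randint(0, 1), 0.5),
    "dice": (lambda: rng.randint(1, 6) + rng.randint(1, 6), 7.0),
    "uniform": (lambda: rng.random(), 0.5),
    "height": (lambda: rng.gauss(1.70, 0.10), 1.70),
}

def sample_mean(draw, n):
    """Average of n independent draws from the sampler."""
    return sum(draw() for _ in range(n)) / n

results = {}
for name, (draw, mu) in samplers.items():
    results[name] = {n: sample_mean(draw, n) for n in (200, 2_000, 100_000)}
    for n, mean in results[name].items():
        rel_err = abs(mean - mu) / mu
        print(f"{name:8s} n={n:<7d} mean={mean:.4f} relative error={rel_err:.2%}")
```

Changing the seed changes the individual numbers, but the relative error should generally fall as n increases, and more slowly for the distributions with a larger σ/μ.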

For the uniform-on-[0,1] case, the PDF is $f(x) = 1$ for all $x \in [0, 1]$. The PDF value of 1 is NOT a probability. What is the probability that $X$ equals any one specific point, say $X = 0.5$ exactly, and why is this OK mathematically?

What you now know

A random variable is a function on the sample space. Its distribution, described by its PMF, PDF, or CDF, is induced by the underlying probability measure. Discrete distributions have point masses; continuous distributions spread mass over intervals. The empirical distribution from i.i.d. samples converges to the true distribution; this underpins much of statistical practice. §0.3 generalises to MULTIPLE random variables and their joint, conditional, and marginal structure.
