Random Variables, Expectation, and Variance

Part 15, Chapter 15: Combinatorics and Probability

Learning objectives

Define a discrete random variable and compute its expected value $E[X] = \sum x_i p_i$
Apply linearity of expectation, even when summands are dependent
Compute variance via $\text{Var}(X) = E[X^2] - (E[X])^2$ and interpret the standard deviation
Use the mean and variance formulas for Bernoulli, binomial, and geometric distributions
Recognise that variance adds for INDEPENDENT random variables and explain why

Expected value is the centre of gravity of a random variable; variance is the spread around that centre. These two numbers, the first moment and the second central moment, summarise much of what you need to know about a distribution, and a large fraction of applied probability and statistics amounts to "use the mean and variance, and lean on the CLT for the rest." This section introduces both quantities, develops linearity of expectation, one of the most powerful tools in the discipline, and presents the mean and variance formulas for the canonical distributions.

Random variables and expectation

A random variable $X$ is a function $X: \Omega \to \mathbb{R}$ that assigns a real number to each outcome of an experiment. For a discrete random variable taking values $x_1, x_2, \ldots$ with probabilities $p_i = P(X = x_i)$ , the expected value is

$E[X] = \sum_i x_i \, p_i$ .

By the law of large numbers, it is the long-run average of $X$ over many independent trials. For a fair die roll, $E[X] = (1 + 2 + \cdots + 6)/6 = 3.5$ : not an attainable outcome, but the long-run average per roll.
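The defining sum translates directly into code. The sketch below is one possible way to represent a PMF (a dictionary from values to probabilities; the helper name `expected_value` is an illustrative choice, not part of the text) and uses exact fractions so no rounding is involved.

```python
from fractions import Fraction

def expected_value(pmf):
    """Return E[X] = sum of x * P(X = x) for a PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = expected_value(die)
print(mean)         # 7/2
print(float(mean))  # 3.5
```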

Linearity of expectation

For ANY random variables $X$ and $Y$ with finite expectations, even when they are dependent, and any constants $a, b, c$ :

$E[aX + bY + c] = a E[X] + b E[Y] + c$ .

This deceptively simple identity is one of the most-used tricks in combinatorics and probability. It lets you decompose a complicated random variable into a sum of indicator variables, compute the mean of each indicator (easy, it equals the probability of the event it indicates), and sum. No independence is required.

Classic application. What is the expected number of fixed points of a random permutation $\pi$ of $\{1, \ldots, n\}$ ? Let $X_i = 1$ if $\pi(i) = i$ , else $0$ , so the number of fixed points is $X_1 + \cdots + X_n$ . Then $E[X_i] = 1/n$ (any of the $n$ positions is equally likely for element $i$ ). The indicators are dependent, but linearity does not care: the expected number of fixed points is $n \cdot (1/n) = 1$ , surprisingly independent of $n$ .
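For small $n$ the claim can be checked by brute force. The following sketch enumerates every permutation and averages the fixed-point counts exactly; the function name `mean_fixed_points` is hypothetical, and the approach is only practical for small $n$ because it visits all $n!$ permutations.

```python
from fractions import Fraction
from itertools import permutations

def mean_fixed_points(n):
    """Exact expected number of fixed points over all n! permutations of range(n)."""
    total = 0   # fixed points summed over every permutation
    count = 0   # number of permutations visited
    for perm in permutations(range(n)):
        total += sum(1 for i, image in enumerate(perm) if image == i)
        count += 1
    return Fraction(total, count)

for n in range(1, 7):
    print(n, mean_fixed_points(n))  # the second column is 1 for every n
```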

Variance and standard deviation

The variance measures spread around the mean:

$\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$ .

The second form is the workhorse for computation: it sidesteps subtracting the mean from every value. The standard deviation $\sigma_X = \sqrt{\text{Var}(X)}$ has the same units as $X$ and is the natural scale on which to quote spread.
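To see that the two forms of the variance agree, here is a small sketch on a made-up three-point distribution (the PMF values are purely illustrative and are not taken from the text). Both forms are computed with exact fractions.

```python
from fractions import Fraction
from math import sqrt

def expectation(pmf, g=lambda x: x):
    """Return E[g(X)] for a PMF given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

# Illustrative PMF: X is 0, 1, or 4 with probabilities 1/2, 1/4, 1/4.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}

mu = expectation(pmf)
var_definition = expectation(pmf, lambda x: (x - mu) ** 2)  # E[(X - E[X])^2]
var_shortcut = expectation(pmf, lambda x: x ** 2) - mu ** 2  # E[X^2] - (E[X])^2

print(mu, var_definition, var_shortcut)  # 5/4 43/16 43/16
std_dev = sqrt(var_shortcut)             # same units as X
```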

Variance is NOT linear. For constants $a, c$ : $\text{Var}(aX + c) = a^2 \text{Var}(X)$ (squared scaling, ignored shifts). For INDEPENDENT random variables $X, Y$ : $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$ . Without independence the cross-term $2\,\text{Cov}(X, Y)$ appears.
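Both rules can be verified exactly on small examples. In this sketch the two PMFs, the constants $a = 3$ and $c = 7$, and the helper names are all illustrative assumptions; the sum of independent variables is built by multiplying the marginal probabilities.

```python
from fractions import Fraction
from itertools import product

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2 for a PMF given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum(x * x * p for x, p in pmf.items()) - mean ** 2

def transform(pmf, f):
    """PMF of f(X), merging values that collide."""
    out = {}
    for x, p in pmf.items():
        out[f(x)] = out.get(f(x), 0) + p
    return out

def independent_sum(pmf_x, pmf_y):
    """PMF of X + Y when X and Y are independent."""
    out = {}
    for (x, p), (y, q) in product(pmf_x.items(), pmf_y.items()):
        out[x + y] = out.get(x + y, 0) + p * q
    return out

pmf_x = {1: Fraction(1, 2), 3: Fraction(1, 2)}  # Var(X) = 1
pmf_y = {0: Fraction(1, 4), 2: Fraction(3, 4)}  # Var(Y) = 3/4

# Scaling by a = 3 multiplies the variance by 9; the shift c = 7 is ignored.
print(variance(transform(pmf_x, lambda x: 3 * x + 7)), 9 * variance(pmf_x))  # 9 9

# Under independence the variances add.
print(variance(independent_sum(pmf_x, pmf_y)), variance(pmf_x) + variance(pmf_y))  # 7/4 7/4
```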

Canonical distributions

Bernoulli $(p)$ : $P(X = 1) = p$ , $P(X = 0) = 1 - p$ . Then $E[X] = p$ , $\text{Var}(X) = p(1 - p)$ . Variance is maximised at $p = 1/2$ (most uncertainty), and zero when $p \in \{0, 1\}$ (deterministic).

Binomial $(n, p)$ : the sum of $n$ independent Bernoulli $(p)$ variables, i.e. the number of successes in $n$ trials. So by linearity $E[X] = np$ ; by additivity of variance under independence, $\text{Var}(X) = np(1 - p)$ .

Geometric $(p)$ : number of trials up to and including the first success. $E[X] = 1/p$ , $\text{Var}(X) = (1 - p)/p^{2}$ . (A fair coin needs on average 2 flips to see the first head.)

Poisson $(\lambda)$ : $E[X] = \text{Var}(X) = \lambda$ . Equality of mean and variance is a characteristic feature of the Poisson distribution, though not by itself a definition of it.
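The formulas above can be checked numerically from the PMFs. The sketch below is one way to do it, with illustrative parameter choices ($n = 10$, $p = 3/10$ for the binomial, $p = 1/2$ for the geometric, $\lambda = 4$ for the Poisson). The binomial check is exact; the geometric and Poisson sums are truncated, which is an approximation whose error is negligible at these cut-offs.

```python
from fractions import Fraction
from math import comb, exp, factorial, isclose

def mean_and_variance(pmf):
    """Return (E[X], Var(X)) for a PMF given as {value: probability}."""
    mean = sum(k * p for k, p in pmf.items())
    second_moment = sum(k * k * p for k, p in pmf.items())
    return mean, second_moment - mean ** 2

# Binomial(n, p): exact arithmetic with fractions.
n, p = 10, Fraction(3, 10)
binomial = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
print(mean_and_variance(binomial) == (n * p, n * p * (1 - p)))  # True

# Geometric(q) on k = 1, 2, ...: truncated after 199 terms (approximation).
q = 0.5
geometric = {k: (1 - q) ** (k - 1) * q for k in range(1, 200)}
geo_mean, geo_var = mean_and_variance(geometric)
print(isclose(geo_mean, 1 / q, rel_tol=1e-9),
      isclose(geo_var, (1 - q) / q**2, rel_tol=1e-9))  # True True

# Poisson(lam): truncated after k = 99 (approximation).
lam = 4.0
poisson = {k: exp(-lam) * lam**k / factorial(k) for k in range(100)}
poi_mean, poi_var = mean_and_variance(poisson)
print(isclose(poi_mean, lam, rel_tol=1e-9),
      isclose(poi_var, lam, rel_tol=1e-9))  # True True
```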

Use the grapher to sketch the binomial PMF $f(k) = \binom{n}{k}p^{k}(1-p)^{n-k}$ for fixed small $n$ (e.g., $n = 10$ ) as a function of $k$ . Notice the mass concentrates around $np$ ; the width scales as $\sqrt{np(1-p)}$ . The Central Limit Theorem (next section) makes this scaling rigorous.
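If the grapher is not to hand, a text-mode sketch gives the same picture. The example below assumes $p = 0.5$ (the text fixes only $n = 10$) and prints one row of `#` marks per value of $k$; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb, sqrt

def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

n, p = 10, 0.5  # p = 0.5 is an assumed value for illustration
pmf = binomial_pmf(n, p)

# Crude bar chart: one '#' per percentage point of probability.
for k, prob in enumerate(pmf):
    print(f"{k:2d} {prob:.4f} {'#' * round(prob * 100)}")

mode = max(range(n + 1), key=lambda k: pmf[k])
width = sqrt(n * p * (1 - p))
print("np =", n * p, " mode =", mode, " sqrt(np(1-p)) =", round(width, 3))
```

Rerunning with a larger `n` shows the bars bunching more tightly around $np$ relative to the full range $0, \ldots, n$.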

Where this shows up

Finance, risk-adjusted return: The Sharpe ratio $(E[R] - r_f)/\sigma_R$ divides excess expected return by standard deviation. Portfolio optimisation in the Markowitz framework is literally minimising variance subject to a fixed expected return.

Insurance pricing: The pure premium for a policy is $E[L]$ , the expected loss. Capital reserves scale with $\sqrt{n}\,\sigma_L$ , the standard deviation of the total loss across $n$ independent policies, via the CLT (next section). Variance drives the cost of providing risk-pooling.

A/B testing: Sample-size formulas all come from $\sigma^{2}/n$ , the variance of a sample mean. To detect a relative effect $\delta$ with 80% power you need $n \propto 1/\delta^{2}$ samples, so detecting a 1% effect takes 100 times as many samples as detecting a 10% effect. This is pure variance arithmetic.

Machine learning, bias-variance tradeoff: Predictor error decomposes as $E[(\hat{y} - y)^{2}] = (\text{bias})^{2} + \text{variance} + \text{irreducible noise}$ . Tuning model complexity is a balance between the first two terms, bias and variance; the noise term is outside the model's control.

Quality control: Six-Sigma manufacturing targets a defect rate below $P(|X - \mu| > 6\sigma)$ (under normality, about 1 in 500 million). The whole programme is an applied-variance discipline.

Pause and think: Why does the formula $\text{Var}(X) = E[X^{2}] - (E[X])^{2}$ require $E[X^{2}] \geq (E[X])^{2}$ ? (Hint: variance is the expectation of a non-negative quantity.) This inequality is a special case of both the Cauchy-Schwarz inequality and Jensen's inequality (applied to the convex function $x \mapsto x^{2}$ ).

Try it

Compute $E[X]$ for a fair die. Then compute $\text{Var}(X)$ using $E[X^{2}] - (E[X])^{2}$ .

The expected number of heads in $n$ flips of a biased coin with $P(H) = p$ is $np$ . Re-derive this from linearity of expectation by writing the total as $X_1 + X_2 + \cdots + X_n$ .

You roll a fair die until you see a 6. What is the expected number of rolls? (Hint: geometric distribution with $p = 1/6$ , so $E[X] = 6$ .)

Show that for any non-negative random variable $X$ taking integer values, $E[X] = \sum_{k=1}^{\infty} P(X \geq k)$ . Use this to give a 1-line proof that geometric $(p)$ has mean $1/p$ .

If $X$ and $Y$ are independent with variances $\sigma_X^{2}$ and $\sigma_Y^{2}$ , find $\text{Var}(X - Y)$ . Why does the answer use a plus, not a minus?

A trap to watch for

Beginners often try to extend linearity to variance: " $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$ ." This is only true under independence. In general, $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$ . If $X$ and $Y$ are positively correlated, their sum has higher variance than the sum of their variances; if negatively correlated (a portfolio hedge!), the sum has LOWER variance, the foundation of diversification.
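The covariance term is easy to see on a pair of coin flips. The sketch below uses three made-up joint distributions for $(X, Y)$, each variable being a fair 0/1 coin: perfectly aligned ($Y = X$), perfectly hedged ($Y = 1 - X$), and independent. The joint tables and helper names are illustrative assumptions.

```python
from fractions import Fraction

def E(joint, g):
    """E[g(X, Y)] for a joint PMF given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def var_sum_two_ways(joint):
    """Return (Var(X+Y) computed directly, Var(X)+Var(Y)+2Cov(X,Y), Cov(X,Y))."""
    ex, ey = E(joint, lambda x, y: x), E(joint, lambda x, y: y)
    var_x = E(joint, lambda x, y: x * x) - ex ** 2
    var_y = E(joint, lambda x, y: y * y) - ey ** 2
    cov = E(joint, lambda x, y: x * y) - ex * ey
    direct = E(joint, lambda x, y: (x + y) ** 2) - E(joint, lambda x, y: x + y) ** 2
    return direct, var_x + var_y + 2 * cov, cov

half, quarter = Fraction(1, 2), Fraction(1, 4)
aligned = {(0, 0): half, (1, 1): half}                      # Y = X
hedged = {(0, 1): half, (1, 0): half}                       # Y = 1 - X
independent = {(x, y): quarter for x in (0, 1) for y in (0, 1)}

print(*var_sum_two_ways(aligned))      # 1 1 1/4
print(*var_sum_two_ways(hedged))       # 0 0 -1/4
print(*var_sum_two_ways(independent))  # 1/2 1/2 0
```

In all three cases $\text{Var}(X) + \text{Var}(Y) = 1/2$; only the covariance term moves the variance of the sum above or below that value.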

What you now know

You can compute means and variances of standard discrete distributions, apply linearity of expectation to decompose complicated random variables into sums of indicators, and recognise when independence lets variance add. The next section (the Central Limit Theorem) is the crown jewel of probability: it explains WHY the mean and variance summarise so much. For large sums of independent random variables with finite variance, the mean and variance approximately determine the entire distribution.


