The catalog of common distributions
Learning objectives
- Recognise 10 named distributions by their shape and parameter meaning
- Recall mean / variance formulas for each distribution at a glance
- Choose the right distribution for a given application (counts, durations, proportions, ...)
- Connect related distributions (Bernoulli → Binomial → Normal as n → ∞)
The same ten named distributions recur throughout applied statistics. Recognising them by shape, knowing what their parameters mean, and committing their mean and variance formulas to memory speeds up everything that comes later: modelling decisions, derivations, intuition checks, and quick diagnostics.
Discrete distributions
- Bernoulli(p): a single 0/1 trial. P(X=1) = p. Mean p, variance p(1-p). The atom of indicator variables.
- Binomial(n, p): sum of n independent Bernoulli(p) trials, i.e. the number of successes in n trials. Mean np, variance np(1-p). Exactly symmetric about np when p = 0.5, and close to symmetric when p is near 0.5.
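- Geometric(p): number of failures before the first success, so X takes values 0, 1, 2, .... Mean (1-p)/p, variance (1-p)/p². (Some texts count trials instead of failures, which shifts the mean to 1/p and leaves the variance unchanged.) Memoryless property: P(X ≥ n + k | X ≥ n) = P(X ≥ k).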
- Poisson(λ): count of events in a fixed interval when events occur independently at a constant average rate of λ per interval. Mean = variance = λ. The standard model for arrival counts. Limit of Binomial(n, λ/n) as n → ∞.
- Negative Binomial(r, p): failures before the r-th success. Mean r(1-p)/p, variance r(1-p)/p². Since variance = mean/p, the variance exceeds the mean whenever p < 1, which makes this a common choice for overdispersed counts (variance > mean).
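The mean and variance formulas in this list can be checked numerically by summing over each probability mass function. The following is a minimal sketch using only the Python standard library. The function names (`binomial_pmf`, `moments`, and so on), the example parameter values, and the cut-off `TRUNC` for the infinite supports are illustrative assumptions, not part of the lesson.

```python
import math

def bernoulli_pmf(k, p):
    return p if k == 1 else 1 - p

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):
    # k = number of failures before the first success (k = 0, 1, 2, ...)
    return (1 - p)**k * p

def poisson_pmf(k, lam):
    # Computed in log space so large k does not overflow
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def negbinomial_pmf(k, r, p):
    # k = number of failures before the r-th success (integer r)
    return math.comb(k + r - 1, k) * p**r * (1 - p)**k

def moments(pmf, support):
    """Mean and variance obtained by direct summation over the support."""
    support = list(support)
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean)**2 * pmf(k) for k in support)
    return mean, var

TRUNC = 400  # assumed cut-off for infinite supports; the neglected tail is negligible here
n, p, lam, r = 10, 0.3, 4.0, 3  # example parameters, chosen arbitrarily

# name -> ((numeric mean, numeric variance), (formula mean, formula variance))
checks = {
    "Bernoulli(0.3)": (moments(lambda k: bernoulli_pmf(k, p), range(2)),
                       (p, p * (1 - p))),
    "Binomial(10, 0.3)": (moments(lambda k: binomial_pmf(k, n, p), range(n + 1)),
                          (n * p, n * p * (1 - p))),
    "Geometric(0.3)": (moments(lambda k: geometric_pmf(k, p), range(TRUNC)),
                       ((1 - p) / p, (1 - p) / p**2)),
    "Poisson(4)": (moments(lambda k: poisson_pmf(k, lam), range(TRUNC)),
                   (lam, lam)),
    "NegBinomial(3, 0.3)": (moments(lambda k: negbinomial_pmf(k, r, p), range(TRUNC)),
                            (r * (1 - p) / p, r * (1 - p) / p**2)),
}

for name, ((m, v), (m_formula, v_formula)) in checks.items():
    print(f"{name:20s} numeric: mean={m:.4f} var={v:.4f}   "
          f"formula: mean={m_formula:.4f} var={v_formula:.4f}")

# Memoryless property of the Geometric: P(X >= n + k | X >= n) = P(X >= k)
def geometric_tail(m, p):
    return sum(geometric_pmf(j, p) for j in range(m, TRUNC))

memoryless_lhs = geometric_tail(5 + 3, p) / geometric_tail(5, p)
memoryless_rhs = geometric_tail(3, p)
print(f"memoryless check: {memoryless_lhs:.6f} vs {memoryless_rhs:.6f}")
```

Summing the PMF directly, rather than sampling, keeps the check deterministic, so any disagreement with a formula points at the formula or the parameter convention rather than at sampling noise.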
Continuous distributions
- Uniform(a, b): constant density on [a, b]. Mean (a+b)/2, variance (b-a)²/12. Commonly used as a flat ("neutral") prior and as the baseline for random sampling.
- Normal(μ, σ²): the Gaussian. Mean μ, variance σ². Symmetric. The central limit theorem (§0.7) makes Normal ubiquitous.
- Exponential(λ): waiting time to the first event in a Poisson process with rate λ. Mean 1/λ, variance 1/λ². The continuous-time memoryless distribution: P(X > s + t | X > s) = P(X > t).
- Gamma(k, θ): shape k, scale θ. For integer k, the sum of k independent Exponential variables with rate 1/θ (mean θ each). Mean kθ, variance kθ². Right-skewed; commonly used for waiting times and other positive, right-skewed continuous quantities (rainfall, insurance losses).
- Beta(α, β): bounded on [0, 1]. Mean α/(α+β), variance αβ/((α+β)²(α+β+1)). The flexible distribution for proportions; conjugate prior for Bernoulli/Binomial in Bayesian inference (§7).
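For the continuous families, a quick simulation is one way to see the mean and variance formulas at work. The sketch below draws samples with Python's standard `random` module and compares sample moments with the formulas above. The parameter values, the sample size `N`, and the seed are arbitrary assumptions chosen for illustration. Note that `gammavariate` takes shape and scale, matching Gamma(k, θ) here, while `expovariate` takes the rate λ and `gauss` takes the standard deviation σ rather than the variance.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
N = 200_000             # assumed sample size; larger N gives a tighter match

a, b = 2.0, 5.0         # Uniform(a, b)
mu, sigma = 1.0, 2.0    # Normal(mu, sigma^2), so the variance is 4
lam = 0.5               # Exponential(lam), rate parameter
k, theta = 3.0, 2.0     # Gamma(k, theta), shape and scale
alpha, beta = 2.0, 8.0  # Beta(alpha, beta)

# (label, sampler, formula mean, formula variance)
cases = [
    ("Uniform(2, 5)", lambda: rng.uniform(a, b),
     (a + b) / 2, (b - a)**2 / 12),
    ("Normal(1, 4)", lambda: rng.gauss(mu, sigma),
     mu, sigma**2),
    ("Exponential(0.5)", lambda: rng.expovariate(lam),
     1 / lam, 1 / lam**2),
    ("Gamma(3, 2)", lambda: rng.gammavariate(k, theta),
     k * theta, k * theta**2),
    ("Beta(2, 8)", lambda: rng.betavariate(alpha, beta),
     alpha / (alpha + beta),
     alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1))),
]

# label -> (sample mean, sample variance, formula mean, formula variance)
summary = {}
for label, draw, mean_formula, var_formula in cases:
    xs = [draw() for _ in range(N)]
    sample_mean = sum(xs) / N
    sample_var = sum((x - sample_mean)**2 for x in xs) / N
    summary[label] = (sample_mean, sample_var, mean_formula, var_formula)
    print(f"{label:18s} sample: mean={sample_mean:.4f} var={sample_var:.4f}   "
          f"formula: mean={mean_formula:.4f} var={var_formula:.4f}")
```

The sample values will not match the formulas exactly; the gap shrinks roughly like 1/√N.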
How they connect
- Bernoulli is a single Binomial(1, p).
- Binomial(n, p) is approximately Normal(np, np(1-p)) for large n (CLT), provided p is fixed and bounded away from 0 and 1.
- Binomial(n, p) → Poisson(np) as n → ∞, p → 0 with np = λ constant (rare-event limit).
- Sum of k independent Exponential(λ) variables is Gamma(k, 1/λ), i.e. shape k and scale 1/λ.
- Beta(α, β) with α = β = 1 reduces to Uniform(0, 1).
These connections aren't curiosities — they justify many real modelling choices (Poisson for rare events, Gamma for waiting times, Beta for proportions with Bayesian-friendly conjugacy).
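Two of the limits listed in this section can be watched numerically. The sketch below, which uses only the Python standard library, measures how far Binomial(n, λ/n) is from Poisson(λ) as n grows, how far the Binomial(n, 0.5) PMF is from the matching Normal density, and confirms that the Beta(1, 1) density is flat. The distance measures (total variation for the first, largest pointwise gap for the second), the values of n, and all function names are illustrative choices rather than anything prescribed by the lesson.

```python
import math

def binomial_pmf(k, n, p):
    if k < 0 or k > n:
        return 0.0
    # Log-space evaluation keeps large n numerically stable
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def beta_pdf(x, a, b):
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x**(a - 1) * (1 - x)**(b - 1)

# Rare-event limit: Binomial(n, lam/n) vs Poisson(lam), total variation distance.
# Summation stops at k = 199; both distributions have negligible mass beyond that here.
lam = 4.0
tv_distance = {}
for n in (10, 100, 1000):
    tv_distance[n] = 0.5 * sum(
        abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(200)
    )

# CLT: Binomial(n, 0.5) PMF vs Normal(np, np(1-p)) density at the integers
p = 0.5
normal_gap = {}
for n in (10, 40, 160):
    mean, var = n * p, n * p * (1 - p)
    normal_gap[n] = max(
        abs(binomial_pmf(k, n, p) - normal_pdf(k, mean, var)) for k in range(n + 1)
    )

# Beta(1, 1) has constant density 1 on (0, 1), i.e. it is Uniform(0, 1)
beta_flat = [beta_pdf(x, 1.0, 1.0) for x in (0.1, 0.5, 0.9)]

for n, d in tv_distance.items():
    print(f"TV(Binomial({n}, {lam}/{n}), Poisson({lam})) = {d:.5f}")
for n, g in normal_gap.items():
    print(f"max |Binomial({n}, 0.5) pmf - Normal pdf| = {g:.5f}")
print("Beta(1, 1) density at 0.1, 0.5, 0.9:", beta_flat)
```

Both distances shrink as n grows, which is the numerical face of the two limit statements.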
Try it
- Pick Bernoulli(0.3). Two stems at 0 (height 0.7) and 1 (height 0.3). Mean = 0.3. Variance = 0.21. Move p to 0.5 — now perfectly symmetric.
- Switch to Binomial(n=10, p=0.5). Symmetric peak at 5. Move p down to 0.1 — distribution becomes RIGHT-SKEWED. Crank n to 40 with p = 0.5 — note the BELL SHAPE emerging — this is the CLT in action.
- Poisson(4) vs Poisson(20). Small λ: skewed. Large λ: approximately Normal-shaped (with mean = variance = λ).
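- Normal(0, 1) vs Normal(0, 2). The second parameter controls SPREAD; the location (μ) stays at 0. Read in the Normal(μ, σ²) notation used above, Normal(0, 2) has variance 2, i.e. σ = √2 ≈ 1.41. Move μ to ±2: shape unchanged, just translated.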
- Beta(0.5, 0.5) vs Beta(5, 5) vs Beta(2, 8). Respectively U-shaped (α, β < 1), bell-shaped (α = β > 1), and right-skewed (α < β). This range of shapes is why Beta is such a flexible choice for quantities on [0, 1].
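The "skewed" versus "symmetric" observations in these experiments can be given a number. The sketch below computes the skewness (the standardised third central moment, a quantity not defined in this lesson and introduced here only as a diagnostic) directly from the PMF for the Binomial and Poisson cases mentioned above. Function names and the truncation of the Poisson support at 200 are assumptions for illustration.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def skewness(pmf, support):
    """Standardised third central moment, by direct summation over the support."""
    support = list(support)
    mean = sum(k * pmf(k) for k in support)
    var = sum((k - mean)**2 * pmf(k) for k in support)
    third = sum((k - mean)**3 * pmf(k) for k in support)
    return third / var**1.5

skew = {
    "Binomial(10, 0.5)": skewness(lambda k: binomial_pmf(k, 10, 0.5), range(11)),
    "Binomial(10, 0.1)": skewness(lambda k: binomial_pmf(k, 10, 0.1), range(11)),
    "Binomial(40, 0.5)": skewness(lambda k: binomial_pmf(k, 40, 0.5), range(41)),
    # Poisson support is infinite; 200 is an assumed cut-off with negligible tail
    "Poisson(4)": skewness(lambda k: poisson_pmf(k, 4.0), range(200)),
    "Poisson(20)": skewness(lambda k: poisson_pmf(k, 20.0), range(200)),
}

for name, s in skew.items():
    print(f"{name:18s} skewness = {s:+.4f}")
```

A skewness near 0 corresponds to the symmetric pictures, and a positive value corresponds to a longer right tail. The Poisson value falls as λ grows, consistent with the shape becoming more Normal-like.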
An insurance company observes claim COUNTS per month with mean 4.5 and variance 12. They initially modelled with Poisson(4.5). What does the variance-to-mean ratio tell them — and which distribution from the catalog should they switch to?
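One way to work through this question is to compute the variance-to-mean ratio and then match moments. The sketch below assumes the Negative Binomial(r, p) parameterisation from the catalog (mean r(1-p)/p, variance r(1-p)/p²), so that variance/mean = 1/p. It also assumes r may be non-integer, which extends the "failures before the r-th success" reading by writing the PMF with the gamma function. Variable and function names are hypothetical.

```python
import math

mean, var = 4.5, 12.0

# A Poisson model forces variance = mean, i.e. a ratio of 1
dispersion = var / mean

# Method of moments for Negative Binomial(r, p):
#   var / mean = 1 / p   and   mean = r (1 - p) / p
p_hat = mean / var
r_hat = mean * p_hat / (1 - p_hat)

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def negbinomial_pmf(k, r, p):
    # Gamma-function form of the PMF, valid for non-integer r (an assumption here)
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)

def upper_tail(pmf, threshold):
    # P(X >= threshold) = 1 - P(X < threshold)
    return 1.0 - sum(pmf(k) for k in range(threshold))

# Probability of a month with 10 or more claims under each model
poisson_tail = upper_tail(lambda k: poisson_pmf(k, mean), 10)
negbin_tail = upper_tail(lambda k: negbinomial_pmf(k, r_hat, p_hat), 10)

print(f"variance / mean = {dispersion:.3f}")
print(f"moment-matched Negative Binomial: r = {r_hat:.3f}, p = {p_hat:.3f}")
print(f"P(X >= 10): Poisson = {poisson_tail:.4f}, Negative Binomial = {negbin_tail:.4f}")
```

A ratio well above 1 signals overdispersion, which the Poisson cannot represent. The moment-matched Negative Binomial reproduces both the observed mean and variance and puts more probability on high-claim months.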
What you now know
Ten named distributions cover the bulk of applied modelling. Each has a recognisable shape, a parameter meaning, and a formula for mean and variance. The connections between them (CLT, rare-event limit, Gamma as sum of Exponentials) help you choose the right one and check intuitions. Part 5 (GLMs) will use Binomial, Poisson, Normal, Gamma directly as response distributions; §7 (Bayesian methods) will use Beta as a conjugate prior. The next sections (§0.6 LLN, §0.7 CLT) explain why Normal shows up everywhere.