The catalog of common distributions

Probability from zero

Learning objectives

Recognise 10 named distributions by their shape and parameter meaning
Recall mean / variance formulas for each distribution at a glance
Choose the right distribution for a given application (counts, durations, proportions, ...)
Connect related distributions (Bernoulli → Binomial → Normal as n → ∞)

You will encounter the same ten named distributions across nearly every applied statistics problem. Recognising them by shape, knowing their parameters, and committing their first-two-moment formulas to memory accelerates everything that comes later — modelling decisions, derivations, intuition checks, and quick diagnostics.

Discrete distributions

Bernoulli(p): a single 0/1 trial. P(X=1) = p. Mean p, variance p(1-p). The atom of indicator variables.
Binomial(n, p): sum of n independent Bernoulli(p) trials. Mean np, variance np(1-p). Exactly symmetric about np when p = 0.5, and close to symmetric when p is near 0.5.
Geometric(p): number of failures before the first success. Mean (1-p)/p, variance (1-p)/p². Memoryless property: P(X ≥ n + k | X ≥ n) = P(X ≥ k).
Poisson(λ): counts of rare events in a fixed interval. Mean = variance = λ. The arrival-rate distribution. Limit of Binomial(n, λ/n) as n → ∞.
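Negative Binomial(r, p): number of failures before the r-th success. Mean r(1-p)/p, variance r(1-p)/p². Because variance = mean/p, the variance exceeds the mean whenever p < 1, which often makes this the right choice for overdispersed counts (variance > mean).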
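The mean and variance formulas in this list can be checked numerically. The sketch below builds each probability mass function directly and computes both moments by summation. The parameter values (p = 0.3, n = 10, λ = 4, r = 3) and the truncation point K_MAX for the infinite supports are illustrative assumptions, not part of the catalog.

```python
import math

K_MAX = 500  # assumption: truncation point for infinite supports (remaining tail mass is negligible here)

def mean_var(pmf):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(k * q for k, q in pmf.items())
    var = sum((k - mean) ** 2 * q for k, q in pmf.items())
    return mean, var

def bernoulli(p):
    return {0: 1 - p, 1: p}

def binomial(n, p):
    return {k: math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def geometric(p):
    # Number of failures before the first success.
    return {k: p * (1 - p) ** k for k in range(K_MAX + 1)}

def poisson(lam):
    # Built iteratively (term_k = term_{k-1} * lam / k) to avoid large factorials.
    pmf, term = {}, math.exp(-lam)
    for k in range(K_MAX + 1):
        pmf[k] = term
        term *= lam / (k + 1)
    return pmf

def negative_binomial(r, p):
    # Number of failures before the r-th success (integer r in this sketch).
    return {k: math.comb(k + r - 1, k) * p**r * (1 - p) ** k for k in range(K_MAX + 1)}

p, n, lam, r = 0.3, 10, 4.0, 3  # illustrative values only
catalog = [
    ("Bernoulli(0.3)", bernoulli(p), p, p * (1 - p)),
    ("Binomial(10, 0.3)", binomial(n, p), n * p, n * p * (1 - p)),
    ("Geometric(0.3)", geometric(p), (1 - p) / p, (1 - p) / p**2),
    ("Poisson(4)", poisson(lam), lam, lam),
    ("NegBinomial(3, 0.3)", negative_binomial(r, p), r * (1 - p) / p, r * (1 - p) / p**2),
]

results = {}
for name, pmf, mean_formula, var_formula in catalog:
    mean, var = mean_var(pmf)
    results[name] = (mean, var, mean_formula, var_formula)
    print(f"{name:20s} mean {mean:8.4f} (formula {mean_formula:8.4f})"
          f"  var {var:8.4f} (formula {var_formula:8.4f})")
```

Summing the pmf rather than simulating keeps the comparison deterministic, so any disagreement with a formula points to a parametrisation mix-up rather than sampling noise.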

Continuous distributions

Uniform(a, b): constant density on [a, b]. Mean (a+b)/2, variance (b-a)²/12. The "neutral prior" / sampling baseline.
Normal(μ, σ²): the Gaussian. Mean μ, variance σ². Symmetric. The central limit theorem (§0.7) makes Normal ubiquitous.
Exponential(λ): waiting time to the first event in a Poisson process with rate λ. Mean 1/λ, variance 1/λ². Memoryless in continuous time.
Gamma(k, θ): shape k, scale θ. For integer k, the sum of k independent Exponential(1/θ) variables (rate 1/θ, so each has mean θ). Mean kθ, variance kθ². Right-skewed; commonly used for waiting times and positive-skewed continuous quantities (rainfall, insurance losses).
Beta(α, β): bounded on [0, 1]. Mean α/(α+β), variance αβ/((α+β)²(α+β+1)). The flexible distribution for proportions; conjugate prior for Bernoulli/Binomial in Bayesian inference (§7).
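A quick simulation can confirm the continuous formulas and the parameter conventions behind them. The sketch below uses only Python's standard random module. The parameter values, sample size and seed are illustrative assumptions. Note the conventions of the library calls: random.gauss takes the standard deviation σ (not σ²), random.expovariate takes the rate λ, and random.gammavariate takes shape and scale.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable
N = 200_000      # assumption: sample size chosen for roughly 1% accuracy

a, b = 2.0, 5.0          # Uniform(a, b)
mu, sigma = 1.0, 2.0     # Normal(mu, sigma^2)
lam = 0.5                # Exponential(rate lam)
k, theta = 3.0, 2.0      # Gamma(shape k, scale theta)
alpha, beta = 2.0, 8.0   # Beta(alpha, beta)

# name -> (sampler, mean formula, variance formula)
catalog = {
    "Uniform(2, 5)": (lambda: random.uniform(a, b), (a + b) / 2, (b - a) ** 2 / 12),
    "Normal(1, 4)": (lambda: random.gauss(mu, sigma), mu, sigma**2),
    "Exponential(0.5)": (lambda: random.expovariate(lam), 1 / lam, 1 / lam**2),
    "Gamma(3, 2)": (lambda: random.gammavariate(k, theta), k * theta, k * theta**2),
    "Beta(2, 8)": (lambda: random.betavariate(alpha, beta),
                   alpha / (alpha + beta),
                   alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))),
}

results = {}
for name, (draw, mean_formula, var_formula) in catalog.items():
    xs = [draw() for _ in range(N)]
    mean = sum(xs) / N
    var = sum((x - mean) ** 2 for x in xs) / (N - 1)  # unbiased sample variance
    results[name] = (mean, var, mean_formula, var_formula)
    print(f"{name:18s} mean {mean:7.4f} (formula {mean_formula:7.4f})"
          f"  var {var:7.4f} (formula {var_formula:7.4f})")
```

The sample moments will not match the formulas exactly; agreement to within about a percent is what a sample of this size should give.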

How they connect

Bernoulli(p) is the special case Binomial(1, p).
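Binomial(n, p) ≈ Normal(np, np(1-p)) for large n (CLT): with p fixed strictly between 0 and 1, the standardised count (X - np)/√(np(1-p)) converges in distribution to Normal(0, 1) as n → ∞.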
Binomial(n, p) → Poisson(np) as n → ∞, p → 0 with np = λ constant (rare-event limit).
Sum of k independent Exponential(λ) is Gamma(k, 1/λ).
Beta(α, β) with α = β = 1 reduces to Uniform(0, 1).
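Two of the limits listed above can be watched numerically. The sketch below measures (1) the total variation distance between Binomial(n, λ/n) and Poisson(λ) as n grows, and (2) the largest gap between the Binomial(n, 0.5) CDF and its Normal approximation. The choices λ = 4, p = 0.5, the grid of n values and the continuity correction (evaluating the Normal CDF at k + 0.5) are illustrative assumptions, not something the text prescribes.

```python
import math

def binomial_pmf(n, p, k):
    # Computed in log space so that large n does not overflow.
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def poisson_pmf(lam, k):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def normal_cdf(x, mean, var):
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))

def tv_binomial_poisson(n, lam):
    """Total variation distance between Binomial(n, lam/n) and Poisson(lam)."""
    p = lam / n
    inside = sum(abs(binomial_pmf(n, p, k) - poisson_pmf(lam, k)) for k in range(n + 1))
    # Poisson mass above n has no Binomial counterpart, so it counts in full.
    tail = max(0.0, 1 - sum(poisson_pmf(lam, k) for k in range(n + 1)))
    return 0.5 * (inside + tail)

def max_cdf_gap_binomial_normal(n, p):
    """Largest |Binomial CDF - Normal CDF| over k, using a continuity correction."""
    mean, var = n * p, n * p * (1 - p)
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        cdf += binomial_pmf(n, p, k)
        worst = max(worst, abs(cdf - normal_cdf(k + 0.5, mean, var)))
    return worst

lam = 4.0
tv = {n: tv_binomial_poisson(n, lam) for n in (10, 100, 1000)}
gap = {n: max_cdf_gap_binomial_normal(n, 0.5) for n in (10, 40, 160)}

for n, d in tv.items():
    print(f"TV(Binomial({n}, {lam}/{n}), Poisson({lam})) = {d:.5f}")
for n, g in gap.items():
    print(f"max CDF gap, Binomial({n}, 0.5) vs Normal = {g:.5f}")
```

Both sequences should shrink as n grows, which is the practical content of the two limit statements: the approximations become safe to use once n is large enough for the problem at hand.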

These connections aren't curiosities — they justify many real modelling choices (Poisson for rare events, Gamma for waiting times, Beta for proportions with Bayesian-friendly conjugacy).

Try it

Pick Bernoulli(0.3). Two stems at 0 (height 0.7) and 1 (height 0.3). Mean = 0.3. Variance = 0.21. Move p to 0.5 — now perfectly symmetric.
Switch to Binomial(n=10, p=0.5). Symmetric peak at 5. Move p down to 0.1 — distribution becomes RIGHT-SKEWED. Crank n to 40 with p = 0.5 — note the BELL SHAPE emerging — this is the CLT in action.
Poisson(4) vs Poisson(20). Small λ: skewed. Large λ: approximately Normal-shaped (with mean = variance = λ).
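Normal(0, 1) vs Normal(0, 2). The second parameter controls SPREAD (in the Normal(μ, σ²) notation used above it is the variance, so σ = √2 ≈ 1.41 in the second case); the location (μ) stays at 0. Move μ to ±2: shape unchanged, just translated.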
Beta(0.5, 0.5) vs Beta(5, 5) vs Beta(2, 8). Respectively U-shaped (α, β < 1), bell-shaped (α = β > 1), and right-skewed (α < β). Beta is a very flexible two-parameter family on [0, 1].
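The skewness observations for the Binomial and Poisson items above can be quantified without a plot. The sketch below computes the standardised third moment (skewness) straight from each pmf. For reference, the closed forms are (1 - 2p)/√(np(1-p)) for Binomial(n, p) and 1/√λ for Poisson(λ). The function names and the Poisson truncation point are assumptions made for this illustration.

```python
import math

def binomial(n, p):
    return {k: math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def poisson(lam, k_max=200):
    # Truncated at k_max (assumption: the tail beyond it is negligible for these lam).
    pmf, term = {}, math.exp(-lam)
    for k in range(k_max + 1):
        pmf[k] = term
        term *= lam / (k + 1)
    return pmf

def skewness(pmf):
    """Standardised third central moment; positive means a longer right tail."""
    mean = sum(k * q for k, q in pmf.items())
    var = sum((k - mean) ** 2 * q for k, q in pmf.items())
    third = sum((k - mean) ** 3 * q for k, q in pmf.items())
    return third / var**1.5

skew = {
    "Binomial(10, 0.5)": skewness(binomial(10, 0.5)),
    "Binomial(10, 0.1)": skewness(binomial(10, 0.1)),
    "Binomial(40, 0.5)": skewness(binomial(40, 0.5)),
    "Poisson(4)": skewness(poisson(4.0)),
    "Poisson(20)": skewness(poisson(20.0)),
}
for name, value in skew.items():
    print(f"{name:18s} skewness {value:+.4f}")
```

A value near zero corresponds to the symmetric shapes in the exercise, and the drop in skewness from Poisson(4) to Poisson(20) matches the move toward a Normal-like shape.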

An insurance company observes claim COUNTS per month with mean 4.5 and variance 12. They initially modelled with Poisson(4.5). What does the variance-to-mean ratio tell them — and which distribution from the catalog should they switch to?

What you now know

Ten named distributions cover the bulk of applied modelling. Each has a recognisable shape, a parameter meaning, and a formula for mean and variance. The connections between them (CLT, rare-event limit, Gamma as sum of Exponentials) help you choose the right one and check intuitions. Part 5 (GLMs) will use Binomial, Poisson, Normal, Gamma directly as response distributions; §7 (Bayesian methods) will use Beta as a conjugate prior. The next sections (§0.6 LLN, §0.7 CLT) explain why Normal shows up everywhere.
