Moment-generating and characteristic functions

Probability from zero

Learning objectives

  • Define the MGF M_X(t) = E[e^(tX)] and recognise its domain of definition
  • Use the Taylor expansion to extract moments: M^(k)(0) = E[X^k]
  • Apply MGF UNIQUENESS to identify distributions and prove the CLT
  • Recognise the characteristic function φ_X(t) = E[e^(itX)] as a complex-valued, always-defined variant
  • Use MGFs to compute sums of independent random variables

Moment-generating functions are a TRANSFORM of the distribution that encodes ALL moments in a single function. They are a powerful theoretical tool — for proving distributional equality, summing independent variables, and deriving the CLT.

Definition

For a random variable X, the MGF is:

$$M_X(t) = E[e^{tX}] = \sum_x e^{tx} p_X(x) \text{ (discrete)} = \int e^{tx} f_X(x)\,dx \text{ (continuous)}.$$

M_X(t) is defined wherever the sum or integral CONVERGES. It always equals 1 at t = 0 and is convex on the set where it is finite. For some heavy-tailed distributions (e.g., Cauchy) the MGF is infinite for every t ≠ 0, so it exists only at t = 0.
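The definition can be checked numerically for a small discrete distribution. The sketch below is illustrative only: the helper name `mgf_discrete` and the fair-die example are assumptions, not part of the lesson's own code.

```python
import math

def mgf_discrete(pmf, t):
    """Return E[e^(tX)] for a finite PMF given as {value: probability}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

# A fair six-sided die (an illustrative choice).
die = {k: 1 / 6 for k in range(1, 7)}

m_zero = mgf_discrete(die, 0.0)   # equals 1 because the probabilities sum to 1
m_neg = mgf_discrete(die, -1.0)
m_pos = mgf_discrete(die, 1.0)

# Bernoulli(0.3) written as a PMF, for comparison with 1 - p + p*e^t.
bern = mgf_discrete({0: 0.7, 1: 0.3}, 0.5)
```

Because the die takes finitely many values, the sum converges for every real t; the convergence question only arises for unbounded supports.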

Extracting moments

Expand e^{tX} as a Taylor series around t = 0 and take expectations term by term (valid when M_X is finite in a neighbourhood of 0):

$$M_X(t) = E\left[1 + tX + \tfrac{(tX)^2}{2!} + \ldots\right] = \sum_{k=0}^{\infty} \frac{E[X^k]}{k!} t^k.$$

So M_X(t) is the moment GENERATING function: each moment $E[X^k]$ is the coefficient of $t^k / k!$, equivalently the $k$-th derivative at 0:

$$M_X^{(k)}(0) = E[X^k].$$

Practical: take the derivative of M(t), plug in t = 0, get the mean. Take the second derivative, get E[X²]. Variance = M''(0) - (M'(0))².
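A minimal sketch of this recipe, using central finite differences in place of exact derivatives. The function names and the step size `eps` are assumptions chosen for illustration; the Normal(2, 1) MGF is the closed form exp(μt + σ²t²/2).

```python
import math

def normal_mgf(t, mu=2.0, sigma=1.0):
    """Closed-form MGF of Normal(mu, sigma^2)."""
    return math.exp(mu * t + 0.5 * sigma**2 * t**2)

def mean_and_variance_from_mgf(mgf, eps=1e-3):
    """Approximate M'(0) and M''(0) by central differences with step eps."""
    m_plus, m_zero, m_minus = mgf(eps), mgf(0.0), mgf(-eps)
    first = (m_plus - m_minus) / (2 * eps)               # approximates E[X]
    second = (m_plus - 2 * m_zero + m_minus) / eps**2    # approximates E[X^2]
    return first, second - first**2                      # mean, variance

mean, var = mean_and_variance_from_mgf(normal_mgf)
```

Both differences carry an O(eps²) truncation error, so the estimates should land very close to the exact mean 2 and variance 1.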

Common MGFs

  • Bernoulli(p): $M(t) = 1 - p + pe^t$.
  • Poisson(λ): $M(t) = \exp(\lambda(e^t - 1))$.
  • Normal(μ, σ²): $M(t) = \exp(\mu t + \tfrac{1}{2}\sigma^2 t^2)$.
  • Exponential(λ): $M(t) = \lambda / (\lambda - t)$ for t < λ.
  • Gamma(k, θ): $M(t) = (1 - \theta t)^{-k}$ for t < 1/θ.
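As one worked case, the Exponential entry follows directly from the definition, assuming the rate parameterisation with density λe^{-λx} on x ≥ 0 (which is what the formula λ/(λ - t) implies):

```latex
M(t) = \int_0^\infty e^{tx}\, \lambda e^{-\lambda x}\,dx
     = \lambda \int_0^\infty e^{-(\lambda - t)x}\,dx
     = \frac{\lambda}{\lambda - t}, \qquad t < \lambda.
```

The integral converges only when λ - t > 0, which is where the restriction t < λ comes from.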

UNIQUENESS — the foundation of CLT and limit theorems

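If M_X(t) and M_Y(t) exist in a neighbourhood of 0 and agree there, then X and Y have the SAME distribution. This makes the MGF a unique fingerprint. A companion continuity result says that if the MGFs of a sequence of random variables converge, on a neighbourhood of 0, to the MGF of some distribution, then the sequence converges in distribution to it. The classical MGF proof of the CLT uses exactly this: it shows that the MGF of the standardized mean √n(X̄_n - μ)/σ converges to exp(t²/2), the MGF of Normal(0, 1). (X̄_n itself converges to the constant μ; it is the standardized version that becomes Normal.)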
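The convergence of MGFs can be seen numerically in one concrete case. The sketch below assumes i.i.d. Exponential(1) variables (mean 1, variance 1), for which the standardized sum Z_n = (X_1 + ... + X_n - n)/√n has MGF e^{-t√n}(1 - t/√n)^{-n} for t < √n; the function name is hypothetical.

```python
import math

def standardized_sum_mgf(t, n):
    """MGF of Z_n = (X_1 + ... + X_n - n) / sqrt(n) for i.i.d. Exponential(1)."""
    root_n = math.sqrt(n)
    if t >= root_n:
        raise ValueError("MGF is finite only for t < sqrt(n)")
    # Work in log space: log M = -t*sqrt(n) - n*log(1 - t/sqrt(n)).
    log_m = -t * root_n - n * math.log1p(-t / root_n)
    return math.exp(log_m)

t = 0.5
target = math.exp(t**2 / 2)   # MGF of Normal(0, 1) at t
gaps = [abs(standardized_sum_mgf(t, n) - target) for n in (10, 100, 10_000)]
```

The entries of `gaps` shrink as n grows, which is the numerical counterpart of the limit used in the proof. This illustrates a single t and a single distribution; it is not a proof.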

Sums of independent random variables

If X and Y are INDEPENDENT, then E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] E[e^{tY}], so $M_{X+Y}(t) = M_X(t) M_Y(t)$. This is the multiplicative property that makes MGFs perfect for sums:

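  • Sum of n independent Bernoulli(p) variables is Binomial(n, p): the product of the MGFs is (1 - p + pe^t)^n, which is the Binomial MGF.
  • Sum of independent Poisson(λ) and Poisson(μ) is Poisson(λ + μ): verified by the MGF product e^{λ(e^t-1)} · e^{μ(e^t-1)} = e^{(λ+μ)(e^t-1)}.
  • Sum of independent Gamma(k_1, θ) and Gamma(k_2, θ) with the same θ is Gamma(k_1 + k_2, θ): the exponents -k_1 and -k_2 add.
  • Sum of independent Normals is Normal: multiplying the exponentials adds the exponents, so the means add and the variances add.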
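The Bernoulli-to-Binomial case above can be checked numerically. This is a sketch under illustrative parameter choices (n = 8, p = 0.3, t = 0.7); the function names are assumptions.

```python
import math

def bernoulli_mgf(t, p):
    """Closed-form MGF of Bernoulli(p)."""
    return 1 - p + p * math.exp(t)

def binomial_mgf_from_pmf(t, n, p):
    """E[e^(tX)] for X ~ Binomial(n, p), summed directly over the PMF."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) * math.exp(t * k)
        for k in range(n + 1)
    )

n, p, t = 8, 0.3, 0.7
product = bernoulli_mgf(t, p) ** n         # product of n identical Bernoulli MGFs
direct = binomial_mgf_from_pmf(t, n, p)    # MGF computed from the Binomial PMF
```

The two values agree up to floating-point rounding, and by uniqueness the matching MGFs identify the sum as Binomial(n, p) without any convolution.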

Characteristic functions: the bulletproof variant

For some distributions (e.g., Cauchy), M_X(t) doesn't exist anywhere except t = 0. The CHARACTERISTIC FUNCTION:

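$$\varphi_X(t) = E[e^{itX}]$$

uses the imaginary unit i. Since $|e^{itX}| = 1$ for real t, the expectation always converges. Every distribution has a CF; when a PDF exists, the CF is its FOURIER TRANSFORM (up to sign convention); CFs uniquely identify distributions. The standard formal proof of the CLT uses CFs (via the Lévy continuity theorem) rather than MGFs, so it covers distributions that have finite variance but no MGF, such as Student's t with 3 or more degrees of freedom. The Cauchy has a CF but no finite variance, so the CLT does not apply to it.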
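To see that the CF exists even where the MGF does not, one can estimate it by Monte Carlo for the standard Cauchy, whose CF is known to be e^{-|t|}. The sketch below is an illustration under assumptions: the sample size, the seed, and the helper name `empirical_cf` are arbitrary choices, and samples are drawn by the inverse-CDF formula tan(π(U - 1/2)).

```python
import cmath
import math
import random

def empirical_cf(samples, t):
    """Sample average of e^(i t x), an estimate of the characteristic function."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

rng = random.Random(0)  # fixed seed so the run is repeatable
# Standard Cauchy draws via the inverse CDF: tan(pi * (U - 1/2)), U uniform on [0, 1).
samples = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(100_000)]

t = 1.0
estimate = empirical_cf(samples, t)   # complex number, imaginary part near 0
exact = math.exp(-abs(t))             # CF of the standard Cauchy at t
```

Every term in the average has modulus 1, so the estimate is stable despite the heavy tails. The analogous average of e^{tx} for an MGF would be dominated by a few huge samples.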

[Interactive figure: MGF visualizer]

Try it

  • Normal(0, 1). The MGF is exp(t²/2). At t = 0: M = 1 (red dot). The tangent line (slope = E[X] = 0) is horizontal. Verify: M'(0) = 0, M''(0) = 1; variance = 1 - 0² = 1.
  • Normal(2, 1). The MGF is exp(2t + t²/2). At t = 0: M = 1, slope = 2 (tangent line points up). Verify the readouts: analytical and numerical E[X] = 2 should agree.
  • Bernoulli(0.5). M(t) = 0.5 + 0.5·e^t. Mean = 0.5, variance = 0.25. The MGF crosses M = 1 at t = 0; slope there is 0.5.
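  • Exponential(1). M(t) = 1/(1-t) for t < 1. The MGF EXPLODES as t → 1 (vertical asymptote) and is defined only for t < λ = 1, because e^{tx} must not outgrow the e^{-λx} tail. Heavy-tailed distributions such as the Cauchy take this to the extreme: their tails decay more slowly than any exponential, so the integral diverges for every t ≠ 0 and no MGF exists.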
  • Compare numerical vs analytical E[X] and Var(X) in the readout. They should match to within ~0.001 since numerical second differences are accurate to O(ε²).

If you want to show that the sum of n independent Poisson(λ_i) variables is Poisson(Σλ_i), how would you use the MGF property — and why is it cleaner than directly convolving PMFs n times?

What you now know

MGFs encode ALL moments in one function and uniquely identify distributions. Their multiplicative property under independence makes them perfect for sums. The characteristic function is the bulletproof complex-valued variant that always exists. These tools are theoretical engines: the CLT, the Lindeberg-Feller theorem, the convergence theorems for empirical processes — all built on MGF or CF technology. §0.10 closes Part 0 with simulation, which uses the foundations of Parts 0.1-0.9 to estimate probabilities and expectations computationally.

References

  • Casella, G., Berger, R.L. (2002). Statistical Inference, 2nd ed. (Section 2.3 — MGFs.)
  • Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2. Wiley. (Chapter XV — characteristic functions and CLT proofs.)
  • Lévy, P. (1925). Calcul des probabilités. Gauthier-Villars. (The continuity theorem.)
  • Billingsley, P. (1995). Probability and Measure, 3rd ed. Wiley. (Chapter 26 — CLT via characteristic functions.)
  • Lukacs, E. (1970). Characteristic Functions, 2nd ed. Griffin. (Comprehensive reference.)
