Transformations of random variables

Probability from zero

Learning objectives

  • State the change-of-variable formula for monotonic transformations
  • Apply it for square, exp, shift, scale, and other common transformations
  • Recognise multi-valued inverse cases (e.g., Y = X²) and sum over pre-images
  • Use the delta method to approximate Var(g(X̄_n))
  • Recognise log-transforms as the standard fix for right-skewed positive data

Random variables are FUNCTIONS — and you can compose them with any (measurable) function g to form Y = g(X). The DISTRIBUTION of Y is determined by the distribution of X plus the structure of g. This section gives the formula and shows it in action.

The change-of-variable formula (continuous case)

If X has PDF $f_X$ and Y = g(X) for a strictly MONOTONIC and differentiable g with inverse $g^{-1}$, then:

$$f_Y(y) = f_X(g^{-1}(y)) \cdot \left| \frac{d g^{-1}(y)}{dy} \right|.$$

The Jacobian factor $|dg^{-1}/dy|$ accounts for how g stretches or compresses small intervals.
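As a sanity check on the formula, the sketch below applies it to Y = exp(X) with a Normal(0, 1) parent and compares the result with a finite-difference derivative of the CDF F_Y(y) = P(X ≤ ln y). The parent distribution, the test points, and the helper names `f_Y` and `f_Y_numeric` are illustrative assumptions, not part of the lesson.

```python
import math
from statistics import NormalDist

X = NormalDist(mu=0.0, sigma=1.0)  # assumed parent distribution


def f_Y(y: float) -> float:
    """Density of Y = exp(X) from the change-of-variable formula."""
    x = math.log(y)          # g^{-1}(y) = ln y
    jacobian = abs(1.0 / y)  # |d g^{-1}(y) / dy| = 1 / y
    return X.pdf(x) * jacobian


def f_Y_numeric(y: float, h: float = 1e-5) -> float:
    """Central-difference derivative of F_Y(y) = P(X <= ln y)."""
    return (X.cdf(math.log(y + h)) - X.cdf(math.log(y - h))) / (2.0 * h)


for y in (0.5, 1.0, 2.0, 4.0):
    print(f"y={y}: formula={f_Y(y):.6f}  numeric={f_Y_numeric(y):.6f}")
```

The two columns should agree to several decimal places, which is what the formula promises: the density of Y is the derivative of its CDF.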

Multi-valued inverses: sum over pre-images

For non-monotonic g (e.g., Y = X² maps both +√y and -√y to y), the formula extends:

$$f_Y(y) = \sum_{x : g(x) = y} f_X(x) \cdot \left| \frac{1}{g'(x)} \right|.$$

For Y = X² with $f_X$ symmetric about 0, the two pre-images ±√y each contribute $f_X(\sqrt{y}) / (2\sqrt{y})$, so $f_Y(y) = f_X(\sqrt{y}) / \sqrt{y}$ for y > 0. Most importantly: the support of Y is [0, ∞), even though X takes both positive and negative values.
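The pre-image sum can be checked directly for Y = X² with a Normal(0, 1) parent, where the answer should match the chi-squared(1) density exp(-y/2) / √(2πy). This is a minimal sketch; the function names are hypothetical and the Normal(0, 1) parent is an assumption chosen because its closed form is known.

```python
import math
from statistics import NormalDist

X = NormalDist(0.0, 1.0)  # assumed parent distribution


def f_Y_preimages(y: float) -> float:
    """Density of Y = X^2: sum f_X(x) / |g'(x)| over both roots of x^2 = y."""
    roots = (math.sqrt(y), -math.sqrt(y))
    return sum(X.pdf(x) / abs(2.0 * x) for x in roots)  # g'(x) = 2x


def chi2_1_pdf(y: float) -> float:
    """Closed-form chi-squared(1) density, used only for comparison."""
    return math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi * y)


for y in (0.1, 1.0, 3.0):
    print(f"y={y}: pre-image sum={f_Y_preimages(y):.6f}  chi2(1)={chi2_1_pdf(y):.6f}")
```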

Linear transformations: the cleanest case

  • Shift Y = X + a: $f_Y(y) = f_X(y - a)$. Mean shifts by a; variance unchanged.
  • Scale Y = bX (b > 0): $f_Y(y) = f_X(y/b) / b$. Mean scales by b; variance scales by b².

For Y = aX + b (here a is the scale and b the shift): $E[Y] = a E[X] + b$, $\mathrm{Var}(Y) = a^2 \mathrm{Var}(X)$. These rules underpin standardisation: Z = (X - μ)/σ has mean 0, variance 1.
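The mean and variance rules hold exactly for sample moments too, so they are easy to verify on simulated data. The sketch below assumes a Gamma parent, a fixed seed, and arbitrary constants a = 3 and b = -4; none of these come from the lesson, and any parent would do.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)  # fixed seed so the run is reproducible
x = [rng.gammavariate(2.0, 1.5) for _ in range(10_000)]  # assumed skewed parent

a, b = 3.0, -4.0  # illustrative scale and shift
y = [a * xi + b for xi in x]

print("mean:    ", fmean(y), "vs", a * fmean(x) + b)
print("variance:", pvariance(y), "vs", a**2 * pvariance(x))

# Standardisation is the special case a = 1/sigma, b = -mu/sigma.
mu, sigma = fmean(x), pvariance(x) ** 0.5
z = [(xi - mu) / sigma for xi in x]
print("standardised mean and variance:", fmean(z), pvariance(z))
```

Each pair of printed values should agree up to floating-point rounding, and the standardised sample should have mean 0 and variance 1.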

Common non-linear transformations

  • Y = exp(X): if X is Normal, Y is LOG-NORMAL. Used for positive, right-skewed quantities (income, prices). Log-Normal mean = exp(μ + σ²/2), not exp(μ).
  • Y = X²: if X is Normal(0, 1), Y is chi-squared with 1 degree of freedom. Used in many test statistics.
  • Y = ln(X): inverse of exp; brings right-skewed positive data closer to Normal. Standard transformation for income, GDP, fold-change data.
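  • Y = sin(X): g is periodic, so a value y can have many pre-images and the distribution of Y depends on the range of X. For example, if X is Uniform(-π/2, π/2), g is monotonic on that range and Y has the arcsine density f_Y(y) = 1/(π√(1 - y²)) on (-1, 1).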
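Two facts from the list above can be checked by numerical integration: the log-Normal mean is exp(μ + σ²/2) rather than exp(μ), and X² for a standard Normal X has mean 1. The sketch below uses a plain midpoint rule; the helper `expect`, the integration limits, and the choice μ = 0, σ = 1 are assumptions made for illustration.

```python
import math
from statistics import NormalDist

mu, sigma = 0.0, 1.0  # assumed parameters
X = NormalDist(mu, sigma)


def expect(g, lo=-12.0, hi=12.0, steps=20_000):
    """Midpoint-rule approximation of E[g(X)] = integral of g(x) * f_X(x) dx."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += g(x) * X.pdf(x)
    return h * total


mean_Y = expect(math.exp)                # E[exp(X)], the log-Normal mean
second_moment = expect(lambda x: x * x)  # E[X^2], the chi-squared(1) mean

print("E[exp(X)]      =", mean_Y)
print("exp(mu + s²/2) =", math.exp(mu + sigma**2 / 2))
print("exp(mu)        =", math.exp(mu))
print("E[X²]          =", second_moment)
```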

The DELTA METHOD: linear approximation for transformations of estimators

If √n(X̄_n - μ) →_d N(0, σ²), and g is differentiable at μ with g'(μ) ≠ 0, then:

$$\sqrt{n}(g(\bar{X}_n) - g(\mu)) \xrightarrow{d} \mathcal{N}(0, [g'(\mu)]^2 \sigma^2).$$

So transforming an asymptotically Normal estimator gives another asymptotically Normal estimator, with variance multiplied by $[g'(\mu)]^2$. In practice this yields the approximation Var(g(X̄_n)) ≈ [g'(μ)]² σ² / n for large n. Used for: Wald CIs on odds ratios (g = exp), correlation Fisher z-transformation (g = tanh⁻¹), etc.
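A small simulation can show how close the delta-method variance is to the variance actually observed. The sketch below assumes an Exponential(1) parent (so μ = 1 and σ² = 1), the transformation g = ln, and arbitrary values for the sample size, replication count, and seed; all of these are illustrative choices rather than anything prescribed by the lesson.

```python
import math
import random
from statistics import fmean, variance

rng = random.Random(42)  # fixed seed for reproducibility
n, reps = 200, 4000      # sample size and number of simulated samples
mu, sigma2 = 1.0, 1.0    # mean and variance of Exponential(rate=1)


def g(m: float) -> float:
    return math.log(m)


def g_prime(m: float) -> float:
    return 1.0 / m


# Simulate g(X-bar_n) many times and measure its spread.
estimates = []
for _ in range(reps):
    xbar = fmean([rng.expovariate(1.0) for _ in range(n)])
    estimates.append(g(xbar))

empirical_var = variance(estimates)
delta_var = g_prime(mu) ** 2 * sigma2 / n  # [g'(mu)]^2 * sigma^2 / n
ratio = empirical_var / delta_var

print("empirical Var(g(X-bar)):", empirical_var)
print("delta-method approx:    ", delta_var)
print("ratio:                  ", ratio)
```

The ratio should be close to 1; it drifts further from 1 for small n, where the linear approximation of g around μ is less accurate.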

[Interactive figure: Transformations Demo]

Try it

  • Pick Normal(0, 1) parent + Y = X². The right panel shows a chi-squared(1) distribution: heavily right-skewed, with a density that grows without bound as y → 0. Even though X has E[X] = 0 and is symmetric, Y has E[Y] = E[X²] = 1.
  • Pick Normal(0, 1) parent + Y = exp(X). The right panel shows a LOG-NORMAL distribution: right-skewed, supported on (0, ∞). E[Y] = exp(0 + 1/2) = exp(0.5) ≈ 1.649 (NOT exp(0) = 1, despite E[X] = 0).
  • Pick Uniform(0, 1) + Y = X². The right panel shows the new shape: density piled up near 0 and thinning out toward 1. The empirical histogram matches the theoretical f_Y(y) = 1/(2√y) for 0 < y < 1.
  • Pick any parent + Y = X + 2 (shift). The right panel is the parent shifted right by 2. Same shape; just translated.
  • Pick any parent + Y = 2X (scale). The right panel is wider. Variance is multiplied by 4 (since scaling by 2 multiplies SD by 2 and variance by 2² = 4).
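Without the interactive panel, the Uniform(0, 1) case can be reproduced offline. The sketch below squares simulated uniforms and compares the fraction landing in each bin with the integral of 1/(2√y) over that bin, which is √hi - √lo. The bin edges, sample size, and seed are arbitrary assumptions.

```python
import math
import random

rng = random.Random(1)  # fixed seed for reproducibility
n = 100_000
y = [rng.random() ** 2 for _ in range(n)]  # Y = X^2 with X ~ Uniform(0, 1)

edges = [0.0, 0.1, 0.25, 0.5, 1.0]  # illustrative bin edges
rows = []
for lo, hi in zip(edges, edges[1:]):
    empirical = sum(lo <= v < hi for v in y) / n
    theoretical = math.sqrt(hi) - math.sqrt(lo)  # integral of 1/(2*sqrt(y)) on [lo, hi]
    rows.append((lo, hi, empirical, theoretical))
    print(f"[{lo:.2f}, {hi:.2f}): empirical={empirical:.4f}  theoretical={theoretical:.4f}")
```

Note how the first bin, only a tenth of the interval, already holds roughly a third of the probability.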

A biologist observes gene-expression measurements that are heavily right-skewed (a few high-expressing genes dominate). They report the arithmetic mean and SD. What ONE transformation should they consider, and what is the statistical justification? (Hint: think log-Normal.)

What you now know

Transformations are functions composed with random variables. The change-of-variable formula computes the new distribution. Multi-valued inverses (Y = X²) require summing over pre-images. The delta method handles transformations of asymptotically Normal estimators. Log transformation is the standard fix for right-skewed positive data, which is why log-prices, log-income, and log-fold-change are everywhere in applied statistics. §0.9 introduces MGFs as a tool for handling transformations algebraically.

