Simulation as intuition

Probability from zero

Learning objectives

Use Monte Carlo to estimate probabilities and expectations by averaging i.i.d. samples
Apply the inverse-CDF sampling method to generate non-Uniform random variables
Recognise the 1/√n Monte Carlo error rate and use it for sample-size planning
Use simulation to verify or stress-test analytical results

Monte Carlo is the practical bridge from probability theory to computation. Given any quantity that can be expressed as E[g(X)] — a probability, an integral, an expectation — generate many i.i.d. samples and average. By LLN the mean converges; by CLT the error is approximately Normal with SD σ/√n. Monte Carlo is universal: works on multi-dimensional integrals where deterministic quadrature fails, on probability calculations with no closed form, on derived-quantity uncertainty propagation.

The Monte Carlo principle

To estimate $\theta = E[g(X)]$ , draw $X_1, X_2, \ldots, X_n$ i.i.d. from the distribution of $X$ , then:

\hat{\theta}_n = \frac{1}{n} \sum_{i=1}^{n} g(X_i).

By LLN (§0.6), $\hat{\theta}_n \to \theta$ as $n \to \infty$ . By CLT (§0.7), the error has approximate distribution $N(0, \sigma^2/n)$ where $\sigma^2 = \mathrm{Var}(g(X))$ . The 95 % CI is $\hat{\theta}_n \pm 1.96,\hat{\sigma}/\sqrt{n}$ .

Common Monte Carlo patterns

Hit-or-miss probability: estimate $P(A) \approx (1/n) \sum_i \mathbb{1}{X_i \in A}$ . Used to estimate π, integration over irregular regions, P-values.
Direct expectation: estimate $E[X] \approx (1/n) \sum X_i$ . Used in option pricing, simulation of physical systems, Bayesian posterior means.
Importance sampling: estimate $E[g(X)] = E[g(X) f(X)/q(X)]$ where $q$ is a different proposal distribution. Used when sampling from $f$ is hard or rare events dominate.

Inverse-CDF sampling

If $U \sim \text{Uniform}(0, 1)$ and $F$ is a CDF with quantile function $F^{-1}$ , then $F^{-1}(U) \sim F$ . This converts uniform random numbers (provided by every PRNG) into draws from any distribution. Practical:

Exponential: $X = -\ln(U)/\lambda$ since the Exp CDF is $F(x) = 1 - e^{-\lambda x}$ .
Box-Muller: $Z_1 = \sqrt{-2\ln U_1}\cos(2\pi U_2)$ , $Z_2 = \sqrt{-2\ln U_1}\sin(2\pi U_2)$ are independent N(0, 1).
Discrete: $X = \min{k : F(k) \ge U}$ — binary-search the CDF.

For complex distributions where $F^{-1}$ has no closed form, use REJECTION SAMPLING or MARKOV CHAIN MONTE CARLO (Part 7).

Convergence rate and sample size

Error shrinks as 1/√n. To halve the MC error, quadruple n. To achieve absolute error $\varepsilon$ : $n \approx (1.96 \sigma / \varepsilon)^2$ . For estimating a probability $p$ with absolute precision 0.01: $n \approx (1.96)^2 p(1-p) / 0.01^2 \approx 38416 \cdot p(1-p)$ . Worst-case $p = 0.5$ gives $n \approx 9604$ . Tighter precision requires quadratically more samples.

When NOT to use MC: deterministic alternatives

MC's 1/√n rate is much worse than analytical / numerical alternatives when those exist:

1-D integrals: numerical quadrature (Simpson, Gauss) converges at $O(1/n^p)$ for smooth integrands; way faster than MC.
Closed-form analytic results (Normal CDFs, Gamma functions): always preferable.

Where MC wins: high-dimensional integration, stochastic dynamics, Bayesian posteriors with intractable likelihoods, model-based decision analysis with no closed form. The 1/√n rate is dimension-INDEPENDENT — that's the killer feature.

Try it

"π by hit-or-miss" at n = 100. The estimate is around 3.0-3.4 with 95% CI roughly ±0.18. At n = 10000 the CI is ±0.02. At n = 100000 it's ±0.007. The 1/√n shrinkage is visible.
"∫_0^1 sin²(x) dx" at any n. True value is 0.5 - sin(2)/4 ≈ 0.2727. Verify CI coverage: the truth lies inside the 95% CI most of the time across different seeds.
"P(|X̄_50 - 1| < 0.2) for Exp(1)" — uses CLT to estimate a probability. Truth ≈ 0.842 (the CLT's prediction). At n = 5000, error ≈ 0.005 to 0.01.
Reseed several times for the same scenario and n. The estimate fluctuates by approximately the CI half-width, confirming that the CI is calibrated.
Push n from 5000 to 20000. The 95% CI band shrinks by factor of √(20000/5000) = 2. Visible improvement; doubles confidence.

You want a Monte Carlo estimate of a true value 0.05 (a rare-event probability) with absolute precision ±0.001. How many samples do you need? Why is rare-event simulation a particularly expensive special case of MC?

What you now know — and Part 0 closes

Monte Carlo bridges probability theory to computational practice. The 1/√n error rate is universal but slow; for hard problems (high-dim integrals, intractable posteriors) it's the only game in town. Inverse-CDF sampling provides the link from PRNG output to arbitrary distributions. Part 0 is now COMPLETE: you have the foundations of probability axioms (§0.1), random variables (§0.2), joint and conditional distributions (§0.3), moments (§0.4), the catalog of common distributions (§0.5), LLN (§0.6), CLT (§0.7), transformations (§0.8), MGFs (§0.9), and simulation (§0.10). Part 1 (Estimation) builds the inferential toolkit on these foundations.

References

Robert, C.P., Casella, G. (2004). Monte Carlo Statistical Methods, 2nd ed. Springer.
Liu, J.S. (2001). Monte Carlo Strategies in Scientific Computing. Springer.
Owen, A.B. (2013). Monte Carlo theory, methods and examples. Online: statweb.stanford.edu/~owen/mc/. (Excellent free textbook.)
Devroye, L. (1986). Non-Uniform Random Variate Generation. Springer. (Classic reference for sampling algorithms.)
Metropolis, N., Ulam, S. (1949). "The Monte Carlo method." JASA 44(247), 335-341. (The foundational paper, coined the term.)