Probability: Sample Spaces and Events

Part 15, Chapter 15: Combinatorics and Probability

Learning objectives

Define a probability space and state Kolmogorov's three axioms
Compute probabilities of compound events using equally likely outcomes
Apply the complement rule and inclusion-exclusion for events
Use the union bound for upper-bounding probabilities of unions
Distinguish disjoint events from independent events (almost opposite)

Probability is the calculus of uncertainty. When the world refuses to be deterministic, the next dice roll, the next click, the next packet drop, we still want quantitative predictions. Probability supplies them by reducing "uncertainty" to a small number of consistent rules: outcomes, events, and a measure on those events that obeys three axioms. Once those axioms are in place, every formula in the rest of the chapter (conditional probability, expected value, the CLT) is a logical consequence.

The probability space $(\Omega, \mathcal{F}, P)$

A sample space $\Omega$ is the set of all possible outcomes of an experiment. An event is any subset of $\Omega$ ; the collection of allowed events is a $\sigma$ -algebra $\mathcal{F}$ . A probability measure is a function $P: \mathcal{F} \to [0, 1]$ satisfying Kolmogorov's axioms:

(K1) $P(A) \geq 0$ for every event $A$ .
(K2) $P(\Omega) = 1$ , something certain happens.
(K3) For any countable sequence of disjoint events $A_1, A_2, \ldots$ , $P\!\left(\bigcup_i A_i\right) = \sum_i P(A_i)$ . (Countable additivity.)

From these alone you can derive every elementary identity. For example, since $\Omega = A \cup A^{c}$ with $A$ and $A^{c}$ disjoint, K2 plus K3 give the complement rule: $P(A^{c}) = 1 - P(A)$ . Similarly $P(\emptyset) = 0$ .

Equally likely outcomes

When $\Omega$ is finite and every outcome is equally likely (fair coins, dice, random card draws), the probability of an event reduces to a counting ratio: $P(A) = \frac{|A|}{|\Omega|}$ . This is why Section 15.1 came first, classical probability is just counting under a uniform distribution. Most introductory problems live here.

Inclusion-exclusion for probabilities

For overlapping events, the counting identity transfers directly: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ . The corrective $P(A \cap B)$ term removes the double-count. For three events the alternating sum extends: $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$ .

The union bound drops the corrective terms and gives the inequality $P(A \cup B) \leq P(A) + P(B)$ . This is loose but indispensable: it converts hard intersection probabilities into easy sums and powers analysis of every randomised algorithm, every concentration inequality, and (e.g.) the entire theory of statistical learning generalisation bounds.

The Venn-diagram widget above lets you set up two overlapping events and read off $P(A)$ , $P(B)$ , $P(A \cap B)$ , and $P(A \cup B)$ . Adjust the overlap and watch the inclusion-exclusion identity stay invariant.

Where this shows up

Reliability engineering: The probability that a system with $n$ components fails is at most the sum of the individual failure probabilities, literally the union bound. Aircraft, nuclear-plant, and spacecraft reliability are designed around this inequality with redundant subsystems engineered so each one's individual failure probability is $10^{-7}$ per hour or less.

Machine learning generalisation: A hypothesis class with $H$ functions has training-vs-test gap bounded by $\sqrt{\log(H)/n}$ via the union bound across hypotheses. This is the foundational PAC-learning result.

Networking, packet loss bounds: The probability that ANY of $k$ packets is lost is at most $k \cdot p$ where $p$ is the per-packet loss probability. Used to provision congestion windows in TCP and to size redundancy in QUIC and erasure-coded storage.

Insurance and actuarial science: Premium calculations begin from the probability that a covered event occurs in the policy term; reinsurance treaties use inclusion-exclusion to price overlapping risks across multiple insured parties.

A/B testing: Multiple-comparison corrections (Bonferroni) explicitly use the union bound: if you run $k$ simultaneous tests at significance $\alpha/k$ , the family-wise error rate stays below $\alpha$ . Without the correction, false positives multiply.

Pause and think: Suppose $P(A) = 0.6$ and $P(B) = 0.7$ . What is the LARGEST possible value of $P(A \cup B)$ ? The SMALLEST? (Hint: think about the most and least possible overlap. Note $P(A \cap B) \geq P(A) + P(B) - 1$ since $P(A \cup B) \leq 1$ .)

Try it

Predict first: when rolling two fair dice, what is $P(\text{sum} = 7)$ ? $P(\text{sum} = 12)$ ? Why is one much larger?

A bag has 4 red and 6 blue balls. Draw two without replacement. Compute $P(\text{both red})$ using the multiplication principle on conditional probabilities.

Use the complement to compute $P(\text{at least one head in 5 fair tosses})$ . Why is direct enumeration of "at least one" almost always inefficient?

If $P(A) = 0.3$ and $P(B) = 0.5$ and $A \cap B = \emptyset$ , what is $P(A \cup B)$ ? Why can disjoint events have positive individual probabilities yet be maximally dependent?

Show that $P(A) + P(B) - 1 \leq P(A \cap B) \leq \min(P(A), P(B))$ . These are the Fréchet bounds on the intersection.

A trap to watch for

Beginners regularly confuse "disjoint" with "independent" because both involve two events related in some way. They are almost opposite. Disjoint means $A \cap B = \emptyset$ , so $P(A \cap B) = 0$ , if $A$ happens, $B$ cannot. Independent (formal definition in Section 15.3) means $P(A \cap B) = P(A) P(B)$ , the two events have no informational link. For events both with positive probability, disjointness is the MAXIMALLY DEPENDENT case (knowing one rules the other out), while independence is the no-information case. They coincide only in the trivial case where one of the events has probability zero.

What you now know

You can build a probability space from a sample space and a measure, compute event probabilities by counting under uniformity, manipulate compound events with inclusion-exclusion, and bound probabilities of unions with the union bound. The next section deepens this with conditional probability and independence, the tools that let you update beliefs as evidence arrives and that power every Bayesian inference in machine learning.

Mark section complete →

References

Garrity, T. (2002). All the Mathematics You Missed. Cambridge University Press, ch. 15.

Ross, S. M. (2014). A First Course in Probability (9th ed.). Pearson, ch. 2.

Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. 1 (3rd ed.). Wiley, ch. 1.

Grimmett, G., Stirzaker, D. (2001). Probability and Random Processes (3rd ed.). Oxford University Press, ch. 1.

Durrett, R. (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press, ch. 1.

Keep going

Previous: Counting Principles and Combinatorial Identities Next: Conditional Probability and Independence

Open Sets, Closed Sets, and Topological SpacesA Mathematician's Survey

Operations on SetsIntroduction to Mathematical Proof

This page is prerendered for SEO and accessibility. The interactive widgets above hydrate on JavaScript load.

Learning objectives

The probability space (Omega,mathcalF,P)(\Omega, \mathcal{F}, P)(Omega,mathcalF,P)

Equally likely outcomes

Inclusion-exclusion for probabilities

Try it

A trap to watch for

What you now know

References

The probability space $(\Omega, \mathcal{F}, P)$