Big-O and Asymptotic Complexity

Part 16, Chapter 16: Algorithms and Complexity

Learning objectives

Define Big-O, Big-Omega, and Big-Theta notation formally
Rank common growth rates from $O(1)$ to $O(n!)$
Distinguish worst-case, best-case, and average-case complexity
Analyse simple algorithms (loops, nested loops, recursion via the master theorem)
Recognise that constants and lower-order terms vanish in asymptotic analysis

Computational complexity is the calculus of algorithms. Just as differential calculus measures how functions change at the small scale, asymptotic analysis measures how an algorithm's resource use grows as the input size goes to infinity. The dominant resources are time (how many elementary operations) and space (how much memory). The point of the analysis is portability: an $O(n^{2})$ algorithm is slow on every machine of every era, and an $O(n \log n)$ algorithm is fast on every machine of every era, even though the absolute speeds change by factors of millions.

Big-O and friends

For functions $f, g : \mathbb{N} \to \mathbb{R}_{\geq 0}$ we write $f(n) = O(g(n))$ when there exist constants $C > 0$ and $n_0 \in \mathbb{N}$ such that

$f(n) \leq C \cdot g(n) \quad \text{for all } n \geq n_0$ .

This is an upper bound: $f$ grows AT MOST as fast as $g$ (up to a constant). The symmetric lower bound is $f(n) = \Omega(g(n))$, meaning $f(n) \geq C \cdot g(n)$ for some constant $C > 0$ and all sufficiently large $n$. When both hold (possibly with different constants) we have the tight bound $f(n) = \Theta(g(n))$: the same growth rate up to constants.
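The definition asks for explicit witnesses $C$ and $n_0$. The sketch below checks candidate witnesses numerically for an example polynomial, $f(n) = 2n^{2} + 7n + 3$ against $g(n) = n^{2}$. The function name `holds_from` and the cutoff `n_max` are illustrative assumptions, and a finite check is only a sanity test, not a proof.

```python
def holds_from(f, g, C, n0, n_max=10_000):
    """Return True if f(n) <= C * g(n) for every integer n with n0 <= n <= n_max.

    A finite scan can refute a bad (C, n0) pair, but it cannot prove the bound
    for all n; that still needs an algebraic argument.
    """
    return all(f(n) <= C * g(n) for n in range(n0, n_max + 1))


def f(n):
    # Example function chosen for illustration.
    return 2 * n**2 + 7 * n + 3


def g(n):
    return n**2


# For n >= 1 we have 7n <= 7n^2 and 3 <= 3n^2, so C = 2 + 7 + 3 = 12 works from n0 = 1.
loose_witness = holds_from(f, g, C=12, n0=1)

# A smaller constant needs a later starting point: 2n^2 + 7n + 3 <= 3n^2 iff n^2 >= 7n + 3.
tight_witness = holds_from(f, g, C=3, n0=8)
too_early = holds_from(f, g, C=3, n0=1)

# C = 2 can never work, because 7n + 3 > 0 for every n.
impossible = holds_from(f, g, C=2, n0=1000)
```

The trade-off is typical: a larger $C$ lets the bound start earlier, a smaller $C$ forces a larger $n_0$, and any constant below the leading coefficient fails outright.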

The strict versions are little-o and little-omega: $f(n) = o(g(n))$ when $f(n)/g(n) \to 0$ (strictly slower) and $f(n) = \omega(g(n))$ when $f(n)/g(n) \to \infty$ (strictly faster).
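As a quick worked illustration of the two limit definitions (the functions $100n$, $n^{2}$, and $3n^{2}$ are chosen here purely as examples):

```latex
\[
\lim_{n\to\infty} \frac{100\,n}{n^{2}} = \lim_{n\to\infty} \frac{100}{n} = 0
\quad\Longrightarrow\quad 100\,n = o(n^{2}),
\]
\[
\lim_{n\to\infty} \frac{n^{2}}{100\,n} = \lim_{n\to\infty} \frac{n}{100} = \infty
\quad\Longrightarrow\quad n^{2} = \omega(100\,n),
\]
\[
\lim_{n\to\infty} \frac{3\,n^{2}}{n^{2}} = 3 \neq 0
\quad\Longrightarrow\quad 3\,n^{2} = O(n^{2}) \ \text{but}\ 3\,n^{2} \neq o(n^{2}).
\]
```

The last line shows the difference between the two notations: a constant ratio is allowed by Big-O but ruled out by little-o.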

The asymptotic-growth ladder

The standard complexity classes form a chain, each growing dramatically faster than the previous:

$O(1)$ , constant. Array indexing, hash-table lookup (expected time).
$O(\log n)$ , logarithmic. Binary search. Each step halves the search space.
$O(\sqrt{n})$ , sub-linear, polynomial. Trial division up to $\sqrt{n}$ to test primality.
$O(n)$ , linear. A single pass through input. Optimal for problems that must read everything.
$O(n \log n)$ , linearithmic. Comparison-based sorting (merge sort, heap sort); FFT.
$O(n^{2})$ , quadratic. Two nested linear loops. Manageable up to $n \approx 10^{4}$ ; painful above.
$O(n^{3})$ , cubic. Three nested loops. Naive matrix multiplication. Runtime stops being reasonable somewhere past $n \approx 10^{3}$ .
$O(2^{n})$ , exponential. Brute-force subset enumeration. Useless past $n \approx 30$ .
$O(n!)$ , factorial. Brute-force permutation enumeration. Useless past $n \approx 12$ .
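To make the ladder concrete, the following sketch computes the largest input size each growth rate can handle within a fixed budget of elementary operations. The budget of $10^{9}$ operations (roughly a second of work on a modern machine, as an order-of-magnitude assumption) and the helper name `max_feasible_n` are illustrative choices, not part of the definitions above.

```python
import math


def max_feasible_n(cost, budget):
    """Largest n >= 1 with cost(n) <= budget, assuming cost is non-decreasing
    and unbounded (so constant-time cost functions are out of scope here)."""
    if cost(1) > budget:
        return 0
    hi = 1
    while cost(hi * 2) <= budget:  # gallop upward to bracket the answer
        hi *= 2
    lo, hi = hi, hi * 2            # invariant: cost(lo) <= budget < cost(hi)
    while hi - lo > 1:             # binary search inside the bracket
        mid = (lo + hi) // 2
        if cost(mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


BUDGET = 10**9  # assumed operation budget

growth_rates = {
    "n": lambda n: n,
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n**2,
    "n^3": lambda n: n**3,
    "2^n": lambda n: 2**n,
    "n!": lambda n: math.factorial(n),
}

limits = {name: max_feasible_n(cost, BUDGET) for name, cost in growth_rates.items()}

for name, limit in limits.items():
    print(f"{name:>8}: largest feasible n = {limit}")
```

Under this budget the polynomial rates still reach thousands to billions of items, while the exponential and factorial rates stall at a few dozen, which is where the rough thresholds quoted in the list come from.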

The gap between $O(n^{k})$ and $O(2^{n})$ is the central divide of complexity theory: anything in the former family is "tractable" in a robust, machine-independent sense; anything that genuinely lives in the latter (and only the latter) is considered intractable. The P-vs-NP question of Section 16.4 asks exactly which side of this gap many natural problems live on.

Worst, best, and average case

For a fixed algorithm $A$ and input size $n$ , there are many inputs. The complexity depends on which one we pick:

Worst case: maximum over all inputs of size $n$ . The standard, conservative measure.
Best case: minimum. Often trivially fast and not very informative.
Average case: expected value under some probability distribution on inputs. Quicksort's average is $O(n \log n)$ even though its worst case is $O(n^{2})$ .

Quicksort's gap between worst and average is the canonical example of why average-case analysis matters: in practice you almost never hit the worst case, and the algorithm is one of the fastest sorts known.
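The gap can be observed directly by counting comparisons. The sketch below uses one particular quicksort variant, chosen for illustration: first-element pivot with a Lomuto-style partition, written with an explicit stack so that the worst case does not hit Python's recursion limit. With this pivot rule, already-sorted input is a worst case.

```python
import random


def quicksort_comparisons(items):
    """Sort a copy of items with first-element-pivot quicksort and
    return the number of element comparisons performed."""
    a = list(items)
    comparisons = 0
    stack = [(0, len(a) - 1)]  # explicit stack instead of recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[lo]
        i = lo  # a[lo+1 .. i] holds the elements smaller than the pivot
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if a[j] < pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[lo], a[i] = a[i], a[lo]  # move the pivot into its final position
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    assert a == sorted(items)  # sanity check that the sort is correct
    return comparisons


n = 200
sorted_input = list(range(n))

random.seed(0)  # fixed seed so the run is reproducible
shuffled_input = sorted_input[:]
random.shuffle(shuffled_input)

worst = quicksort_comparisons(sorted_input)      # pivot is always the minimum
typical = quicksort_comparisons(shuffled_input)  # a random permutation
```

On sorted input every partition is maximally unbalanced, so the count is exactly $n(n-1)/2$; on a shuffled input the count is several times smaller, in line with the $O(n \log n)$ average.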

Recurrences and the master theorem

Many divide-and-conquer algorithms satisfy recurrences of the form $T(n) = a T(n/b) + f(n)$ , $a$ subproblems each of size $n/b$ plus a per-call overhead $f(n)$ . The master theorem gives a clean answer:

If $f(n) = O(n^{c})$ with $c < \log_{b}a$, then $T(n) = \Theta(n^{\log_{b}a})$.
If $f(n) = \Theta(n^{c})$ with $c = \log_{b}a$, then $T(n) = \Theta(n^{c} \log n)$.
If $f(n) = \Omega(n^{c})$ with $c > \log_{b}a$ (and a regularity condition), then $T(n) = \Theta(f(n))$.

Merge sort: $T(n) = 2 T(n/2) + O(n)$ falls in case 2 with $c = 1$, $\log_{b}a = 1$, giving $\Theta(n \log n)$. Binary search: $T(n) = T(n/2) + O(1)$ falls in case 2 with $c = 0$, $\log_{b}a = 0$, giving $\Theta(\log n)$.
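The case analysis is mechanical enough to sketch in code. The helper below is a hypothetical illustration that handles only the simplified situation where the overhead is exactly polynomial, $f(n) = \Theta(n^{c})$; in that situation the regularity condition of case 3 holds automatically, because $a\,f(n/b) = (a/b^{c})\,f(n)$ with $a/b^{c} < 1$. The function name and its string output are assumptions made for this example.

```python
import math


def master_theorem(a, b, c):
    """Classify T(n) = a*T(n/b) + Theta(n^c).

    Returns (case number, asymptotic bound as a string).
    Only polynomial overheads n^c are handled.
    """
    if a < 1 or b <= 1 or c < 0:
        raise ValueError("need a >= 1, b > 1 and c >= 0")
    critical = math.log(a, b)  # the critical exponent log_b(a)
    # Compare with a tolerance, since log_b(a) is computed in floating point.
    if math.isclose(c, critical, abs_tol=1e-9):
        return 2, f"Theta(n^{c:g} log n)"
    if c < critical:
        return 1, f"Theta(n^{critical:.3f})"
    return 3, f"Theta(n^{c:g})"


merge_sort = master_theorem(a=2, b=2, c=1)     # two halves, linear merge
binary_search = master_theorem(a=1, b=2, c=0)  # one half, constant work
three_way = master_theorem(a=3, b=2, c=1)      # three half-size subproblems
heavy_combine = master_theorem(a=2, b=2, c=2)  # quadratic combine step dominates
```

Here `merge_sort` and `binary_search` both land in case 2, matching the two examples in the text (with $n^{0} \log n = \log n$ for binary search), while the other two calls show cases 1 and 3.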

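Loosely, picture the complexity hierarchy as nested classes: $P \subseteq NP \subseteq PSPACE \subseteq EXPTIME$. Every inclusion in this chain is proven, and each is widely believed to be strict, but the only separation actually proven is $P \neq EXPTIME$ (so at least one of the three inclusions must be strict). The most famous unresolved question is whether $P = NP$. (See Section 16.4 for the formal definitions.)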

Where this shows up

Database index choice: Search on a B-tree-indexed column is $O(\log n)$; on an unindexed column it is $O(n)$. For a 1-billion-row table the difference is between roughly 30 comparisons ($\log_{2} 10^{9} \approx 30$; a real B-tree, with its high fan-out, needs only a handful of page reads) and 1 billion row reads, a factor of more than $10^{7}$. Many production database performance problems come down to this index-vs-table-scan complexity gap.
Machine-learning training time: Training a neural network with $n$ parameters on $m$ examples is $O(n m)$ per epoch; the full training is $O(n m \cdot \text{epochs})$ . Modern LLMs have $n \sim 10^{11}$ and $m \sim 10^{13}$ , so the $10^{24}$ FLOPs of training are very much an asymptotic-analysis question first and an engineering question second.
Streaming algorithms: When data is too large to store, you need algorithms whose space complexity is sublinear, ideally $O(\log n)$ or $O(\text{polylog}(n))$. Count-Min Sketch and HyperLogLog are the canonical examples; Bloom filters, which use $O(n)$ bits but only a few bits per element, play a similar role. All three are standard tooling at Google, Meta, and most other large-data companies.
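Cryptographic key sizes: RSA-2048 means a 2048-bit modulus. The best known classical factoring algorithm, the general number field sieve, runs in heuristic time $2^{O(n^{1/3} (\log n)^{2/3})}$ for an $n$-bit modulus: sub-exponential but super-polynomial. Doubling the key length therefore multiplies the exponent by only about $2^{1/3} \approx 1.26$ (slightly more once the logarithmic factor is counted), so the cracking time is raised to roughly the power 1.3 rather than squared. Key sizes are chosen by exactly this complexity-driven security calculus.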
Compilers and big-O guarantees: Modern JIT compilers (V8, JVM) instrument code at runtime and re-optimise hot loops. Their effectiveness depends on the asymptotic behaviour of the algorithm being right; constant-factor improvements compose, but complexity-class regressions are devastating.
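The index-versus-scan gap from the first item can be reproduced in miniature. The sketch below counts probes made by a plain binary search over a sorted sequence of one billion integers; it stands in for an index lookup only loosely, since a real B-tree has a much higher fan-out. Python's `range` object supports constant-time indexing, so no billion-element list is ever built. The function name `binary_search_probes` is an assumption for this example.

```python
def binary_search_probes(sorted_seq, target):
    """Binary search that also reports how many elements it inspected.

    Returns (index of target or -1, number of probes).
    """
    lo, hi = 0, len(sorted_seq) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        value = sorted_seq[mid]
        if value == target:
            return mid, probes
        if value < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1, probes


N = 10**9
table = range(N)  # a sorted "column" of one billion keys, stored lazily

last_index, last_probes = binary_search_probes(table, N - 1)
miss_index, miss_probes = binary_search_probes(table, -1)

# A linear scan for the last key would inspect all N elements.
linear_scan_probes = N
```

Both the successful and the unsuccessful lookup finish within about 30 probes, against a billion for the scan.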

Pause and think: If an algorithm with running time $T(n) = 3n^{2} + 50n + 1000$ is asymptotically $O(n^{2})$ , why do we drop the $50n$ and $1000$ terms? Why is even the $3$ irrelevant for asymptotic analysis? (Hint: think about what happens for $n = 10^{6}$ .)

Try it

Show formally that $5n^{2} + 100n + 17 = O(n^{2})$ . Find explicit constants $C$ and $n_0$ .
Rank in growth order: $n^{2},\; n^{2.5},\; n^{\log n},\; 2^{n},\; n!,\; 2^{\sqrt{n}},\; n$. (Answer: $n < n^{2} < n^{2.5} < n^{\log n} < 2^{\sqrt{n}} < 2^{n} < n!$.)
Use the master theorem on $T(n) = 3 T(n/2) + O(n)$. (Answer: $\log_{2}3 \approx 1.585 > 1$, so case 1: $T(n) = \Theta(n^{\log_{2}3})$. This is Karatsuba multiplication.)
You have an algorithm that runs in 1 second on input of size $n = 1000$ at $\Theta(n^{2})$ . How long will it take on input of size $n = 10^{6}$ ?
True or false: $n \log n = O(n^{2})$ . $n^{2} = O(n \log n)$ . Justify each.

A trap to watch for

Big-O is an UPPER bound, not an equality. Writing $n = O(n^{2})$ is technically correct but loses information: it overstates the algorithm's growth. Use $\Theta$ when you mean "this is the tight growth rate." Conversely, $T(n) = O(n)$ does NOT mean $T(n)$ is exactly linear; it could be $\sqrt{n}$ or even $\log n$. Big-O is a one-sided inequality dressed in equality's clothing. The notation is widely (mis)used in practice; in formal papers, prefer $T(n) \in O(g(n))$ as a set-membership statement.

What you now know

You can read and write Big-O / $\Theta$ / $\Omega$ statements rigorously, rank common growth rates, distinguish worst- from average-case, and analyse divide-and-conquer recurrences via the master theorem. The next section applies this toolkit to graph algorithms (BFS, DFS, Dijkstra); after that we look at sorting and finally tackle the P-vs-NP question that organises the entire field.

