Big-O and Asymptotic Complexity

Part 16, Chapter 16: Algorithms and Complexity

Learning objectives

  • Define Big-O, Big-Omega, and Big-Theta notation formally
  • Rank common growth rates from O(1)O(1) to O(n!)O(n!)
  • Distinguish worst-case, best-case, and average-case complexity
  • Analyse simple algorithms (loops, nested loops, recursion via the master theorem)
  • Recognise that constants and lower-order terms vanish in asymptotic analysis

Computational complexity is the calculus of algorithms. Just as differential calculus measures how functions change at the small scale, asymptotic analysis measures how an algorithm's resource use grows as the input size goes to infinity. The dominant resources are time (how many elementary operations) and space (how much memory). The point of the analysis is portability: an O(n2)O(n^{2}) algorithm is slow on every machine of every era, and an O(nlogn)O(n \log n) algorithm is fast on every machine of every era, even though the absolute speeds change by factors of millions.

Big-O and friends

For functions f,g:mathbbNtomathbbRgeq0f, g : \mathbb{N} \to \mathbb{R}_{\geq 0}geq0 we write f(n)=O(g(n))f(n) = O(g(n)) when there exist constants C>0C > 0 and n_0inmathbbNn_0 \in \mathbb{N} such that

f(n)leqCcdotg(n)quadtextforallngeqn_0f(n) \leq C \cdot g(n) \quad \text{for all } n \geq n_0.

This is an upper bound: ff grows AT MOST as fast as gg (up to a constant). The symmetric lower bound is f(n)=Omega(g(n))f(n) = \Omega(g(n)), meaning f(n)geqCcdotg(n)f(n) \geq C \cdot g(n) for all large nn. Combining both gives the tight bound f(n)=Theta(g(n))f(n) = \Theta(g(n)), same growth rate up to constants.

The strict inequalities are the little-o and little-omega: f(n)=o(g(n))f(n) = o(g(n)) when f(n)/g(n)to0f(n)/g(n) \to 0 (strictly slower) and f(n)=omega(g(n))f(n) = \omega(g(n)) when f(n)/g(n)toinftyf(n)/g(n) \to \infty (strictly faster).

The asymptotic-growth ladder

The standard complexity classes form a chain, each growing dramatically faster than the previous:

  • O(1)O(1), constant. Array indexing, hash-table lookup (amortised).
  • O(logn)O(\log n), logarithmic. Binary search. Each step halves the search space.
  • O(sqrtn)O(\sqrt{n}), sub-linear, polynomial. Trial division up to sqrtn\sqrt{n} to test primality.
  • O(n)O(n), linear. A single pass through input. Optimal for problems that must read everything.
  • O(nlogn)O(n \log n), linearithmic. Comparison-based sorting (merge sort, heap sort); FFT.
  • O(n2)O(n^{2}), quadratic. Two nested linear loops. Manageable up to napprox104n \approx 10^{4}; painful above.
  • O(n3)O(n^{3}), cubic. Three nested loops. Naive matrix multiplication. Crashes through reasonable runtime around napprox103n \approx 10^{3}.
  • O(2n)O(2^{n}), exponential. Brute-force subset enumeration. Useless past napprox30n \approx 30.
  • O(n!)O(n!), factorial. Brute-force permutation enumeration. Useless past napprox12n \approx 12.

The gap between O(nk)O(n^{k}) and O(2n)O(2^{n}) is the central divide of complexity theory: anything in the former family is "tractable" in a robust, machine-independent sense; anything that genuinely lives in the latter (and only the latter) is considered intractable. The P-vs-NP question of §16.4 asks exactly which side of this gap many natural problems live on.

Worst, best, and average case

For a fixed algorithm AA and input size nn, there are many inputs. The complexity depends on which one we pick:

  • Worst case: maximum over all inputs of size nn. The standard, conservative measure.
  • Best case: minimum. Often trivially fast and not very informative.
  • Average case: expected value under some probability distribution on inputs. Quicksort's average is O(nlogn)O(n \log n) even though its worst case is O(n2)O(n^{2}).

Quicksort's gap between worst and average is the canonical example of why average-case analysis matters: in practice you almost never hit the worst case, and the algorithm is one of the fastest sorts known.

Recurrences and the master theorem

Many divide-and-conquer algorithms satisfy recurrences of the form T(n)=aT(n/b)+f(n)T(n) = a T(n/b) + f(n), aa subproblems each of size n/bn/b plus a per-call overhead f(n)f(n). The master theorem gives a clean answer:

  • If f(n)=O(nc)f(n) = O(n^{c}) with c<logbac < \log_{b}aba, then T(n)=Theta(nlogba)T(n) = \Theta(n^{\log_{b}a})ba).
  • If f(n)=Theta(nc)f(n) = \Theta(n^{c}) with c=logbac = \log_{b}aba, then T(n)=Theta(nclogn)T(n) = \Theta(n^{c} \log n).
  • If f(n)=Omega(nc)f(n) = \Omega(n^{c}) with c>logbac > \log_{b}aba (and a regularity condition), then T(n)=Theta(f(n))T(n) = \Theta(f(n)).

    Merge sort: T(n)=2T(n/2)+O(n)T(n) = 2 T(n/2) + O(n) falls in case 2 with c=1,logba=1c = 1, \log_{b}a = 1ba=1, giving Theta(nlogn)\Theta(n \log n). Binary search: T(n)=T(n/2)+O(1)T(n) = T(n/2) + O(1) falls in case 2 with c=0,logba=0c = 0, \log_{b}a = 0ba=0, giving Theta(logn)\Theta(\log n).

    Loosely, picture the complexity hierarchy as nested classes: PsubseteqNPsubseteqPSPACEsubseteqEXPTIMEP \subseteq NP \subseteq PSPACE \subseteq EXPTIME. Each inclusion is either proven or widely believed strict; the most famous unresolved question is whether P=NPP = NP. (See §16.4 for the formal definitions.)

    Where this shows up
    • Database index choice: Search on a B-tree-indexed column is O(logn)O(\log n); on an unindexed column it is O(n)O(n). For a 1-billion-row table the difference is between 30 disk reads and 1 billion, a factor of 10810^{8}. Almost every production database performance issue is an index-vs-table-scan complexity gap.
    • Machine-learning training time: Training a neural network with nn parameters on mm examples is O(nm)O(n m) per epoch; the full training is O(nmcdottextepochs)O(n m \cdot \text{epochs}). Modern LLMs have nsim1011n \sim 10^{11} and msim1013m \sim 10^{13}, so the 102410^{24} FLOPs of training are very much an asymptotic-analysis question first and an engineering question second.
    • Streaming algorithms: When data is too large to store, you need algorithms whose space complexity is O(logn)O(\log n) or O(textpolylog(n))O(\text{polylog}(n)). Count-Min Sketch, HyperLogLog, and Bloom filters are the canonical examples, standard tooling at Google, Meta, and every other large-data company.
    • Cryptographic key sizes: RSA-2048 means a 2048-bit modulus; brute-forcing factoring is conjectured to be 2O(sqrtn(logn)2/3)2^{O(\sqrt{n} (\log n)^{2/3})} time (the number-field sieve). Doubling the key length roughly cubes the cracking time, a complexity-class-driven security calculus.
    • Compilers and big-O guarantees: Modern JIT compilers (V8, JVM) instrument code at runtime and re-optimise hot loops. Their effectiveness depends on the asymptotic behaviour of the algorithm being right; constant-factor improvements compose, but complexity-class regressions are devastating.

    Pause and think: If an algorithm with running time T(n)=3n2+50n+1000T(n) = 3n^{2} + 50n + 1000 is asymptotically O(n2)O(n^{2}), why do we drop the 50n50n and 10001000 terms? Why is even the 33 irrelevant for asymptotic analysis? (Hint: think about what happens for n=106n = 10^{6}.)

    Try it

    • Show formally that 5n2+100n+17=O(n2)5n^{2} + 100n + 17 = O(n^{2}). Find explicit constants CC and n_0n_0.
    • Rank in growth order: n2,;n2.5,;nlogn,;2n,;n!,;2sqrtn,;nn^{2}, ;n^{2.5}, ;n^{\log n}, ;2^{n}, ;n!, ;2^{\sqrt{n}}, ;n. (Answer: n<n2<n2.5<nlogn<2sqrtn<2n<n!n < n^{2} < n^{2.5} < n^{\log n} < 2^{\sqrt{n}} < 2^{n} < n!.)
    • Use the master theorem on T(n)=3T(n/2)+O(n)T(n) = 3 T(n/2) + O(n). (Answer: log23approx1.585>1\log_{2}3 \approx 1.585 > 123approx1.585>1, so case 1: T(n)=Theta(nlog23)T(n) = \Theta(n^{\log_{2}3})23). This is Karatsuba multiplication.)
    • You have an algorithm that runs in 1 second on input of size n=1000n = 1000 at Theta(n2)\Theta(n^{2}). How long will it take on input of size n=106n = 10^{6}?
    • True or false: nlogn=O(n2)n \log n = O(n^{2}). n2=O(nlogn)n^{2} = O(n \log n). Justify each.
    • A trap to watch for

      Big-O is an UPPER bound, not an equality. Writing n=O(n2)n = O(n^{2}) is technically correct but loses information, it overstates the algorithm's growth. Use Theta\Theta when you mean "this is the tight growth rate." Conversely, T(n)=O(n)T(n) = O(n) does NOT mean T(n)T(n) is exactly linear; it could be sqrtn\sqrt{n} or even logn\log n. Big-O is a one-sided inequality dressed in equality's clothing. The notation is widely (mis)used in practice; in formal papers, prefer T(n)inO(g(n))T(n) \in O(g(n)) as a set-membership statement.

      What you now know

      You can read and write Big-O / Theta\Theta / Omega\Omega statements rigorously, rank common growth rates, distinguish worst- from average-case, and analyse divide-and-conquer recurrences via the master theorem. The next section applies this toolkit to graph algorithms (BFS, DFS, Dijkstra); after that we look at sorting and finally tackle the P-vs-NP question that organises the entire field.

      Mark section complete →

      References

      • Garrity, T. (2002). All the Mathematics You Missed. Cambridge University Press, ch. 16.
      • Cormen, T. H., Leiserson, C. E., Rivest, R. L., Stein, C. (2009). Introduction to Algorithms (3rd ed.). MIT Press, ch. 3–4.
      • Sipser, M. (2012). Introduction to the Theory of Computation (3rd ed.). Cengage, ch. 7.
      • Sedgewick, R., Wayne, K. (2011). Algorithms (4th ed.). Addison-Wesley, ch. 1.4.
      • Knuth, D. E. (1997). The Art of Computer Programming, Vol. 1 (3rd ed.). Addison-Wesley, §1.2 (asymptotic notation).

      This page is prerendered for SEO and accessibility. The interactive widgets above hydrate on JavaScript load.