P, NP, and Computational Hardness
Learning objectives
- Define the complexity classes P and NP precisely
- Distinguish solving in polynomial time from verifying in polynomial time
- Define NP-hardness and NP-completeness via polynomial-time reductions
- State the Cook-Levin theorem and the role of SAT as the canonical NP-complete problem
- Articulate the practical and philosophical consequences of vs
P vs NP is the most important open problem in computer science and one of the Clay Mathematics Institute's seven Millennium Prize Problems, with a Its statement is deceptively simple: are the problems whose solutions can be VERIFIED quickly the same as the problems that can be SOLVED quickly? Most computer scientists believe the answer is "no" (soP \neq NPP = NP$, modern cryptography collapses, automated theorem-proving becomes feasible, and a substantial part of computer science changes character overnight.
The class P
A decision problem is a yes/no question about an input (e.g., "Is this graph connected?", "Does this number have a factor between 1 and 1000?"). The class P contains all decision problems solvable by a deterministic algorithm whose worst-case running time is bounded by a polynomial in the input length:
.
Examples: sorting, shortest paths, primality (Agrawal-Kayal-Saxena 2002), linear programming (Khachiyan 1979), matrix determinant. P is the formal class of "tractable" problems.
The class NP
The class NP ("nondeterministic polynomial time") contains decision problems for which a "yes" answer can be VERIFIED by a deterministic algorithm in polynomial time, given a short CERTIFICATE (also called a witness). Formally, if there exist a polynomial and a polynomial-time verifier such that
.
Equivalently, NP is the class of problems solvable in polynomial time on a NONDETERMINISTIC Turing machine, one that may "guess" the certificate and verify it.
Examples: SAT (given a Boolean formula, is there a satisfying assignment? Certificate: the assignment.), HAMILTONIAN PATH (does this graph contain a path visiting every vertex once? Certificate: the path.), SUBSET SUM (does this set contain a subset summing to ? Certificate: the subset.). Every problem in P is in NP, just ignore the certificate and use the polynomial-time solver as the verifier. So .
NP-hardness and NP-completeness
A problem is NP-hard if every problem can be reduced to in polynomial time. Reduction means there is a polynomial-time function such that . Intuition: solving would let you solve every problem in NP (using to translate). NP-hard problems are "at least as hard as the hardest problems in NP."
A problem is NP-complete if it is BOTH in NP and NP-hard, the hardest problems WITHIN NP. The discovery of NP-complete problems is the spine of complexity theory:
Cook-Levin theorem (1971). The Boolean satisfiability problem SAT is NP-complete. The proof simulates an arbitrary polynomial-time nondeterministic Turing machine by a polynomial-size Boolean formula.
Once one NP-complete problem (SAT) is in hand, OTHER problems are shown NP-complete by reducing SAT (or a known NP-complete problem) to them. Karp's 1972 paper exhibited 21 such reductions and seeded a whole field. Today thousands of natural problems are known to be NP-complete: 3-SAT, CLIQUE, INDEPENDENT SET, VERTEX COVER, HAMILTONIAN CYCLE, TRAVELLING SALESMAN (decision version), KNAPSACK, GRAPH COLOURING, INTEGER LINEAR PROGRAMMING, …
The P vs NP question
If : there exist problems whose solutions can be checked quickly but not found quickly. NP-complete problems then have no polynomial-time algorithm; we must accept exponential worst-case running times or settle for approximations and heuristics.
If : a polynomial-time algorithm for SAT (or any other NP-complete problem) would imply polynomial-time algorithms for ALL of NP, by composing reductions. This would constructively yield (or at least prove the existence of) fast solvers for thousands of currently-intractable problems.
The empirical evidence weighs heavily toward : fifty years of unsuccessful attempts to design polynomial-time algorithms for NP-complete problems, the natural-proof and relativisation barriers to many proof strategies, and the central role of NP-completeness in modern algorithm design (where "this problem is NP-complete" is essentially a verdict of "no polynomial-time algorithm expected").
Picture the Venn diagram: P is a subset of NP; NP-complete problems sit on the boundary of NP. If , the two regions collapse to one. If , there is a gap, and the NP-intermediate problems (e.g., GRAPH ISOMORPHISM, FACTORING under widespread belief) live in NP but neither in P nor NP-complete.
- Cryptography: RSA, Diffie-Hellman, elliptic-curve cryptography, and the entire public-key infrastructure of the internet rely on problems (FACTORING, DISCRETE LOG) that are in NP but believed not in P. If , every secret key, every signature, every TLS handshake could be broken in polynomial time. Modern HTTPS would collapse.
- Operations research: Production scheduling, vehicle routing, facility location, and resource allocation are NP-hard. We use approximation algorithms with provable bounds (e.g., the Christofides algorithm gives a 1.5-approximation for Metric TSP), Mixed Integer Programming solvers (Gurobi, CPLEX) with sophisticated branch-and-bound, and heuristics (simulated annealing, genetic algorithms).
- Machine learning: Training a deep network is non-convex optimisation, NP-hard in general. We rely on stochastic gradient descent finding "good enough" local minima, an empirical bet that real-world loss landscapes are friendly, not adversarial.
- Drug discovery and protein folding: Predicting protein 3D structure from amino-acid sequence (general case) is NP-hard. AlphaFold sidesteps this with deep learning trained on millions of known structures; it does not solve the general problem, but solves the structurally-constrained sub-problem that nature exhibits.
- Chip design (VLSI): Placing components on a chip die to minimise wire length is NP-hard. Cadence and Synopsys EDA tools use sophisticated heuristics; the -vs-NP gap directly limits how fast new chip generations can be designed.
Pause and think: P is trivially a subset of NP. Why is this so? (Hint: if you can SOLVE a problem in polynomial time, you can also VERIFY a proposed solution, just ignore the proposal and solve from scratch.) The deeper question is the converse: does every problem with a polynomial-time verifier also have a polynomial-time solver?
Try it
- Show that the problem "given a graph and an integer , does contain an independent set of size ?" is in NP by describing the certificate and the verifier.
- Show that 2-SAT (every clause has at most 2 literals) is in P. (Hint: implication-graph SCC analysis.) Why does the same approach fail for 3-SAT?
- Suppose tomorrow someone shows that VERTEX COVER is in P. What can you conclude about P vs NP? (Hint: VERTEX COVER is NP-complete.)
- Why is the inequality known to be strict somewhere, that is, why is there guaranteed to be a separation between P and EXPTIME even if we cannot pinpoint it? (Hint: the time-hierarchy theorem.)
- Name three problems known to be NP-complete and write down (informally) why each is in NP. (Suggestions: Hamiltonian Cycle, Subset Sum, Graph Colouring.)
A trap to watch for
Beginners often interpret "this problem is NP-complete" as "this problem is unsolvable." That is wrong. NP-complete problems are SOLVABLE; their solutions just are not expected to scale polynomially. Small instances are fine, approximations are available for many of them, and structured sub-cases often have polynomial-time algorithms. A second trap: "NP" does not mean "not polynomial." It stands for "nondeterministic polynomial." The complement is co-NP (whose elements are problems where a NO answer has a short certificate); whether is itself open. And a third trap: NP-hard problems need not be in NP; for instance, the HALTING PROBLEM is NP-hard but is uncomputable (vastly worse than NP).
What you now know
You can define P, NP, NP-hard, and NP-complete precisely; recognise the canonical NP-complete problems and the role of polynomial-time reductions; articulate the practical consequences of resolving P-vs-NP either way; and avoid the common conceptual mistakes about what NP-completeness implies. You have completed the algorithm-theory mini-tour that closes Garrity's book, from asymptotic notation through graphs, sorting, and now the complexity hierarchy itself. The structural questions you have learned to ask scale up to research-level computer science and to many adjacent fields.
Mark section complete →
References
- Garrity, T. (2002). All the Mathematics You Missed. Cambridge University Press, ch. 16.
- Sipser, M. (2012). Introduction to the Theory of Computation (3rd ed.). Cengage, ch. 7–8.
- Arora, S., Barak, B. (2009). Computational Complexity: A Modern Approach. Cambridge University Press, ch. 2–3.
- Cook, S. A. (1971). "The complexity of theorem-proving procedures." STOC 1971, pp. 151–158.
- Karp, R. M. (1972). "Reducibility among combinatorial problems." In Complexity of Computer Computations, Plenum Press, pp. 85–103.
- Clay Mathematics Institute. "The Millennium Prize Problems, P vs NP." https://www.claymath.org/millennium/p-vs-np/