P, NP, and Computational Hardness

Part 16, Chapter 16: Algorithms and Complexity

Learning objectives

  • Define the complexity classes P and NP precisely
  • Distinguish solving in polynomial time from verifying in polynomial time
  • Define NP-hardness and NP-completeness via polynomial-time reductions
  • State the Cook-Levin theorem and the role of SAT as the canonical NP-complete problem
  • Articulate the practical and philosophical consequences of P=NPP = NP vs PNPP \neq NP

P vs NP is the most important open problem in computer science and one of the Clay Mathematics Institute's seven Millennium Prize Problems, with a 1millionpurseforaresolution.Itsstatementisdeceptivelysimple:aretheproblemswhosesolutionscanbeVERIFIEDquicklythesameastheproblemsthatcanbeSOLVEDquickly?Mostcomputerscientistsbelievetheansweris"no"(so1 million purse for a resolution. Its statement is deceptively simple: are the problems whose solutions can be VERIFIED quickly the same as the problems that can be SOLVED quickly? Most computer scientists believe the answer is "no" (soP \neq NP),butinoverhalfacenturyofeffortnoproofeitherwayhasbeenproduced.Thestakesareenormous:if), but in over half a century of effort no proof either way has been produced. The stakes are enormous: ifP = NP$, modern cryptography collapses, automated theorem-proving becomes feasible, and a substantial part of computer science changes character overnight.

The class P

A decision problem is a yes/no question about an input (e.g., "Is this graph connected?", "Does this number have a factor between 1 and 1000?"). The class P contains all decision problems solvable by a deterministic algorithm whose worst-case running time is bounded by a polynomial in the input length:

mathbfP=bigcupkgeq0textTIME(nk)\mathbf{P} = \bigcup_{k \geq 0} \text{TIME}(n^{k})kgeq0textTIME(nk).

Examples: sorting, shortest paths, primality (Agrawal-Kayal-Saxena 2002), linear programming (Khachiyan 1979), matrix determinant. P is the formal class of "tractable" problems.

The class NP

The class NP ("nondeterministic polynomial time") contains decision problems for which a "yes" answer can be VERIFIED by a deterministic algorithm in polynomial time, given a short CERTIFICATE (also called a witness). Formally, LintextNPL \in \text{NP} if there exist a polynomial pp and a polynomial-time verifier VV such that

xinLiffexistswtextwithwleqp(x)textandV(x,w)=1x \in L \iff \exists w \text{ with } |w| \leq p(|x|) \text{ and } V(x, w) = 1.

Equivalently, NP is the class of problems solvable in polynomial time on a NONDETERMINISTIC Turing machine, one that may "guess" the certificate and verify it.

Examples: SAT (given a Boolean formula, is there a satisfying assignment? Certificate: the assignment.), HAMILTONIAN PATH (does this graph contain a path visiting every vertex once? Certificate: the path.), SUBSET SUM (does this set contain a subset summing to TT? Certificate: the subset.). Every problem in P is in NP, just ignore the certificate and use the polynomial-time solver as the verifier. So PsubseteqNPP \subseteq NP.

NP-hardness and NP-completeness

A problem BB is NP-hard if every problem LinNPL \in NP can be reduced to BB in polynomial time. Reduction LleqpBL \leq_{p} BpB means there is a polynomial-time function ff such that xinLifff(x)inBx \in L \iff f(x) \in B. Intuition: solving BB would let you solve every problem in NP (using ff to translate). NP-hard problems are "at least as hard as the hardest problems in NP."

A problem is NP-complete if it is BOTH in NP and NP-hard, the hardest problems WITHIN NP. The discovery of NP-complete problems is the spine of complexity theory:

Cook-Levin theorem (1971). The Boolean satisfiability problem SAT is NP-complete. The proof simulates an arbitrary polynomial-time nondeterministic Turing machine by a polynomial-size Boolean formula.

Once one NP-complete problem (SAT) is in hand, OTHER problems are shown NP-complete by reducing SAT (or a known NP-complete problem) to them. Karp's 1972 paper exhibited 21 such reductions and seeded a whole field. Today thousands of natural problems are known to be NP-complete: 3-SAT, CLIQUE, INDEPENDENT SET, VERTEX COVER, HAMILTONIAN CYCLE, TRAVELLING SALESMAN (decision version), KNAPSACK, GRAPH COLOURING, INTEGER LINEAR PROGRAMMING, …

The P vs NP question

If PneqNPP \neq NP: there exist problems whose solutions can be checked quickly but not found quickly. NP-complete problems then have no polynomial-time algorithm; we must accept exponential worst-case running times or settle for approximations and heuristics.

If P=NPP = NP: a polynomial-time algorithm for SAT (or any other NP-complete problem) would imply polynomial-time algorithms for ALL of NP, by composing reductions. This would constructively yield (or at least prove the existence of) fast solvers for thousands of currently-intractable problems.

The empirical evidence weighs heavily toward PneqNPP \neq NP: fifty years of unsuccessful attempts to design polynomial-time algorithms for NP-complete problems, the natural-proof and relativisation barriers to many proof strategies, and the central role of NP-completeness in modern algorithm design (where "this problem is NP-complete" is essentially a verdict of "no polynomial-time algorithm expected").

Picture the Venn diagram: P is a subset of NP; NP-complete problems sit on the boundary of NP. If P=NPP = NP, the two regions collapse to one. If PneqNPP \neq NP, there is a gap, and the NP-intermediate problems (e.g., GRAPH ISOMORPHISM, FACTORING under widespread belief) live in NP but neither in P nor NP-complete.

Where this shows up
  • Cryptography: RSA, Diffie-Hellman, elliptic-curve cryptography, and the entire public-key infrastructure of the internet rely on problems (FACTORING, DISCRETE LOG) that are in NP but believed not in P. If P=NPP = NP, every secret key, every signature, every TLS handshake could be broken in polynomial time. Modern HTTPS would collapse.
  • Operations research: Production scheduling, vehicle routing, facility location, and resource allocation are NP-hard. We use approximation algorithms with provable bounds (e.g., the Christofides algorithm gives a 1.5-approximation for Metric TSP), Mixed Integer Programming solvers (Gurobi, CPLEX) with sophisticated branch-and-bound, and heuristics (simulated annealing, genetic algorithms).
  • Machine learning: Training a deep network is non-convex optimisation, NP-hard in general. We rely on stochastic gradient descent finding "good enough" local minima, an empirical bet that real-world loss landscapes are friendly, not adversarial.
  • Drug discovery and protein folding: Predicting protein 3D structure from amino-acid sequence (general case) is NP-hard. AlphaFold sidesteps this with deep learning trained on millions of known structures; it does not solve the general problem, but solves the structurally-constrained sub-problem that nature exhibits.
  • Chip design (VLSI): Placing components on a chip die to minimise wire length is NP-hard. Cadence and Synopsys EDA tools use sophisticated heuristics; the O(nlogn)O(n \log n)-vs-NP gap directly limits how fast new chip generations can be designed.

Pause and think: P is trivially a subset of NP. Why is this so? (Hint: if you can SOLVE a problem in polynomial time, you can also VERIFY a proposed solution, just ignore the proposal and solve from scratch.) The deeper question is the converse: does every problem with a polynomial-time verifier also have a polynomial-time solver?

Try it

  • Show that the problem "given a graph GG and an integer kk, does GG contain an independent set of size geqk\geq k?" is in NP by describing the certificate and the verifier.
  • Show that 2-SAT (every clause has at most 2 literals) is in P. (Hint: implication-graph SCC analysis.) Why does the same approach fail for 3-SAT?
  • Suppose tomorrow someone shows that VERTEX COVER is in P. What can you conclude about P vs NP? (Hint: VERTEX COVER is NP-complete.)
  • Why is the inequality PsubseteqNPsubseteqEXPTIMEP \subseteq NP \subseteq EXPTIME known to be strict somewhere, that is, why is there guaranteed to be a separation between P and EXPTIME even if we cannot pinpoint it? (Hint: the time-hierarchy theorem.)
  • Name three problems known to be NP-complete and write down (informally) why each is in NP. (Suggestions: Hamiltonian Cycle, Subset Sum, Graph Colouring.)

A trap to watch for

Beginners often interpret "this problem is NP-complete" as "this problem is unsolvable." That is wrong. NP-complete problems are SOLVABLE; their solutions just are not expected to scale polynomially. Small instances are fine, approximations are available for many of them, and structured sub-cases often have polynomial-time algorithms. A second trap: "NP" does not mean "not polynomial." It stands for "nondeterministic polynomial." The complement is co-NP (whose elements are problems where a NO answer has a short certificate); whether NP=textcoNPNP = \text{co-NP} is itself open. And a third trap: NP-hard problems need not be in NP; for instance, the HALTING PROBLEM is NP-hard but is uncomputable (vastly worse than NP).

What you now know

You can define P, NP, NP-hard, and NP-complete precisely; recognise the canonical NP-complete problems and the role of polynomial-time reductions; articulate the practical consequences of resolving P-vs-NP either way; and avoid the common conceptual mistakes about what NP-completeness implies. You have completed the algorithm-theory mini-tour that closes Garrity's book, from asymptotic notation through graphs, sorting, and now the complexity hierarchy itself. The structural questions you have learned to ask scale up to research-level computer science and to many adjacent fields.

Mark section complete →

References

  • Garrity, T. (2002). All the Mathematics You Missed. Cambridge University Press, ch. 16.
  • Sipser, M. (2012). Introduction to the Theory of Computation (3rd ed.). Cengage, ch. 7–8.
  • Arora, S., Barak, B. (2009). Computational Complexity: A Modern Approach. Cambridge University Press, ch. 2–3.
  • Cook, S. A. (1971). "The complexity of theorem-proving procedures." STOC 1971, pp. 151–158.
  • Karp, R. M. (1972). "Reducibility among combinatorial problems." In Complexity of Computer Computations, Plenum Press, pp. 85–103.
  • Clay Mathematics Institute. "The Millennium Prize Problems, P vs NP." https://www.claymath.org/millennium/p-vs-np/

This page is prerendered for SEO and accessibility. The interactive widgets above hydrate on JavaScript load.