The Total Derivative as a Jacobian

Part 3, Chapter 3: Calculus of Several Variables

Learning objectives

  • Define the Jacobian matrix of $f:\mathbb{R}^n\to\mathbb{R}^m$
  • Compute Jacobians by collecting partial derivatives
  • Interpret the Jacobian as the best linear approximation $f(\mathbf{a}+\mathbf{h})\approx f(\mathbf{a})+Df(\mathbf{a})\mathbf{h}$
  • Predict how Jacobian determinants govern volume scaling and local invertibility

The single most important object in multivariable calculus is the Jacobian matrix. In one variable, the derivative $f'(a)$ is the slope of the tangent line, and that line is the best linear approximation to $f$ near $a$. In several variables, the derivative is a matrix: the Jacobian $Df(\mathbf{a})$. Every theorem in the rest of this chapter (the chain rule, the Inverse Function Theorem, the Implicit Function Theorem, the change-of-variables formula) is a statement about Jacobians. Get comfortable with this one object and the rest of multivariable analysis falls into place.

The definition

For $f:\mathbb{R}^n\to\mathbb{R}^m$ with coordinate functions $f_1,\ldots,f_m$, the Jacobian matrix at a point $\mathbf{a}$ is the $m\times n$ matrix of partial derivatives, each evaluated at $\mathbf{a}$:

$$Df(\mathbf{a})=\begin{pmatrix}\dfrac{\partial f_1}{\partial x_1}(\mathbf{a})&\cdots&\dfrac{\partial f_1}{\partial x_n}(\mathbf{a})\\\vdots&\ddots&\vdots\\\dfrac{\partial f_m}{\partial x_1}(\mathbf{a})&\cdots&\dfrac{\partial f_m}{\partial x_n}(\mathbf{a})\end{pmatrix},\qquad [Df(\mathbf{a})]_{ij}=\frac{\partial f_i}{\partial x_j}(\mathbf{a}).$$

Read row by row: the $i$-th row is the gradient of the $i$-th coordinate function $f_i$. Read column by column: the $j$-th column tells you how all the output coordinates change when $x_j$ moves.
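As a sketch of "collecting partial derivatives", here is a small pure-Python helper that approximates each column of $Df(\mathbf{a})$ with a central difference and compares the result with partials computed by hand. The names `jacobian_fd`, `f`, and `jacobian_exact`, the example map $f(x_1,x_2)=(x_1x_2,\ x_1^2,\ x_1+3x_2)$, and the fixed step $h=10^{-6}$ are all hypothetical choices for this illustration; it is a numerical check, assuming $f$ is differentiable near $\mathbf{a}$, not a replacement for the definition.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def jacobian_fd(f: Callable[[Sequence[float]], Vector],
                a: Sequence[float], h: float = 1e-6) -> Matrix:
    """Approximate Df(a) column by column with central differences.

    Column j holds the partials of every output f_i with respect to x_j,
    so row i ends up as the (approximate) gradient of f_i.
    """
    n = len(a)
    m = len(f(a))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(a)
        minus = list(a)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = f(plus), f(minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


# Hypothetical example map f: R^2 -> R^3, chosen only for illustration.
def f(x: Sequence[float]) -> Vector:
    x1, x2 = x
    return [x1 * x2, x1 ** 2, x1 + 3 * x2]


def jacobian_exact(x: Sequence[float]) -> Matrix:
    # Hand-computed partials: row i is the gradient of f_i.
    x1, x2 = x
    return [[x2, x1],
            [2 * x1, 0.0],
            [1.0, 3.0]]


a = [1.0, 2.0]
J_num = jacobian_fd(f, a)
J_hand = jacobian_exact(a)
for row_num, row_hand in zip(J_num, J_hand):
    print(row_num, row_hand)
```

The result has 3 rows and 2 columns, matching the $m\times n$ shape for $f:\mathbb{R}^2\to\mathbb{R}^3$; at $\mathbf{a}=(1,2)$ both versions give the rows $(2,1)$, $(2,0)$, $(1,3)$ up to rounding.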

Why the matrix IS the derivative

The defining property of $Df(\mathbf{a})$ is the same approximation property the one-variable derivative had: $f(\mathbf{a}+\mathbf{h})\approx f(\mathbf{a})+Df(\mathbf{a})\,\mathbf{h}$ for small $\mathbf{h}$, where the error is $o(\|\mathbf{h}\|)$, that is,
$$\lim_{\mathbf{h}\to\mathbf{0}}\frac{\|f(\mathbf{a}+\mathbf{h})-f(\mathbf{a})-Df(\mathbf{a})\,\mathbf{h}\|}{\|\mathbf{h}\|}=0.$$
The Jacobian is the unique linear map (matrix) that makes this approximation work to first order. Everything else flows from this: the chain rule becomes matrix multiplication, $D(g\circ f)(\mathbf{a})=Dg(f(\mathbf{a}))\,Df(\mathbf{a})$, and the Inverse and Implicit Function Theorems become statements about when this matrix (or a square block of it) is invertible.
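The limit above can be watched numerically. This minimal sketch uses a hypothetical map $f(x,y)=(x^2y,\ \sin x+y)$ with hand-computed Jacobian $\begin{pmatrix}2xy&x^2\\\cos x&1\end{pmatrix}$, shrinks $\mathbf{h}=(t,t)$ toward zero at $\mathbf{a}=(1,2)$, and prints the ratio $\|f(\mathbf{a}+\mathbf{h})-f(\mathbf{a})-J\mathbf{h}\|/\|\mathbf{h}\|$. It also repeats the test with a deliberately wrong matrix (one entry nudged by $0.1$) to illustrate uniqueness; the function names are illustrative only.

```python
import math


# Hypothetical example map f: R^2 -> R^2 and its hand-computed Jacobian.
def f(x, y):
    return (x**2 * y, math.sin(x) + y)


def Df(x, y):
    # Row i is the gradient of the i-th coordinate function.
    return ((2 * x * y, x**2),
            (math.cos(x), 1.0))


def error_ratio(J, a, h):
    """Return ||f(a+h) - f(a) - J h|| / ||h|| for a 2x2 matrix J."""
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    Jh = [J[i][0] * h[0] + J[i][1] * h[1] for i in range(2)]
    err = [fah[i] - fa[i] - Jh[i] for i in range(2)]
    return math.hypot(*err) / math.hypot(*h)


a = (1.0, 2.0)
J_true = Df(*a)
# A deliberately wrong matrix: one entry nudged by 0.1.
J_wrong = ((J_true[0][0] + 0.1, J_true[0][1]), J_true[1])

steps = (1e-1, 1e-2, 1e-3, 1e-4)
ratios = [error_ratio(J_true, a, (t, t)) for t in steps]
wrong_ratio = error_ratio(J_wrong, a, (1e-4, 1e-4))
for t, r in zip(steps, ratios):
    print(f"t = {t:.0e}: error/||h|| = {r:.2e}")
print(f"wrong matrix, t = 1e-04: error/||h|| = {wrong_ratio:.2e}")
```

With the true Jacobian the ratio drops by roughly a factor of 10 each time $t$ does (for this smooth $f$ the error is in fact $O(\|\mathbf{h}\|^2)$); with the nudged matrix it levels off near $0.1/\sqrt{2}\approx 0.07$ instead of going to zero, so that matrix does not satisfy the definition.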

Three special cases

  • Scalar field ($m=1$): The Jacobian is a $1\times n$ row vector, the gradient $\nabla f=(f_{x_1},\ldots,f_{x_n})$.
  • Parametric curve ($n=1$): The Jacobian is an $m\times 1$ column vector, the tangent vector $\mathbf{r}'(t)$.
  • Square Jacobian ($m=n$): The determinant $\det Df$ measures local volume scaling; when it is nonzero at a point (for a $C^1$ map), the map is locally invertible there.

The matrix-multiplier widget above lets you set a $2\times 2$ matrix and watch it deform the unit square. That is exactly what the Jacobian does at every point: it gives the local linear deformation of an infinitesimal patch. The determinant shown in the widget is the signed area-scaling factor: its absolute value is the factor by which areas are scaled, a positive sign means orientation is preserved, a negative sign means orientation is reversed (a reflection), and zero means the linear map collapses a dimension, so the linearization is not invertible.
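To check the area-scaling reading without the widget, the sketch below maps the corners of the unit square through a few hypothetical $2\times 2$ matrices (a shear, a stretch, a reflection, and a rank-1 "collapse") and computes the signed area of the resulting parallelogram with the shoelace formula. The signed area should equal $\det M$ exactly, including the sign for the reflection and zero for the collapse. All names here are illustrative.

```python
def apply(M, p):
    # Multiply the 2x2 matrix M by the column vector p.
    return (M[0][0] * p[0] + M[0][1] * p[1],
            M[1][0] * p[0] + M[1][1] * p[1])


def signed_area(poly):
    """Shoelace formula: positive for counter-clockwise vertex order."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return s / 2


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # counter-clockwise

examples = {
    "shear": ((1, 1), (0, 1)),
    "stretch": ((2, 0), (0, 3)),
    "reflection": ((0, 1), (1, 0)),
    "collapse": ((1, 2), (2, 4)),
}
for name, M in examples.items():
    image = [apply(M, p) for p in unit_square]
    print(f"{name:10s} det = {det2(M):5.1f}   signed area of image = {signed_area(image):5.1f}")
```

The reflection swaps the vertex order from counter-clockwise to clockwise, which is why its signed area (and determinant) is $-1$; the collapse sends all four corners onto one line, giving area $0$.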

Where this shows up
  • Machine learning, backpropagation: Computing the gradients used to train a neural network is repeated chain-rule application across stacked layers. Each layer $f_i(\mathbf{x})=\sigma(W_i\mathbf{x}+\mathbf{b}_i)$ has a Jacobian $Df_i$; by the chain rule, the full network's Jacobian is the product $Df_L\cdots Df_2\,Df_1$, with each factor evaluated at that layer's input. Backprop evaluates this product in reverse order (starting from the output side), which is more efficient when the output dimension is small, as it is for a scalar loss.
  • Robotics, inverse kinematics: The Jacobian $J(\boldsymbol{\theta})$ of the forward-kinematics map relates joint velocities to end-effector velocity: $\dot{\mathbf{p}}=J(\boldsymbol{\theta})\,\dot{\boldsymbol{\theta}}$. Solving for the joint motion that achieves a desired end-effector trajectory requires inverting $J$ (or using a pseudo-inverse when $J$ is not square): this is the velocity-level inverse-kinematics problem in one equation.
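  • Continuous optimization, Newton's method: To solve $f(\mathbf{x})=\mathbf{0}$, the multivariable Newton update is $\mathbf{x}_{k+1}=\mathbf{x}_k-[Df(\mathbf{x}_k)]^{-1}f(\mathbf{x}_k)$; in optimization, $f$ is the gradient of the objective and $Df$ is its Hessian. The step $-[Df(\mathbf{x}_k)]^{-1}f(\mathbf{x}_k)$ is the search direction, and in practice it is computed by solving the linear system $Df(\mathbf{x}_k)\,\mathbf{s}=-f(\mathbf{x}_k)$ rather than by forming the inverse. Near a root where $Df$ is invertible (and $f$ is smooth enough), convergence is quadratic; when $Df$ is nearly singular, the steps can become very large and the iteration can stall or diverge.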
  • Computer graphics, texture mapping: The Jacobian determinant of a UV mapping gives the local area scaling between texture space and surface space. Renderers use the Jacobian of the screen-to-texture mapping (the texture-coordinate derivatives) to select mipmap levels and avoid aliasing on stretched textures.
  • Pause and think: Polar coordinates send $(r,\theta)$ to $(r\cos\theta,r\sin\theta)$. The Jacobian determinant is $r$. What does it mean geometrically that the determinant is $r$ rather than $1$? (Hint: an infinitesimal rectangle $dr\,d\theta$ in polar space corresponds to an infinitesimal patch of area $r\,dr\,d\theta$ in the plane, so patches are bigger farther from the origin.)
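The Newton update described above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not a production solver: the system $f(x,y)=(x^2+y^2-4,\ x-y)$ (a circle of radius 2 meeting the line $y=x$, root $(\sqrt2,\sqrt2)$), the starting point, the tolerance, and the helper names `F`, `DF`, `solve2`, `newton` are all hypothetical. Each step solves $Df(\mathbf{x}_k)\,\mathbf{s}=-f(\mathbf{x}_k)$ by Cramer's rule instead of forming the inverse.

```python
import math


def F(v):
    x, y = v
    # Hypothetical system: circle of radius 2 intersected with the line y = x.
    return [x**2 + y**2 - 4.0, x - y]


def DF(v):
    x, y = v
    # Jacobian of F; its determinant is -2(x + y).
    return [[2 * x, 2 * y],
            [1.0, -1.0]]


def solve2(A, b):
    """Solve the 2x2 system A s = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-14:
        raise ValueError("Jacobian is numerically singular")
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]


def newton(F, DF, x0, tol=1e-12, max_iter=50):
    x = list(x0)
    history = [x]
    for _ in range(max_iter):
        fx = F(x)
        # Newton step: solve DF(x) s = -F(x) rather than inverting DF(x).
        s = solve2(DF(x), [-fx[0], -fx[1]])
        x = [x[0] + s[0], x[1] + s[1]]
        history.append(x)
        if math.hypot(*s) < tol:
            break
    return x, history


root, history = newton(F, DF, [1.0, 0.5])
exact = math.sqrt(2.0)
for k, (x, y) in enumerate(history):
    print(f"iteration {k}: distance to root = {math.hypot(x - exact, y - exact):.1e}")
```

Once the iterates are close, the printed distance roughly squares at each step, which is the quadratic convergence mentioned above. Because this Jacobian's determinant is $-2(x+y)$, starting points near the line $y=-x$ make $Df$ nearly singular and typically produce very large first steps.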

    Try it

    • Predict first: what is the Jacobian of $f(x,y)=(x^2-y^2,\ 2xy)$? (This is the squaring map $z\mapsto z^2$ on complex numbers $z=x+iy$; its Jacobian determinant is $4(x^2+y^2)$.) Find where the map fails to be locally invertible.
    • Compute the Jacobian of the spherical-coordinate map $(r,\phi,\theta)\mapsto(r\sin\phi\cos\theta,\ r\sin\phi\sin\theta,\ r\cos\phi)$. Confirm that its determinant is $r^2\sin\phi$, the $dV$ factor in spherical-coordinate integrals.
    • Predict: if $f:\mathbb{R}^3\to\mathbb{R}^2$ and $g:\mathbb{R}^2\to\mathbb{R}^4$, what shape is the chain-rule Jacobian $D(g\circ f)$? Dimensions: $4\times 3$. It has one row per output of $g\circ f$ and one column per input of $f$, as it must for the product of the $4\times 2$ matrix $Dg$ and the $2\times 3$ matrix $Df$.
    • Use the matrix-multiplier widget above with the matrix $\begin{pmatrix}1&1\\0&1\end{pmatrix}$. Notice: the determinant is $1$ (area preserved), but the unit square is sheared into a parallelogram. The Jacobian captures both effects.
    • Trap: a zero Jacobian determinant at a point does NOT always mean the map fails to be invertible; it only means the Inverse Function Theorem cannot guarantee invertibility there. Try $f(x)=x^3$: $f'(0)=0$, yet $f$ is globally invertible (although its inverse $x^{1/3}$ is not differentiable at $0$).
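If you want to check your answers to the first three exercises numerically, here is one possible pure-Python sketch. It reuses a central-difference Jacobian (`jacobian_fd`, a hypothetical helper with an assumed step $h=10^{-6}$), evaluates determinants at arbitrarily chosen sample points, and, for the chain-rule exercise, uses two hypothetical maps $f(v)=(v_1v_2,\ v_2+v_3^2)$ and $g(u)=(u_1,\ u_2,\ u_1u_2,\ u_1-u_2)$ to compare $D(g\circ f)$ with the product $Dg(f(\mathbf{a}))\,Df(\mathbf{a})$.

```python
import math


def jacobian_fd(f, a, h=1e-6):
    """Central-difference Jacobian: entry (i, j) approximates d f_i / d x_j at a."""
    n, m = len(a), len(f(a))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        ap, am = list(a), list(a)
        ap[j] += h
        am[j] -= h
        fp, fm = f(ap), f(am)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def det2(J):
    return J[0][0] * J[1][1] - J[0][1] * J[1][0]


def det3(J):
    # Cofactor expansion along the first row.
    return (J[0][0] * (J[1][1] * J[2][2] - J[1][2] * J[2][1])
            - J[0][1] * (J[1][0] * J[2][2] - J[1][2] * J[2][0])
            + J[0][2] * (J[1][0] * J[2][1] - J[1][1] * J[2][0]))


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Exercise 1: the squaring map (x, y) -> (x^2 - y^2, 2xy).
def square_map(v):
    x, y = v
    return [x**2 - y**2, 2 * x * y]


det_square = det2(jacobian_fd(square_map, [0.7, -1.3]))        # compare with 4(x^2 + y^2)
det_square_origin = det2(jacobian_fd(square_map, [0.0, 0.0]))  # 4(x^2 + y^2) vanishes only here


# Exercise 2: spherical coordinates (r, phi, theta) -> (x, y, z).
def spherical(v):
    r, phi, theta = v
    return [r * math.sin(phi) * math.cos(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(phi)]


det_spherical = det3(jacobian_fd(spherical, [2.0, 0.9, 0.4]))  # compare with r^2 sin(phi)


# Exercise 3: hypothetical maps f: R^3 -> R^2 and g: R^2 -> R^4.
def f(v):
    return [v[0] * v[1], v[1] + v[2] ** 2]


def g(u):
    return [u[0], u[1], u[0] * u[1], u[0] - u[1]]


a = [0.5, -1.0, 2.0]
J_chain = jacobian_fd(lambda v: g(f(v)), a)                  # 4 x 3
J_product = matmul(jacobian_fd(g, f(a)), jacobian_fd(f, a))  # (4 x 2)(2 x 3)

print(det_square, 4 * (0.7**2 + 1.3**2))
print(det_square_origin)
print(det_spherical, 2.0**2 * math.sin(0.9))
print(len(J_chain), "x", len(J_chain[0]))
```

A finite-difference check like this only confirms a formula at the sample points you try; the exercises still ask for the symbolic Jacobians.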

    A trap to watch for

    The existence of all partial derivatives at a point does NOT imply that $f$ is differentiable there. A function can have $\partial f/\partial x$ and $\partial f/\partial y$ at the origin yet fail to have a tangent plane there. A sufficient condition is that the partials exist and are continuous in a neighbourhood of the point (i.e. $f$ is $C^1$ there); in that case the Jacobian exists and gives the correct linear approximation. Always check continuity of the partials, not just their existence, before invoking the Jacobian as "the derivative."
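A standard counterexample makes the trap concrete: $f(x,y)=\dfrac{xy}{x^2+y^2}$ with $f(0,0)=0$. The sketch below (function names are illustrative) shows that both difference quotients at the origin are exactly $0$, so both partials exist and equal $0$, yet $f(h,h)=\tfrac12$ for every $h\neq 0$. If $f$ were differentiable at the origin, its Jacobian there would be $(0\ \ 0)$ and $f(h,h)$ would have to tend to $0$; in fact $f$ is not even continuous at the origin.

```python
def f(x, y):
    # Classic counterexample: xy / (x^2 + y^2), extended by 0 at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)


def partial_x_at_origin(h):
    # Difference quotient along the x-axis: f(h, 0) = 0 for every h != 0.
    return (f(h, 0.0) - f(0.0, 0.0)) / h


def partial_y_at_origin(h):
    # Difference quotient along the y-axis: f(0, h) = 0 for every h != 0.
    return (f(0.0, h) - f(0.0, 0.0)) / h


for h in (1e-1, 1e-3, 1e-6):
    # Both partials come out 0, but along the diagonal f stays at 1/2.
    print(h, partial_x_at_origin(h), partial_y_at_origin(h), f(h, h))
```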

    What you now know

    You can compute Jacobians for any vector-valued function and read off geometric information from their entries and determinants. The next two sections use this matrix to state the two cornerstone theorems of multivariable analysis: the Inverse Function Theorem (when can we invert $f$?) and the Implicit Function Theorem (when does $F(x,y)=0$ define $y$ as a function of $x$?).


