The Total Derivative as a Jacobian

Part 3, Chapter 3: Calculus of Several Variables

Learning objectives

Define the Jacobian matrix of $f:\mathbb{R}^n\to\mathbb{R}^m$
Compute Jacobians by collecting partial derivatives
Interpret the Jacobian as the best linear approximation $f(\mathbf{a}+\mathbf{h})\approx f(\mathbf{a})+Df(\mathbf{a})\mathbf{h}$
Predict how Jacobian determinants govern volume scaling and local invertibility

The single most important object in multivariable calculus is the Jacobian matrix. In one variable, the derivative $f'(a)$ is the slope of the tangent line, which gives the best linear approximation to $f$ at $a$. In many variables, the derivative is a matrix: the Jacobian $Df(\mathbf{a})$. Every theorem in the rest of this chapter (the chain rule, the Inverse Function Theorem, the Implicit Function Theorem, the change-of-variables formula) is a statement about Jacobians. Get comfortable with this one object and the rest of multivariable analysis falls into place.

The definition

For $f:\mathbb{R}^n\to\mathbb{R}^m$ with coordinate functions $f_1,\ldots,f_m$, the Jacobian matrix at a point $\mathbf{a}$ is the $m\times n$ matrix of partial derivatives:

$Df(\mathbf{a})=\begin{pmatrix}\dfrac{\partial f_1}{\partial x_1}&\cdots&\dfrac{\partial f_1}{\partial x_n}\\\vdots&\ddots&\vdots\\\dfrac{\partial f_m}{\partial x_1}&\cdots&\dfrac{\partial f_m}{\partial x_n}\end{pmatrix}$

Read row by row: the $i$-th row is the gradient of the $i$-th coordinate function. Read column by column: the $j$-th column tells you how all the output coordinates change when $x_j$ moves.
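Collecting the partial derivatives column by column can also be done numerically. The sketch below assumes central finite differences with a step of $10^{-6}$; `jacobian_fd` and `squaring_map` are hypothetical names chosen for illustration, and the result only approximates the Jacobian (and only means something where $f$ is differentiable).

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def jacobian_fd(f: Callable[[Vector], Vector], a: Vector, h: float = 1e-6) -> Matrix:
    """Approximate the m x n Jacobian of f at a by central differences.

    Column j is (f(a + h e_j) - f(a - h e_j)) / (2h): how every output
    coordinate responds when x_j alone moves.
    """
    n = len(a)
    m = len(f(a))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        a_plus = list(a)
        a_minus = list(a)
        a_plus[j] += h
        a_minus[j] -= h
        f_plus, f_minus = f(a_plus), f(a_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def squaring_map(v: Vector) -> Vector:
    """Example map (x, y) -> (x^2 - y^2, 2xy)."""
    x, y = v
    return [x * x - y * y, 2 * x * y]


J = jacobian_fd(squaring_map, [1.0, 2.0])
# Exact Jacobian at (1, 2) is [[2x, -2y], [2y, 2x]] = [[2, -4], [4, 2]]
print([[round(entry, 6) for entry in row] for row in J])
```

Central differences are exact up to rounding for quadratic maps like this one, so the printed matrix should agree with $\begin{pmatrix}2&-4\\4&2\end{pmatrix}$; for a general smooth $f$ the error shrinks roughly like $h^2$ until floating-point cancellation takes over.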

Why the matrix IS the derivative

The defining property of $Df(\mathbf{a})$ is the same approximation property the one-variable derivative had: $f(\mathbf{a}+\mathbf{h})\approx f(\mathbf{a})+Df(\mathbf{a})\,\mathbf{h}$ for small $\mathbf{h}$, where the error is $o(\|\mathbf{h}\|)$, that is, $\|f(\mathbf{a}+\mathbf{h})-f(\mathbf{a})-Df(\mathbf{a})\,\mathbf{h}\|/\|\mathbf{h}\|\to 0$ as $\mathbf{h}\to\mathbf{0}$. When such a linear map exists it is unique, and its matrix is the matrix of partial derivatives above. Everything else flows from this: the chain rule becomes matrix multiplication, $D(g\circ f)(\mathbf{a})=Dg(f(\mathbf{a}))\cdot Df(\mathbf{a})$, and the Inverse and Implicit Function Theorems become statements about when this matrix (or a square block of it) is invertible.
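The $o(\|\mathbf{h}\|)$ condition can be watched numerically. A minimal sketch, assuming the squaring map $f(x,y)=(x^2-y^2,\ 2xy)$ as the example, the point $\mathbf{a}=(1,2)$, and a fixed unit direction (all arbitrary choices): it shrinks $\mathbf{h}$ and prints the ratio of the approximation error to $\|\mathbf{h}\|$.

```python
import math


def f(x, y):
    """The squaring map (x, y) -> (x^2 - y^2, 2xy)."""
    return (x * x - y * y, 2 * x * y)


def Df(x, y):
    """Its Jacobian, assembled by hand from the four partial derivatives."""
    return ((2 * x, -2 * y), (2 * y, 2 * x))


a = (1.0, 2.0)
d = (0.6, 0.8)  # unit direction, so h = t * d has ||h|| = t
fa, J = f(*a), Df(*a)
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    h = (t * d[0], t * d[1])
    exact = f(a[0] + h[0], a[1] + h[1])
    # First-order model f(a) + Df(a) h
    linear = (fa[0] + J[0][0] * h[0] + J[0][1] * h[1],
              fa[1] + J[1][0] * h[0] + J[1][1] * h[1])
    err = math.hypot(exact[0] - linear[0], exact[1] - linear[1])
    ratios.append(err / t)
    print(f"||h|| = {t:g}   error/||h|| = {err / t:.3e}")
```

For this map the remainder is exactly $(h_1^2-h_2^2,\ 2h_1h_2)$, whose norm is $\|\mathbf{h}\|^2$, so the printed ratio tracks $\|\mathbf{h}\|$ itself and goes to $0$, as the definition requires.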

Three special cases

Scalar field ($m=1$): The Jacobian is a row vector, the gradient $\nabla f=(f_{x_1},\ldots,f_{x_n})$.
Parametric curve ($n=1$): The Jacobian is a column vector, the tangent vector $\mathbf{r}'(t)$.
Square Jacobian ($m=n$): The determinant $\det Df$ measures local volume scaling (its absolute value is the scaling factor, its sign records orientation), and a nonzero determinant guarantees, for $C^1$ maps, that the map is locally invertible.

A $2\times 2$ matrix applied to the unit square deforms it into a parallelogram. That is exactly what the Jacobian does at every point: it describes the local linear deformation of an infinitesimal patch. Its determinant is the local area-scaling factor: positive means orientation-preserving, negative means orientation-reversing (reflected), and zero means the linear map collapses a dimension, so the linearization is not invertible.
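The effect of a $2\times 2$ matrix on the unit square can be checked directly. This sketch (the helper names `apply`, `signed_area`, and `det2` are hypothetical) maps the square's corners, computes the signed area of the image with the shoelace formula, and compares it with the determinant; the four matrices (a scaling, a shear, a reflection, and a rank-one matrix) are arbitrary illustrative choices.

```python
def apply(M, p):
    """Apply the 2x2 matrix M to the point p."""
    return (M[0][0] * p[0] + M[0][1] * p[1], M[1][0] * p[0] + M[1][1] * p[1])


def signed_area(poly):
    """Shoelace formula: positive for counter-clockwise vertex order."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        s += x0 * y1 - x1 * y0
    return s / 2


def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # counter-clockwise, area 1
for M in ([[2, 0], [0, 3]], [[1, 1], [0, 1]], [[0, 1], [1, 0]], [[1, 2], [2, 4]]):
    image = [apply(M, p) for p in unit_square]
    print(M, "det =", det2(M), "signed area of image =", signed_area(image))
```

The reflection reverses the vertex order, so its image has signed area $-1$, matching $\det=-1$; the rank-one matrix sends all four corners onto a line, giving area $0$.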

Where this shows up

Machine learning, backpropagation: Training a neural network is repeated chain-rule application across stacked layers. Each layer $f_i(\mathbf{x})=\sigma(W_i\mathbf{x}+\mathbf{b}_i)$ has a Jacobian $Df_i$; the full network's Jacobian is the product $Df_L\cdots Df_2\cdot Df_1$, with each factor evaluated at that layer's input. Backprop computes this product in reverse order, starting from the output, which is more efficient when the output dimension is small (for example, a scalar loss).
Robotics, inverse kinematics: The Jacobian $J(\boldsymbol{\theta})$ of the forward-kinematics map relates joint velocities to end-effector velocity: $\dot{\mathbf{p}}=J(\boldsymbol{\theta})\dot{\boldsymbol{\theta}}$. Finding the joint motion that produces a desired end-effector trajectory requires inverting $J$ (or using a pseudo-inverse when $J$ is not square); that is the inverse-kinematics problem in one equation.
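Continuous optimization, Newton's method: The multivariable Newton update for solving $f(\mathbf{x})=\mathbf{0}$ is $\mathbf{x}_{k+1}=\mathbf{x}_k-[Df(\mathbf{x}_k)]^{-1}f(\mathbf{x}_k)$; in optimization, $f$ is the gradient and $Df$ is the Hessian. The step $-[Df(\mathbf{x}_k)]^{-1}f(\mathbf{x}_k)$ is the search direction, and in practice it is found by solving the linear system $Df(\mathbf{x}_k)\,\mathbf{s}=-f(\mathbf{x}_k)$ rather than forming the inverse. Near a root where $Df$ is nonsingular and well-conditioned, convergence is quadratic; where $Df$ is singular or near-singular, the steps become huge or undefined and the iteration can stall or diverge.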
Computer graphics, texture mapping: The Jacobian determinant of a UV mapping gives the local area scaling between texture space and surface space. Renderers use the Jacobian of the screen-to-texture mapping (the screen-space derivatives of the UV coordinates) to select mipmap levels and avoid aliasing on stretched textures.
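The Newton update can be sketched for a small square system. The example system (a circle of radius $2$ meeting the line $y=x$, with root $(\sqrt2,\sqrt2)$), the starting point, and the Cramer's-rule solver `solve2` are all illustrative assumptions; a real implementation would use a general linear solver and a convergence test instead of a fixed iteration count.

```python
import math


def F(x, y):
    """Hypothetical example system: circle of radius 2 meets the line y = x."""
    return (x * x + y * y - 4.0, x - y)


def DF(x, y):
    """Jacobian of F, assembled from its partial derivatives."""
    return ((2 * x, 2 * y), (1.0, -1.0))


def solve2(J, b):
    """Solve the 2x2 system J s = b by Cramer's rule (fails if det J = 0)."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    if det == 0:
        raise ZeroDivisionError("singular Jacobian: Newton step undefined")
    return ((b[0] * J[1][1] - J[0][1] * b[1]) / det,
            (J[0][0] * b[1] - b[0] * J[1][0]) / det)


x, y = 1.0, 0.5
for k in range(6):
    fx = F(x, y)
    # Newton step: solve Df(x_k) s = -f(x_k) instead of forming the inverse
    s = solve2(DF(x, y), (-fx[0], -fx[1]))
    x, y = x + s[0], y + s[1]
    print(k + 1, x, y, "error:", abs(x - math.sqrt(2)))
```

Once the iterate is close to the root, the printed error roughly squares from one iteration to the next, which is the quadratic convergence of Newton's method; here $\det DF=-2x-2y$ stays away from $0$ near the root.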
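A minimal sketch of the reverse-order Jacobian product in backprop, assuming a hypothetical two-layer network with made-up weights and $\sigma=\tanh$ (so $\sigma'(z)=1-\tanh^2 z$). Each layer's Jacobian is $\operatorname{diag}(\sigma'(W_i\mathbf{x}+\mathbf{b}_i))\,W_i$, evaluated at that layer's input. This illustrates the matrix algebra only; it is not how any particular framework implements backprop.

```python
import math

# Hypothetical two-layer network R^3 -> R^2 -> R^1 with tanh activations.
W1, b1 = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]], [0.1, -0.2]
W2, b2 = [[1.0, -2.0]], [0.3]


def pre_activation(W, b, x):
    """z = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def layer(W, b, x):
    """f_i(x) = tanh(W x + b), applied elementwise."""
    return [math.tanh(z) for z in pre_activation(W, b, x)]


def layer_jacobian(W, b, x):
    """D f_i(x) = diag(1 - tanh(z)^2) W, evaluated at this layer's input x."""
    z = pre_activation(W, b, x)
    return [[(1 - math.tanh(zi) ** 2) * w for w in row] for row, zi in zip(W, z)]


def matmul(A, B):
    """Plain-list matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


x0 = [0.2, -0.4, 1.0]
x1 = layer(W1, b1, x0)                    # forward pass records intermediate points
jacobians = [layer_jacobian(W1, b1, x0),  # Df_1: 2 x 3, evaluated at x0
             layer_jacobian(W2, b2, x1)]  # Df_2: 1 x 2, evaluated at x1

# Reverse accumulation: start from the output layer's 1 x 2 row and multiply
# earlier Jacobians on the right, so every partial product stays a single row.
acc = jacobians[-1]
for J in reversed(jacobians[:-1]):
    acc = matmul(acc, J)
print(acc)  # 1 x 3 Jacobian (gradient row) of the whole network at x0
```

With a scalar output, every intermediate product in the reverse order is a single row, whereas multiplying from the input side would carry full matrices between layers; that difference is why the reverse order wins when the output dimension is small.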

Pause and think: Polar coordinates send $(r,\theta)$ to $(r\cos\theta,r\sin\theta)$. The Jacobian determinant is $r$. What does it mean geometrically that the determinant is $r$ rather than $1$? (Hint: an infinitesimal rectangle $dr\,d\theta$ in polar space corresponds to an infinitesimal patch of area $r\,dr\,d\theta$ in the plane, with bigger patches farther from the origin.)
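Collecting the partials of $x=r\cos\theta$ and $y=r\sin\theta$ makes the hint concrete; a short worked computation:

$\det\begin{pmatrix}\partial x/\partial r&\partial x/\partial\theta\\\partial y/\partial r&\partial y/\partial\theta\end{pmatrix}=\det\begin{pmatrix}\cos\theta&-r\sin\theta\\\sin\theta&r\cos\theta\end{pmatrix}=r\cos^2\theta+r\sin^2\theta=r$

The factor grows linearly with distance from the origin and vanishes at $r=0$, where the map sends the whole segment $r=0$ (every $\theta$) to the single point $(0,0)$.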

Try it

Predict first: what is the Jacobian of $f(x,y)=(x^2-y^2,\ 2xy)$? (This is the squaring map $z\mapsto z^2$ on complex numbers $z=x+iy$; its Jacobian determinant is $4(x^2+y^2)$.) Find where the map fails to be locally invertible.
Compute the Jacobian of the spherical-coordinate map $(r,\phi,\theta)\mapsto(r\sin\phi\cos\theta,\ r\sin\phi\sin\theta,\ r\cos\phi)$. Confirm that its determinant is $r^2\sin\phi$, the $dV$ factor in spherical-coordinate integrals.
Predict: if $f:\mathbb{R}^3\to\mathbb{R}^2$ and $g:\mathbb{R}^2\to\mathbb{R}^4$, what shape is the chain-rule Jacobian $D(g\circ f)$? Answer: $4\times 3$. It has one row per output of $g\circ f$ and one column per input of $f$; equivalently, $Dg$ is $4\times 2$, $Df$ is $2\times 3$, and their product $Dg\cdot Df$ is $4\times 3$.
Apply the matrix $\begin{pmatrix}1&1\\0&1\end{pmatrix}$ to the unit square. Notice that the determinant is $1$ (area preserved), yet the square is sheared into a parallelogram. The Jacobian captures both effects.
Trap: a zero Jacobian determinant at a point does NOT always mean the map fails to be invertible there; it only means the IFT cannot guarantee invertibility. Try $f(x)=x^3$: $f'(0)=0$, yet $f$ is globally invertible (though its inverse $x^{1/3}$ is not differentiable at $0$).
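The two determinant claims in the first two exercises can be spot-checked numerically. This sketch assumes central finite differences and cofactor expansion (the helpers `jacobian_fd` and `det` are hypothetical, written here for illustration), evaluated at arbitrarily chosen points; agreement at a few points is a sanity check, not a proof.

```python
import math


def jacobian_fd(f, a, h=1e-6):
    """Central-difference Jacobian (hypothetical helper, not a library call)."""
    cols = []
    for j in range(len(a)):
        ap, am = list(a), list(a)
        ap[j] += h
        am[j] -= h
        fp, fm = f(ap), f(am)
        cols.append([(p - q) / (2 * h) for p, q in zip(fp, fm)])
    return [list(row) for row in zip(*cols)]  # transpose columns into rows


def det(M):
    """Determinant by cofactor expansion along the first row (fine for 2x2, 3x3)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def squaring(v):
    """(x, y) -> (x^2 - y^2, 2xy)."""
    x, y = v
    return [x * x - y * y, 2 * x * y]


def spherical(v):
    """(r, phi, theta) -> (r sin(phi) cos(theta), r sin(phi) sin(theta), r cos(phi))."""
    r, phi, theta = v
    return [r * math.sin(phi) * math.cos(theta),
            r * math.sin(phi) * math.sin(theta),
            r * math.cos(phi)]


x, y = 0.7, -1.3
print(det(jacobian_fd(squaring, [x, y])), 4 * (x * x + y * y))

r, phi, theta = 2.0, 0.9, 2.5
print(det(jacobian_fd(spherical, [r, phi, theta])), r * r * math.sin(phi))
```

Each line should print two numbers that agree to many decimal places.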

A trap to watch for

The existence of all partial derivatives at a point does NOT imply that $f$ is differentiable there. A function can have $\partial f/\partial x$ and $\partial f/\partial y$ at the origin yet fail to have a tangent plane, or even to be continuous, there. A sufficient (but not necessary) condition is that the partials exist and are continuous in a neighbourhood of the point (for instance, when $f$ is $C^1$); in that case $f$ is differentiable and the Jacobian gives the correct linear approximation. Before treating the Jacobian as "the derivative," check continuity of the partials (or verify the approximation property directly), not just their existence.
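A standard counterexample makes the trap concrete: $f(x,y)=xy/(x^2+y^2)$ with $f(0,0)=0$. The sketch below (plain Python, sample points chosen arbitrarily) shows that both partials at the origin are $0$, because $f$ vanishes on both axes, while $f(t,t)=1/2$ for every $t\neq 0$, so $f$ is not continuous at the origin and therefore not differentiable there.

```python
def f(x, y):
    """Classic counterexample: xy / (x^2 + y^2), with f(0, 0) defined as 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x * x + y * y)


# Both partials at the origin exist and equal 0, since f is 0 on both axes.
h = 1e-8
fx0 = (f(h, 0) - f(0, 0)) / h
fy0 = (f(0, h) - f(0, 0)) / h
print("partials at origin:", fx0, fy0)

# Along the diagonal f stays at 1/2 however close we get, so f is not
# continuous at the origin, and no linear map can approximate it there.
for t in (1e-1, 1e-4, 1e-8):
    print(t, f(t, t))
```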

What you now know

You can compute Jacobians for any differentiable vector-valued function and read off geometric information from their entries and determinants. The next two sections use this matrix to state the two cornerstone theorems of multivariable analysis: the Inverse Function Theorem (when can we invert $f$?) and the Implicit Function Theorem (when does $F(x,y)=0$ define $y$ as a function of $x$?).

