This page is a sub-page of our page on Calculus of Several Real Variables.

///////

The interactive simulations on this page can be navigated with the Free Viewer
of the Graphing Calculator.

///////

A function $f$ from $\, \mathbb{R}^2 \,$ to $\, \mathbb{R} \,$ can be described by:

${{\mathbb{R}^2 \, \stackrel {f} {\longrightarrow} \, \mathbb{R} \:}\atop {\: (x,y) \, \longmapsto \, f(x,y) } } {\,}.$

The differential $df$ of the function $f$ at the point $(a,b) \in \mathbb{R}^2$ is given by:

$df_{(a,b)} = \frac{\partial f}{\partial x}_{(a,b)} dx + \frac{\partial f}{\partial y}_{(a,b)} dy.$
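As a hedged illustration of the formula above, the following minimal Python sketch compares the differential with the actual change of a function for a small displacement $(dx, dy)$. The function `f`, the point `(a, b)`, the step sizes, and the helper names `partials` and `differential` are all assumptions chosen only for this example; the partial derivatives are estimated with central differences.

```python
def f(x, y):
    # Example function, chosen only for illustration.
    return 0.25 * (x**2 + 4 * y**2)

def partials(func, a, b, h=1e-6):
    # Central-difference estimates of df/dx and df/dy at the point (a, b).
    fx = (func(a + h, b) - func(a - h, b)) / (2 * h)
    fy = (func(a, b + h) - func(a, b - h)) / (2 * h)
    return fx, fy

def differential(func, a, b, dx, dy):
    # The differential df at (a, b), applied to the displacement (dx, dy).
    fx, fy = partials(func, a, b)
    return fx * dx + fy * dy

a, b = 2.0, 1.0
dx, dy = 1e-3, -2e-3

exact_change = f(a + dx, b + dy) - f(a, b)
linear_change = differential(f, a, b, dx, dy)

# The two numbers agree up to terms of second order in (dx, dy).
print(exact_change, linear_change)
```

For small displacements the two printed values should differ only by a quantity of the order of $dx^2 + dy^2$.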

The equation of the level curve ($\, f = c_{onstant} \,$) of the function $f$ at the point $(a,b)$
is given by:

$f(x,y) = f(a,b)$.

The equation of the tangent to the level curve of the function $f$ at the point $(a,b)$
is given by:

$\frac{\partial f}{\partial x}_{(a,b)} (x-a) + \frac{\partial f}{\partial y}_{(a,b)} (y-b) = 0.$

The gradient of the function $f$ is the function ${\nabla f}$ defined by:

${{\mathbb{R}^2 \, \stackrel {\nabla f} {\longrightarrow} \, \mathbb{R}^2 \:}\atop {\: (x,y) \, \longmapsto \, (\frac{\partial f}{\partial x} , \frac{\partial f}{\partial y}) } } {\,}.$

Hence the value of ${\nabla f}$ at the point $(x,y)$ is:

${\nabla f}_{(x,y)} = (\frac{\partial f}{\partial x} , \frac{\partial f}{\partial y}).$

The value of ${\nabla f}$ at the point $(a,b)$ is obtained
by evaluating the partial derivatives at $x=a, y=b$:

${\nabla f}_{(a,b)} = (\frac{\partial f}{\partial x} , \frac{\partial f}{\partial y})_{(a,b)}.$

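The equation of the tangent above can be written as the scalar product ${\nabla f}_{(a,b)} \cdot (x-a, y-b) = 0$. Hence we see that the gradient ${\nabla f}_{(a,b)}$ of the function $f$ at the point $(a,b)$
is perpendicular to the tangent of the level curve of the function $f$ at the point $(a,b).$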
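The perpendicularity can be checked numerically. The sketch below assumes the illustrative function $f(x, y) = \frac{1}{4} (x^2 + 4 y^2)$, whose level curves are ellipses; the parametrization of the level curve and the names `grad_f`, `level_curve_point` and `level_curve_tangent` are assumptions made for this example only.

```python
import math

def grad_f(x, y):
    # Gradient of the example function f(x, y) = (x**2 + 4*y**2) / 4.
    return (x / 2.0, 2.0 * y)

def level_curve_point(R, t):
    # The level curve f = R**2 / 4 is the ellipse x**2 + 4*y**2 = R**2.
    return (R * math.cos(t), 0.5 * R * math.sin(t))

def level_curve_tangent(R, t):
    # Derivative of the parametrization with respect to the parameter t.
    return (-R * math.sin(t), 0.5 * R * math.cos(t))

R, t = 3.0, 0.7
a, b = level_curve_point(R, t)
gx, gy = grad_f(a, b)
tx, ty = level_curve_tangent(R, t)

# Scalar product of the gradient and the tangent direction at (a, b).
dot = gx * tx + gy * ty
print(dot)
```

The printed scalar product should vanish up to floating-point rounding, for any choice of `R` and `t`.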

Gradients of flying carpets and level surfaces

The “flying carpet” style equation for the graph of the function $f$ can be expressed as:

$z = f(x, y).$

The graph $z = f(x, y)$ can equivalently be expressed as the zero-level surface of the function $g$ defined by:

$g(x,y,z) \, \stackrel {\mathrm{def}}{=} \, f(x,y)-z = 0 \, .$

Gradient of a flying-carpet type of surface

In the animation below, the “input” function $f(x, y)$ is given by:

$f(x, y) = \frac{1}{4} (x^2 + 4 y^2)$.

IMPORTANT: The 3D-gradients of the level surfaces $\, g(x, y, z) = f(x, y) - z = c_{onstant} \,$ project (along the $\, z$-direction) onto the 2D-gradients of the level curves $\, f(x, y) = c_{onstant} \,$:
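The projection statement can be illustrated with a short numerical sketch. It uses the function $f$ given above and $g(x, y, z) = f(x, y) - z$; the sample point and the helper `numerical_gradient` (central differences) are assumptions introduced only for this example.

```python
def f(x, y):
    # The "input" function of the animation.
    return 0.25 * (x**2 + 4 * y**2)

def g(x, y, z):
    # Function whose level surfaces are vertical translates of the graph of f.
    return f(x, y) - z

def numerical_gradient(func, point, h=1e-6):
    # Central-difference gradient of func at the given point (any dimension).
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((func(*forward) - func(*backward)) / (2 * h))
    return tuple(grad)

x0, y0 = 2.0, 1.0
z0 = f(x0, y0)  # a point on the zero-level surface g = 0

grad_g = numerical_gradient(g, (x0, y0, z0))  # approximately (f_x, f_y, -1)
grad_f = numerical_gradient(f, (x0, y0))      # approximately (f_x, f_y)

# Projecting along the z-direction means dropping the third component.
projected = grad_g[:2]
print(projected, grad_f)
```

Since $g$ depends on $z$ only through the term $-z$, the third component of the 3D-gradient is always $-1$, and the first two components do not depend on the chosen level $c_{onstant}$.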

///////

Gradient of a level surface – example 1:

$\, g(x, y, z) = \frac{1}{3} (x^2 + 4 y^2 + 9 z^2) \,$
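As a hedged sketch of what this example shows, the Python code below computes the gradient of this $g$ analytically and checks that it is perpendicular to two tangent directions of the level surface through a sample point. The ellipsoid parametrization, the parameter values, and all function names are assumptions made for illustration.

```python
import math

def grad_g(x, y, z):
    # Analytic gradient of g(x, y, z) = (x**2 + 4*y**2 + 9*z**2) / 3.
    return (2 * x / 3, 8 * y / 3, 6 * z)

def surface_point(s, u, v):
    # Point on the level surface x**2 + 4*y**2 + 9*z**2 = s**2 (g = s**2 / 3).
    return (s * math.cos(u) * math.cos(v),
            s / 2 * math.sin(u) * math.cos(v),
            s / 3 * math.sin(v))

def surface_tangents(s, u, v):
    # Partial derivatives of surface_point with respect to u and v.
    t_u = (-s * math.sin(u) * math.cos(v),
           s / 2 * math.cos(u) * math.cos(v),
           0.0)
    t_v = (-s * math.cos(u) * math.sin(v),
           -s / 2 * math.sin(u) * math.sin(v),
           s / 3 * math.cos(v))
    return t_u, t_v

def dot(p, q):
    # Euclidean scalar product in the fixed cartesian system.
    return sum(pi * qi for pi, qi in zip(p, q))

s, u, v = 2.0, 0.4, 0.9
normal = grad_g(*surface_point(s, u, v))
t_u, t_v = surface_tangents(s, u, v)

dot_u = dot(normal, t_u)
dot_v = dot(normal, t_v)
print(dot_u, dot_v)
```

Both scalar products should vanish up to rounding, which is the 3D counterpart of the gradient being perpendicular to the tangent of a level curve.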

///////

Gradient of a level surface – example 2:

$\, g(x, y, z) = \dfrac{x^2}{A} + \dfrac{y^2}{B} + \dfrac{z^2}{C} \; , \; C < B < A \,$

/////// Translating Folke Eriksson, Flerdimensionell analys, p.98

The gradient as a covariant vector:

So far we have been working within a single, fixed cartesian coordinate system in $\mathbb{R}^n$, e.g., the $xy$-system in $\mathbb{R}^2$. In this system we have defined $\nabla f = (f'_x, f'_y)$. However, the gradient can also be expressed in other coordinate systems that are not necessarily cartesian. For example, if $(r, \theta)$ are polar coordinates, the partial derivatives of the function $(r, \theta) \rightarrow f(r \cos \theta, r \sin \theta) = z$, that is $\frac {\partial z} {\partial r}$ and $\frac {\partial z} {\partial \theta}$, can be thought of as coordinates of $\nabla f$ in the $r \theta$-system. In this case the vector $\nabla f$ itself is one and the same, but it has different coordinates in different systems.
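To make the polar-coordinate statement concrete, the following sketch computes $\frac {\partial z} {\partial r}$ and $\frac {\partial z} {\partial \theta}$ in two ways: from the cartesian partial derivatives via the chain rule, and by direct numerical differentiation of $z(r, \theta)$. The function $f(x, y) = \frac{1}{4} (x^2 + 4 y^2)$, the sample point, and the function names are assumptions used only for illustration.

```python
import math

def f(x, y):
    # Example function, chosen only for illustration.
    return 0.25 * (x**2 + 4 * y**2)

def grad_f(x, y):
    # Cartesian partial derivatives (f'_x, f'_y) of the example function.
    return (x / 2.0, 2.0 * y)

def z(r, theta):
    # The same function expressed in polar coordinates.
    return f(r * math.cos(theta), r * math.sin(theta))

def polar_partials_chain_rule(r, theta):
    # Chain rule with x = r cos(theta), y = r sin(theta).
    fx, fy = grad_f(r * math.cos(theta), r * math.sin(theta))
    dz_dr = fx * math.cos(theta) + fy * math.sin(theta)
    dz_dtheta = -fx * r * math.sin(theta) + fy * r * math.cos(theta)
    return dz_dr, dz_dtheta

def polar_partials_numeric(r, theta, h=1e-6):
    # Central differences applied directly to z(r, theta).
    dz_dr = (z(r + h, theta) - z(r - h, theta)) / (2 * h)
    dz_dtheta = (z(r, theta + h) - z(r, theta - h)) / (2 * h)
    return dz_dr, dz_dtheta

r0, theta0 = 2.0, 0.6
chain = polar_partials_chain_rule(r0, theta0)
numeric = polar_partials_numeric(r0, theta0)
print(chain, numeric)
```

The two pairs agree up to discretization error, while differing from the cartesian pair $(f'_x, f'_y)$ at the same point: the same gradient has different coordinates in different systems.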

Without making use of any specific coordinate system it would be possible (according to 3.6 and 3.5) to define the gradient geometrically by making use of the normals of the level curves (or of the level surfaces), or by using the direction of quickest ascent/descent of the value of the underlying function (together with a measure of the function's rate of growth along this direction).

Hence, the gradient gives a more general expression for the changes of a function $\, f \,$ in the neighborhood of a point $\, \mathbf {a} \,$ than that of its partial derivatives, which are only related to a specific coordinate system. In contrast, the gradient contains (albeit implicitly) information about the values of the partial derivatives of a function in all possible coordinate systems.

NOTE: There is an important difference between a gradient vector $\, \nabla f \,$ and a directed segment $\, AB$, i.e., a vector $\, \mathbf {v}$. While $|\mathbf {v}|$ (i.e., the length of the segment $\, AB \,$) has dimension length, the modulus of the gradient, $\, |\nabla f|$, has dimension $\, \text {length}^{-1} \,$ (if the values of the function $\, f \,$ are dimension-less quantities). Therefore, in this case, the modulus of the gradient gives the rate of change of the value of the function $\, f$, i.e., the change of the function's value per unit of length (as we have seen in section 3.5).

This is related to the fact that under a coordinate transformation the gradient $\, \nabla f \,$ behaves differently from how directed segments behave. For example, if one changes from an orthonormal coordinate system with basis vectors $\, { \mathbf {e}}_i \,$ to a coordinate system with basis vectors $\, 2 {\mathbf{e}}_i$, we have for a directed segment:

$\mathbf {v} = \sum\limits_i v_i {\mathbf {e}}_i = \sum\limits_i \frac {1}{2} v_i (2 {\mathbf{e}}_i) = \sum\limits_i v'_i \, (2 {\mathbf{e}}_i)$,

which means that we have $\, v'_i = \frac{1}{2} v_i$. Moreover, since $\, x_i = 2 x'_i$, it follows from the chain rule that the coordinates of the gradient in the new system are given by:

$\dfrac{\partial f}{\partial x'_i} = \sum\limits_k \dfrac{\partial f}{\partial x_k} \dfrac {\partial x_k}{\partial x'_i} = \dfrac {\partial f}{\partial x_i} \dfrac {\partial x_i}{\partial x'_i} = 2 \dfrac {\partial f}{\partial x_i}$,

where only the term with $\, k = i \,$ survives, since $\, \dfrac {\partial x_k}{\partial x'_i} = 0 \,$ for $\, k \neq i$.

Therefore, it is unnecessarily limiting to write e.g., $\, \nabla f = \sum \frac {\partial f}{\partial x_i} {\mathbf{e}}_i$.

It is much more general to introduce (as in 3.5) the scalar product $\, \nabla f \cdot \mathbf{v}$, since the formula $\, \nabla f \cdot \mathbf{v} = \sum \frac {\partial f}{\partial x_i} \, v_i \,$ for such a scalar product is valid in every coordinate system.

In contrast, the formula $\, \mathbf{v} \cdot \mathbf{u} = \sum v_i u_i \,$ for two directed segments is only valid in orthonormal coordinate systems. In our example above of a coordinate transformation we have $\, \sum \frac {\partial f}{\partial x_i} \, v_i = \sum \frac {\partial f}{\partial x'_i} \, v'_i$, which is unchanged (= invariant) under the transformation, but $\, \sum {v_i}^2 = 4 \sum {v'_i}^2$, which is not invariant since we are changing into a non-orthonormal coordinate system.
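The example of the doubled basis vectors can be replayed numerically. In the sketch below the function `f`, the point `x`, and the segment components `v` are assumptions chosen for illustration; `f_primed` is the same function expressed in the primed coordinates, where $\, x_i = 2 x'_i$.

```python
def f(x1, x2):
    # Example function in the original (orthonormal) coordinates.
    return 0.25 * (x1**2 + 4 * x2**2)

def f_primed(x1p, x2p):
    # The same function in the primed coordinates, where x_i = 2 * x'_i.
    return f(2 * x1p, 2 * x2p)

def numerical_gradient(func, point, h=1e-6):
    # Central-difference partial derivatives of func at the given point.
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((func(*forward) - func(*backward)) / (2 * h))
    return tuple(grad)

x = (2.0, 1.0)
x_primed = tuple(xi / 2 for xi in x)   # same point, primed coordinates
v = (3.0, -1.0)
v_primed = tuple(vi / 2 for vi in v)   # same segment, primed coordinates

grad = numerical_gradient(f, x)                        # about (1, 2)
grad_primed = numerical_gradient(f_primed, x_primed)   # about (2, 4)

# Gradient-segment pairing: invariant under the change of coordinates.
pairing = sum(gi * vi for gi, vi in zip(grad, v))
pairing_primed = sum(gi * vi for gi, vi in zip(grad_primed, v_primed))

# Sum of squared segment components: not invariant.
norm_sq = sum(vi * vi for vi in v)
norm_sq_primed = sum(vi * vi for vi in v_primed)

print(pairing, pairing_primed, norm_sq, norm_sq_primed)
```

The segment components are halved while the gradient components are doubled, so their pairing is unchanged, whereas the sum of squares of the segment components changes by a factor of 4.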

For reasons that will appear later, it is appropriate to write $\, \nabla f \,$ as a matrix $\, G \,$ with one row, and a directed segment $\, \mathbf {v} \,$, i.e., a vector, as a matrix $\, V \,$ with one column. Then the scalar product $\, \nabla f \cdot \mathbf{v} \,$ can be identified with the matrix product $\, GV$.

In applications to physics (and other areas) there are many entities that behave as vectors of one kind or the other under coordinate transformations, and that also have other physical dimensions. Some, such as velocity, have dimension length/time and are transformed in the same way as directed segments. Such entities are called contravariant vectors. Other entities, such as force (which is often the gradient of a function), are transformed in the same way as gradients. Such entities are called covariant vectors.

As long as one only transforms between orthonormal coordinate systems one can compute in the same way with both contravariant and covariant vectors, but when one involves a non-orthonormal coordinate system in the transformation, the different kinds of vectors must be treated differently.

In a non-orthonormal coordinate system it is not in general easy to define coordinates for directed segments in a suitable manner, because one needs to compute with different basis vectors at different points of the segment.

For example, in polar coordinates (with basis vectors $\, {\mathbf{e}}_r \,$ and $\, {\mathbf{e}}_{\theta} \,$) one needs to compute with the directions that the lines $\, \theta = {\theta}_0 \,$ and the circles $\, r = r_0 \,$ have at different points. In such cases the basis vectors may in general be different at different points of a segment $\, AB$.

In the general case it is better to make use of the chain rule, which gives the transformation formulas for the coordinates of a gradient (cf. (4), page 84).

By definition, a covariant vector $\, \mathbf{u} \,$ is an $\, n$-tuple of numbers $\, u_k (S), k = 1, 2, \dots, n \,$, which varies with the coordinate system $\, S \,$ according to the mentioned transformation formulas.

Analogously, one defines a contravariant vector $\, \mathbf{v} \,$ as an $\, n$-tuple of numbers $\, v_k (S), k = 1, 2, \dots, n \,$ that depends on the coordinate system $\, S \,$ in a different way. This dependency is characterized by the fact that the "scalar product" $\, \sum u_k (S) \, v_k (S) \,$ of a covariant vector $\, \mathbf{u} \,$ and a contravariant vector $\, \mathbf{v} \,$ is independent of the coordinate system $\, S \,$ and therefore invariant under a change of $\, S$.
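The invariance of this "scalar product" can be sketched for a linear change of coordinates $\, x = A x'$, which is a simplifying assumption made here (the text allows more general coordinate systems). In that case the chain rule gives $\, u'_i = \sum_k u_k A_{ki} \,$ for covariant components (a row matrix multiplied by $A$ from the right), and contravariant components transform with the inverse matrix, $\, v' = A^{-1} v$. The matrix `A`, the components `u` and `v`, and the helper names are hypothetical.

```python
def mat_vec(A, v):
    # Matrix times column vector.
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def row_mat(u, A):
    # Row vector times matrix.
    return [sum(u[k] * A[k][i] for k in range(len(u))) for i in range(len(A[0]))]

def inverse_2x2(A):
    # Inverse of a 2x2 matrix with non-zero determinant.
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical change of coordinates x = A x' (new system not orthonormal).
A = [[2.0, 1.0],
     [0.0, 3.0]]

u = [1.0, 2.0]    # covariant components in the old system (e.g. a gradient)
v = [3.0, -1.0]   # contravariant components in the old system (a segment)

u_new = row_mat(u, A)               # covariant rule: u' = u A
v_new = mat_vec(inverse_2x2(A), v)  # contravariant rule: v' = A^{-1} v

pairing_old = sum(uk * vk for uk, vk in zip(u, v))
pairing_new = sum(uk * vk for uk, vk in zip(u_new, v_new))
print(pairing_old, pairing_new)
```

In matrix form this is $\, G'V' = (GA)(A^{-1}V) = GV$, so the two printed values coincide up to rounding.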

There are also more general, so-called geometric objects (e.g., pseudovectors and tensors) that vary with the coordinate system in other ways.

/////// End of translation from Eriksson, Flerdimensionell analys.