This page is a sub-page of our page on Calculus of Several Real Variables.

///////

The interactive simulations on this page can be navigated with the Free Viewer
of the Graphing Calculator.

///////

A function $f$ from $\, \mathbb{R}^2 \,$ to $\, \mathbb{R} \,$ can be described by:

${{\mathbb{R}^2 \, \stackrel {f} {\longrightarrow} \, \mathbb{R} \:}\atop {\: (x,y) \, \longmapsto \, f(x,y) } } {\,}.$

The differential $df$ of the function $f$ at the point $(a,b) \in \mathbb{R}^2$ is given by:

$df_{(a,b)} = \frac{\partial f}{\partial x}_{(a,b)} dx + \frac{\partial f}{\partial y}_{(a,b)} dy.$

The equation of the level curve ($\, f = c_{onstant} \,$) of the function $f$ through the point $(a,b)$
is given by:

$f(x,y) = f(a,b)$.

The equation of the tangent to the level curve of the function $f$ at the point $(a,b)$
is given by:

$\frac{\partial f}{\partial x}_{(a,b)} (x-a) + \frac{\partial f}{\partial y}_{(a,b)} (y-b) = 0.$

The gradient of the function $f$ is the function ${\nabla f}$ defined by:

${{\mathbb{R}^2 \, \stackrel {\nabla f} {\longrightarrow} \, \mathbb{R}^2 \:}\atop {\: (x,y) \, \longmapsto \, (\frac{\partial f}{\partial x} , \frac{\partial f}{\partial y}) } } {\,}.$

Hence the value of ${\nabla f}$ at the point $(x,y)$ is:

${\nabla f}_{(x,y)} = (\frac{\partial f}{\partial x} , \frac{\partial f}{\partial y}).$

The value of the gradient at the point $(a,b)$ is obtained
by evaluating ${\nabla f}_{(x,y)}$ at $x=a, y=b$:

${\nabla f}_{(a,b)} = (\frac{\partial f}{\partial x} , \frac{\partial f}{\partial y})_{(a,b)}.$

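The left-hand side of the tangent equation is the scalar product ${\nabla f}_{(a,b)} \cdot (x-a, \, y-b)$, which vanishes for every point $(x,y)$ on the tangent.
Hence we see that the gradient ${\nabla f}_{(a,b)}$ of the function $f$ at the point $(a,b)$
is perpendicular to the tangent of the level curve of the function $f$ at the point $(a,b).$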
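
As a numerical illustration, the following minimal sketch checks this perpendicularity for one sample function. The function `f`, the point `(a, b)`, the helper `gradient`, and the central-difference step `h` are assumptions chosen for this example and are not part of the original page.

```python
# Illustrative check: the gradient is perpendicular to the level-curve tangent.
# The function f and the point (a, b) are assumptions chosen for this sketch.

def f(x, y):
    return 0.25 * (x**2 + 4 * y**2)

def gradient(func, x, y, h=1e-6):
    """Approximate (df/dx, df/dy) at (x, y) by central differences."""
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return fx, fy

a, b = 2.0, 1.0
fx, fy = gradient(f, a, b)

# A direction vector of the tangent line fx*(x - a) + fy*(y - b) = 0.
tangent = (-fy, fx)

# Scalar product of the gradient with the tangent direction.
dot = fx * tangent[0] + fy * tangent[1]

print("gradient:", (fx, fy))
print("tangent direction:", tangent)
print("scalar product:", dot)
```

For this choice the gradient at $(2, 1)$ is approximately $(1, 2)$, and its scalar product with the tangent direction $(-2, 1)$ vanishes.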

Gradients of flying carpets and level surfaces

The “flying carpet” style equation for the graph of the function $f$ can be expressed as:

$z = f(x, y).$

The zero-level surface for the graph of the function $z = f(x, y)$ can be expressed as:

$\mathrm g(x,y,z) \, \stackrel {\mathrm{def}}{=} f(x,y)-z = 0 \, .$

Gradient of a flying-carpet type of surface

In the animation below, the “input” function $f(x, y)$ is given by:

$f(x, y) = \frac{1}{4} (x^2 + 4 y^2)$.

IMPORTANT: The 3D-gradients of the level surfaces $\, g(x, y, z) = f(x, y) - z = c_{onstant} \,$ project (along the $\, z$-direction) onto the 2D-gradients of the level curves $\, f(x, y) = c_{onstant} \,$:
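
For the function $f$ above this can be checked by a short worked computation, using only the definitions of $f$ and $g$ given on this page:

$\nabla g = (\frac{\partial g}{\partial x} , \frac{\partial g}{\partial y} , \frac{\partial g}{\partial z}) = (\frac{x}{2} , \, 2y , \, -1),$

$\nabla f = (\frac{\partial f}{\partial x} , \frac{\partial f}{\partial y}) = (\frac{x}{2} , \, 2y).$

Dropping the $z$-component of $\nabla g$ gives exactly $\nabla f$.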

///////

Gradient of a level surface – example 1:

$\, g(x, y, z) = \frac{1}{3} (x^2 + 4 y^2 + 9 z^2) \,$
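
As a worked computation for this example, differentiating term by term gives:

$\nabla g = (\frac{2x}{3} , \, \frac{8y}{3} , \, 6z).$

At the point $(1, 1, 1)$, chosen here only for illustration, this is $(\frac{2}{3}, \frac{8}{3}, 6)$, which is normal to the level surface $\, g = \frac{14}{3} \,$ through that point.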

///////

Gradient of a level surface – example 2:

$\, g(x, y, z) = \dfrac{x^2}{A} + \dfrac{y^2}{B} + \dfrac{z^2}{C} \; , \; C < B < A \,$
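
A worked computation for this example, assuming that $A$, $B$ and $C$ are nonzero constants:

$\nabla g = (\dfrac{2x}{A} , \, \dfrac{2y}{B} , \, \dfrac{2z}{C}).$

Since the gradient is normal to the level surface, the tangent plane at a point $(x_0, y_0, z_0)$ where the gradient is nonzero (a hypothetical point introduced for this sketch) can be written as:

$\dfrac{x_0}{A} (x - x_0) + \dfrac{y_0}{B} (y - y_0) + \dfrac{z_0}{C} (z - z_0) = 0.$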

/////// Translating Folke Eriksson, Flerdimensionell analys, p.98

The gradient as a covariant vector:

So far we have been working within a single, fixed cartesian coordinate system in $\mathbb{R}^n$, e.g., the $xy$-system in $\mathbb{R}^2$. In this system we have defined $\nabla f = (f'_x, f'_y)$. However, the gradient can also be expressed in other coordinate systems that are not necessarily cartesian. For example, if $(r, \theta)$ are polar coordinates, the partial derivatives of the function $(r, \theta) \rightarrow f(r \cos \theta, r \sin \theta) = z$, that is $\frac {\partial z} {\partial r}$ and $\frac {\partial z} {\partial \theta}$, can be thought of as coordinates of $\nabla f$ in the $r \theta$-system. In this case the vector $\nabla f$ itself is one and the same, but it has different coordinates in different systems.
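
As a sketch of what these polar coordinates of the gradient look like, the chain rule applied to $\, x = r \cos \theta, \; y = r \sin \theta \,$ gives:

$\frac {\partial z} {\partial r} = \frac {\partial f} {\partial x} \cos \theta + \frac {\partial f} {\partial y} \sin \theta \; , \quad \frac {\partial z} {\partial \theta} = - \frac {\partial f} {\partial x} \, r \sin \theta + \frac {\partial f} {\partial y} \, r \cos \theta.$

For the sample function $\, f(x, y) = x^2 + y^2 \,$ (an assumption chosen for illustration) we have $\, z = r^2$, so that $\, \frac {\partial z} {\partial r} = 2r \,$ and $\, \frac {\partial z} {\partial \theta} = 0$, in agreement with the formulas above.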

Without making use of any specific coordinate system it would be possible (according to 3.6 and 3.5) to define the gradient geometrically by making use of the normals of the level curves (respectively the level surfaces), or by using the direction of quickest ascent/descent of the value of the underlying function (as well as a measure of the function's rate of growth along this direction).

Hence, the gradient gives a more general expression for the changes of a function $\, f \,$ in the neighborhood of a point $\, \mathbf {a} \,$ than that of its partial derivatives, which are only related to a specific coordinate system. In contrast, the gradient contains (albeit implicitly) information about the values of the partial derivatives of a function in all possible coordinate systems.

NOTE: There is an important difference between a gradient vector $\, \nabla f \,$ and a directed segment $\, AB$, i.e., a vector $\, \mathbf {v}$. While $|\mathbf {v}|$ (i.e., the length of the segment $\, AB \,$) has dimension length, the modulus of the gradient, $\, |\nabla f|$, has dimension $\, \text {length}^{-1} \,$ (if the values of the function $\, f \,$ are dimensionless quantities). In this case, the modulus of the gradient therefore gives the rate of change of the value of the function $\, f$, i.e., the change of the function's value per unit of length (as we have seen in section 3.5).

This is related to the fact that under a coordinate transformation the gradient $\, \nabla f \,$ behaves differently from how directed segments behave. For example, if one changes from an orthonormal coordinate system with basis vectors $\, { \mathbf {e}}_i \,$ to a coordinate system with basis vectors $\, 2 {\mathbf{e}}_i$, we have for a directed segment:

$\mathbf {v} = \sum\limits_i v_i {\mathbf {e}}_i = \sum\limits_i \frac {1}{2} v_i (2 {\mathbf{e}}_i) = \sum\limits_i v'_i \, (2 {\mathbf{e}}_i)$,

which means that we have $\, v'_i = \frac{1}{2} v_i$. Moreover, since $\, x_i = 2 x'_i$, it follows from the chain rule that the coordinates of the gradient in the new system are given by:

$\dfrac{\partial f}{\partial x'_i} = \sum\limits_k \dfrac{\partial f}{\partial x_k} \dfrac {\partial x_k}{\partial x'_i} = \dfrac {\partial f}{\partial x_i} \dfrac {\partial x_i}{\partial x'_i} = 2 \dfrac {\partial f}{\partial x_i}$.

Therefore, it is unnecessarily limiting to write e.g., $\, \nabla f = \sum \frac {\partial f}{\partial x_i} {\mathbf{e}}_i$.

It is much more general to introduce (as in 3.5) the scalar product $\, \nabla f \cdot \mathbf{v}$, since the formula $\, \nabla f \cdot \mathbf{v} = \sum \frac {\partial f}{\partial x_i} \, v_i \,$ for such a scalar product is valid in every coordinate system.

In contrast, the formula $\, \mathbf{v} \cdot \mathbf{u} = \sum v_i u_i \,$ for two directed segments is only valid in orthonormal coordinate systems. In our example above of a coordinate transformation we have $\, \sum \frac {\partial f}{\partial x_i} \, v_i = \sum \frac {\partial f}{\partial x'_i} \, v'_i$, which is unchanged (= invariant) under the transformation, but $\, \sum {v_i}^2 = 4 \sum {v'_i}^2$, which is not invariant since we are changing into a non-orthonormal coordinate system.
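
The two behaviours in this example can be checked numerically. The sketch below uses made-up coordinate values `v` and `grad` in three dimensions (assumptions for illustration only) and applies the rules $\, v'_i = \frac{1}{2} v_i \,$ and $\, \frac{\partial f}{\partial x'_i} = 2 \frac{\partial f}{\partial x_i} \,$ from the example with basis vectors $\, 2 {\mathbf{e}}_i$.

```python
# Sketch: rescaling the basis from e_i to 2*e_i.
# The numbers in v and grad are assumptions chosen for illustration.
v = [3.0, -1.0, 4.0]       # coordinates of a directed segment in the e_i basis
grad = [0.5, 2.0, -1.5]    # partial derivatives df/dx_i at some point

v_new = [0.5 * c for c in v]         # v'_i = v_i / 2
grad_new = [2.0 * g for g in grad]   # df/dx'_i = 2 * df/dx_i

# Pairing of gradient and segment: the same in both coordinate systems.
pairing_old = sum(g * c for g, c in zip(grad, v))
pairing_new = sum(g * c for g, c in zip(grad_new, v_new))

# Sum of squared segment coordinates: changes by a factor of 4.
squares_old = sum(c * c for c in v)
squares_new = sum(c * c for c in v_new)

print("pairing:", pairing_old, pairing_new)
print("sum of squares:", squares_old, squares_new)
```

The pairing is the same in both systems, while the sum of squares in the old system is four times that in the new one.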

For reasons that will appear later, it is appropriate to write $\, \nabla f \,$ as a matrix $\, G \,$ with one row, and a directed segment $\, \mathbf {v} \,$, i.e., a vector, as a matrix $\, V \,$ with one column. Then the scalar product $\, \nabla f \cdot \mathbf{v} \,$ can be identified with the matrix product $\, GV$.
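
A minimal sketch of this matrix formulation, assuming a linear change of coordinates $\, x = A x' \,$ whose matrix `A` has the new basis vectors as columns. Under that assumption the chain rule gives the row $\, G' = GA \,$ and the segment coordinates become $\, V' = A^{-1} V$, so that $\, G'V' = GV$. The matrix `A`, the entries of `G` and `V`, and the helpers `matmul` and `inverse_2x2` are hypothetical and chosen only for illustration.

```python
# Sketch: G (one row) and V (one column) under a linear change x = A x'.
# The matrix A and the numbers in G and V are assumptions.

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix with nonzero determinant."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

A = [[2.0, 1.0],
     [0.0, 1.0]]      # columns: new (non-orthonormal) basis vectors
G = [[3.0, -2.0]]     # gradient as a matrix with one row
V = [[1.0], [4.0]]    # directed segment as a matrix with one column

G_new = matmul(G, A)                # gradient coordinates: G' = G A
V_new = matmul(inverse_2x2(A), V)   # segment coordinates: V' = A^{-1} V

product_old = matmul(G, V)[0][0]
product_new = matmul(G_new, V_new)[0][0]
print(product_old, product_new)
```

Both products have the same value, although the entries of the row and of the column change in opposite ways.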

In applications to physics (and other areas) there are many entities that behave as vectors of one kind or the other during coordinate transformations; entities that also have other physical dimensions. Some, as for example velocity, have dimension length/time and are transformed in the same way as directed segments. Such entities are called contravariant vectors. Other entities, such as for example force (which is often the gradient of a function) are transformed in the same way as gradients. Such entities are called covariant vectors.

As long as one only transforms between orthonormal coordinate systems one can compute in the same way with both contravariant and covariant vectors, but when one involves a non-orthonormal coordinate system in the transformation, the different kinds of vectors must be treated differently.

In a non-orthonormal coordinate system it is not in general easy to define coordinates for directed segments in a suitable manner, because one needs to compute with different basis vectors at different points of the segment.

For example, in polar coordinates (with basis vectors $\, {\mathbf{e}}_r \,$ and $\, {\mathbf{e}}_{\theta} \,$) one needs to compute with the directions that the lines $\, \theta = {\theta}_0 \,$ and the circles $\, r = r_0 \,$, respectively, have at different points. In such cases the basis vectors may in general be different at different points of a segment $\, AB$.
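
The following sketch illustrates how the polar basis vectors change from point to point. It assumes unit-length basis vectors and two sample endpoints `A` and `B`. The helper `polar_basis` and the chosen points are hypothetical, and the source does not state how the basis vectors are normalized.

```python
import math

def polar_basis(theta):
    """Unit vectors along the line theta = const and the circle r = const."""
    e_r = (math.cos(theta), math.sin(theta))
    e_theta = (-math.sin(theta), math.cos(theta))
    return e_r, e_theta

# Endpoints of a segment AB in Cartesian coordinates (assumed sample points).
A = (1.0, 0.0)
B = (0.0, 1.0)

for name, (x, y) in (("A", A), ("B", B)):
    theta = math.atan2(y, x)
    e_r, e_theta = polar_basis(theta)
    print(name, "e_r =", e_r, "e_theta =", e_theta)
```

At $A$ the vector $\, {\mathbf{e}}_r \,$ points along the $x$-axis, whereas at $B$ it points (up to rounding) along the $y$-axis, so the two endpoints of the segment have different bases.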

In the general case it is better to make use of the chain rule, which gives the transformation formulas for the coordinates of a gradient (cf. (4), page 84).

By definition, a covariant vector $\, \mathbf{u} \,$ is an $\, n$-tuple of numbers $\, u_k (S), k = 1, 2, \dots, n \,$, which varies with the coordinate system $\, S \,$ according to the mentioned transformation formulas.

Analogously, one defines a contravariant vector $\, \mathbf{v} \,$ as an $\, n$-tuple of numbers $\, v_k (S), k = 1, 2, \dots, n \,$ that depends on the coordinate system $\, S \,$ in a different way. This dependency is characterized by the fact that the "scalar product" $\, \sum u_k (S) \, v_k (S) \,$ of a covariant vector $\, \mathbf{u} \,$ and a contravariant vector $\, \mathbf{v} \,$ is independent of the coordinate system $\, S \,$ and therefore invariant under a change of $\, S$.

There are also more general, so-called geometric objects (e.g., pseudovectors and tensors) that vary with the coordinate system in other ways.

/////// End of translation from Eriksson, Flerdimensionell analys.

IN SWEDISH:

////////////////////////////////////////////////

/////// Quoting Folke Eriksson, Flerdimensionell analys, p.98

Vi har hittills alltid tänkt oss ett enda fixt koordinatsystem i $\mathbb{R}^n$, t.ex $xy$-systemet i $\mathbb{R}^2$. I detta system har vi definierat $\nabla f = (f'_x, f'_y)$. Gradienten kan emellertid uttryckas även i andra koordinatsystem, allmännare än rätvinkliga. Om t.ex $(r, \theta)$ är polära koordinater, kan de partiella derivatorna av funktionen $(r, \theta) \rightarrow f(r \cos \theta, r \sin \theta) = z$, alltså $\frac {\partial z} {\partial r}$ och $\frac {\partial z} {\partial \theta}$, uppfattas som koordinater för $\nabla f$ i $r \theta$-systemet. Själva vektorn $\nabla f$ är då en och densamma, men den har olika koordinater i olika system.

Enligt 3.6 och 3.5 skulle man också, utan att direkt använda något koordinatsystem, kunna definiera gradienten geometriskt, med hjälp av nivåkurvornas (respektive nivåytornas) normaler, eller den riktning längs vilken funktionsvärdet växer snabbast (jämte ett mått på den snabbaste tillväxten). Gradienten ger således ett mer allmängiltigt uttryck för funktionsvärdets förändringar i närheten av en punkt $\mathbf {a}$ än de partiella derivatorna, vilka ju bara hänför sig till ett speciellt koordinatsystem. Gradienten däremot är en sammanfattning av de partiella derivatornas värden i alla lämpliga koordinatsystem.

Vi kan lägga märke till en viktig skillnad mellan en gradientvektor $\nabla f$ och en riktad sträcka $\mathbf {v}$. Medan $| \mathbf {v} |$ (avståndet $AB$) naturligtvis har dimensionen längd, har $| \nabla f |$ (om värdena av funktionen $f$ är dimensionslösa tal) dimensionen $(\text {längd})^{-1}$. Gradientens belopp anger alltså, som vi sett i avsnitt 3.5, värdeändring per längdenhet.

Med detta sammanhänger att $\nabla f$ vid koordinattransformation uppför sig helt annorlunda än riktade sträckor. Om man t.ex. från ett ortonormerat system med basvektorerna ${ \mathbf {e}}_i$ övergår till ett system med basvektorerna $2 {\mathbf{e}}_i$, gäller där för en riktad sträcka

$\mathbf {v} = \sum v_i {\mathbf {e}}_i = \sum \frac {1}{2} v_i (2 {\mathbf{e}}_i) = \sum v'_i \, (2 {\mathbf{e}}_i)$,

där alltså $v'_i = \frac{1}{2} v_i$. Gradientens koordinater i det nya koordinatsystemet är däremot enligt kedjeregeln:

$\dfrac{\partial f}{\partial x'_i} = \sum \dfrac{\partial f}{\partial x_k} \dfrac {\partial x_k}{\partial x'_i} = \dfrac {\partial f}{\partial x_i} \dfrac {\partial x_i}{\partial x'_i} = 2 \dfrac {\partial f}{\partial x_i}$

ty $x_i = 2 x'_i$.

Det är därför mindre lämpligt att skriva t.ex $\nabla f = \sum \frac {\partial f}{\partial x_i} {\mathbf{e}}_i$.

Däremot går det utmärkt att som i 3.5 införa skalärprodukten $\mathbf{v} \cdot \nabla f$. Det är t.om. så att formeln $\mathbf{v} \cdot \nabla f = \sum v_i \frac {\partial f}{\partial x_i}$ för en sådan skalärprodukt gäller i varje koordinatsystem medan formeln $\mathbf{v} \cdot \mathbf{u} = \sum v_i u_i$ för två riktade sträckor bara gäller i ortonormerade system.

I vårt exempel nyss är $\sum v_i \frac {\partial f}{\partial x_i} = \sum v'_i \frac {\partial f}{\partial x'_i}$, men $\sum {v_i}^2 = 4 \sum {v'_i}^2$.

Det är lämpligt att skriva $\nabla f$ som en matris $G$ med en rad, och en riktad sträcka $\mathbf {v}$ som en matris $V$ med en kolonn. Då blir skalärprodukten $\nabla f \cdot \mathbf{v}$ lika med matrisprodukten $GV$.

I fysikaliska (med flera) tillämpningar förekommer många storheter, som vid koordinattransformationer i rummet uppför sig som vektorer av det ena eller andra slaget, men som därjämte har annan fysikalisk dimension. Somliga, som t.ex hastighet (med dimensionen längd/tid), transformeras på samma sätt som riktade sträckor och kallas kontravarianta vektorer. Andra, t.ex kraft (kraften är ofta lika med gradienten av en funktion $V(\mathbf{x})$), transformeras som gradienter och kallas kovarianta vektorer. Så länge som man bara använder ortonormerade koordinatsystem, kan man räkna på samma sätt med båda slagen av vektorer. Men i allmännare koordinatsystem är skillnaden väsentlig.

I allmänna koordinatsystem är det inte så lätt att direkt definiera koordinater för riktade sträckor på ett lämpligt sätt. Man behöver nämligen räkna med olika basvektorer i olika punkter. För t.ex polära koordinater med basvektorerna ${\mathbf{e}}_r$ och ${\mathbf{e}}_{\theta}$ behöver man räkna med de riktningar som linjerna $\theta = {\theta}_0$ respektive cirklarna $r = r_0$ har i olika punkter. Basvektorerna kan då vara olika i olika punkter av en sträcka $AB$.

Man kan i stället i det allmänna fallet utgå från kedjeregeln, vilken ger transformationsformlerna för koordinaterna av en gradient (jfr (4), sid 84). En kovariant vektor är definitionsmässigt en $n$-tipel av tal $u_k (S), k = 1, 2, \dots, n$, som varierar med koordinatsystemet $S$ enligt de nämnda transformationsformlerna.

Analogt definierar man en kontravariant vektor $\mathbf{v}$ som en $n$-tipel av tal $v_k (S)$, som beror på koordinatsystemet $S$ på ett annat specifikt sätt. Detta kan karaktäriseras just av att "skalärprodukten" $\sum u_k (S) \, v_k (S)$ av en kovariant vektor $\mathbf{u}$ och en kontravariant vektor $\mathbf{v}$ är oberoende av koordinatsystemet $S$.

Det finns även allmännare s.k. geometriska objekt (t.ex. pseudovektorer och tensorer), vilka varierar med koordinatsystemet på andra sätt.

/////// End of quote from Eriksson, Flerdimensionell analys.