Deriving the Hessian from the limit definition of the derivative


Could someone help me understand how to derive the Hessian matrix of a twice-differentiable function $f$ defined on $\mathbb{R}^n$ from the limit definition of the second derivative? Namely, how does $\lim_{h \to 0}\frac{\nabla f(x+h) - \nabla f(x)}{h}$ result in the Hessian $\nabla^2 f(x)$? If I am wrong about this, could you please point out what I am misunderstanding?

Thank you very much!


4 Answers


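To extend user251257's answer: for any vector $v\in\mathbb{R}^n$, $$\lim_{h\rightarrow 0} \frac{\nabla f (x + hv) - \nabla f(x)}{h} = \nabla^2 f(x)v$$ We can deduce this from his/her answer and the subsequent comments. As he/she pointed out, $$\lim_{h\rightarrow 0} \frac{\nabla f (x + he_i) - \nabla f(x)}{h} = \nabla^2 f(x)e_i = \begin{bmatrix} \frac{\partial^2 f(x)}{\partial x_i \partial x_j}\end{bmatrix}_j \in \mathbb{R}^n$$ Since $f$ is twice differentiable, $\nabla f$ is differentiable, so the first limit exists and is linear in $v$. It is therefore determined by its values on the standard basis vectors: writing $v = \sum_i v_i e_i$ gives $\nabla^2 f(x)v = \sum_i v_i \nabla^2 f(x)e_i$.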
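A quick numerical sanity check can make the identity concrete. The following is a minimal sketch in plain Python, assuming a hypothetical test function $f(x_1, x_2) = x_1^2 x_2 + x_2^3$ with a hand-computed gradient and Hessian; the helper names are illustrative only. It compares the difference quotient of the gradient along $v$ with $\nabla^2 f(x)v$ for shrinking $h$.

```python
# Minimal sketch: check numerically that the difference quotient of the
# gradient along a direction v approaches the Hessian-vector product H(x) v.
# The test function f(x1, x2) = x1**2 * x2 + x2**3 is a hypothetical example.


def grad_f(x):
    """Hand-computed gradient of f(x1, x2) = x1**2 * x2 + x2**3."""
    x1, x2 = x
    return [2 * x1 * x2, x1 ** 2 + 3 * x2 ** 2]


def hess_f(x):
    """Hand-computed Hessian of the same f."""
    x1, x2 = x
    return [[2 * x2, 2 * x1],
            [2 * x1, 6 * x2]]


def mat_vec(A, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def grad_quotient(x, v, h):
    """(grad f(x + h v) - grad f(x)) / h, computed componentwise."""
    shifted = [xi + h * vi for xi, vi in zip(x, v)]
    return [(a - b) / h for a, b in zip(grad_f(shifted), grad_f(x))]


x = [1.0, 2.0]
v = [0.5, -1.0]
exact = mat_vec(hess_f(x), v)  # [0.0, -11.0]
for h in (1e-1, 1e-3, 1e-5):
    approx = grad_quotient(x, v, h)
    err = max(abs(a - b) for a, b in zip(approx, exact))
    print(f"h = {h:g}: max |quotient - H v| = {err:.2e}")
```

Because this is a one-sided (forward) difference, the error should shrink roughly in proportion to $h$ until floating-point cancellation takes over.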

For more on this topic, I recommend reading about directional derivatives in a multivariate analysis text such as that of Loomis & Sternberg.


At the end of the day, $\nabla f$ is a function of several variables that produces a vector (or dual vector, depending on your point of view). What we need, then, is a definition of the derivative that applies to vector-valued (or matrix-valued) functions.

One definition that works is as follows: suppose we have the function $$ F(x) = \pmatrix{F_1(x_1,\dots,x_n) & \cdots & F_m(x_1,\dots,x_n)} $$ Then we can define $$ \nabla F(x) = \pmatrix{ -\nabla F_1-\\ -\nabla F_2-\\ \vdots\\ -\nabla F_m-\\ } $$ so that each row is the gradient of one component function (this is the Jacobian matrix of $F$). Now, if $F = \nabla f$, then we end up with the Hessian $\nabla^2f$.
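The row-by-row construction can be sketched numerically. This is a minimal illustration in plain Python, assuming central differences for each component gradient and a hypothetical $f(x_1, x_2) = x_1^2 x_2 + x_2^3$ whose gradient components are written out by hand; `numerical_gradient` and `jacobian_by_rows` are illustrative names, not a library API.

```python
# Minimal sketch: build the matrix grad F row by row, where row k is a
# central-difference approximation of grad F_k. Taking F = grad f for the
# hypothetical f(x1, x2) = x1**2 * x2 + x2**3 should reproduce its Hessian.


def numerical_gradient(g, x, h=1e-5):
    """Central-difference gradient of a scalar function g at the point x."""
    grad = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((g(xp) - g(xm)) / (2 * h))
    return grad


def jacobian_by_rows(components, x, h=1e-5):
    """Stack the gradients of the component functions F_1, ..., F_m as rows."""
    return [numerical_gradient(F_k, x, h) for F_k in components]


# Components of F = grad f, written out by hand.
grad_f_components = [
    lambda x: 2 * x[0] * x[1],            # df/dx1
    lambda x: x[0] ** 2 + 3 * x[1] ** 2,  # df/dx2
]

H = jacobian_by_rows(grad_f_components, [1.0, 2.0])
print(H)  # approximately [[4, 2], [2, 12]], the Hessian of f at (1, 2)
```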

Alternatively, one can extend the definition by saying that the derivative of a function $F(x_1,\dots,x_n)$ at a point $z = (z_1,\dots,z_n)$ is the unique linear map $A = F'(z)$ satisfying $$ \lim_{h \to 0} \frac{F(z + h) - F(z) - A(h)}{\|h\|} = 0 $$ This is (in a sense) the most general definition of a derivative, and whenever this derivative exists, its matrix is exactly the matrix $\nabla F(z)$ defined above.
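To see the limit condition at work, here is a minimal sketch in plain Python, assuming $F = \nabla f$ for a hypothetical $f(x_1, x_2) = x_1^2 x_2 + x_2^3$ and taking $A$ to be its hand-computed Hessian at $z$. Shrinking $h$ along a single fixed direction only illustrates the condition; it does not prove it, since the limit is over all $h \to 0$.

```python
# Minimal sketch: watch the ratio ||F(z + h) - F(z) - A h|| / ||h|| shrink as
# h -> 0, with F = grad f for the hypothetical f(x1, x2) = x1**2 * x2 + x2**3
# and A its hand-computed Hessian at z. Shrinking h along one fixed direction
# only illustrates the definition; the limit itself is over all h -> 0.
import math


def F(x):
    """F = grad f, written out by hand."""
    x1, x2 = x
    return [2 * x1 * x2, x1 ** 2 + 3 * x2 ** 2]


def A(z, h):
    """Apply the candidate derivative (the Hessian of f at z) to h."""
    z1, z2 = z
    H = [[2 * z2, 2 * z1], [2 * z1, 6 * z2]]
    return [H[0][0] * h[0] + H[0][1] * h[1], H[1][0] * h[0] + H[1][1] * h[1]]


def remainder_ratio(z, h):
    """||F(z + h) - F(z) - A h|| / ||h|| (Euclidean norms)."""
    Fzh = F([zi + hi for zi, hi in zip(z, h)])
    r = [a - b - c for a, b, c in zip(Fzh, F(z), A(z, h))]
    return math.hypot(*r) / math.hypot(*h)


z = [1.0, 2.0]
direction = [0.3, -0.7]
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    h = [t * d for d in direction]
    print(f"||h|| = {math.hypot(*h):.1e}: ratio = {remainder_ratio(z, h):.2e}")
```

For this $f$ the remainder $F(z+h) - F(z) - A(h)$ is exactly quadratic in $h$, so the ratio should fall by about a factor of ten each time $h$ does.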


The quotient $\frac{\nabla f(x + h) - \nabla f(x)}{h}$ isn't defined if $n > 1$: then $h$ is a vector in $\mathbb{R}^n$, and you cannot divide by a vector.

However, the limit $$ \lim_{h\to 0 }\frac{\nabla f(x + he_i) - \nabla f(x)}{h} $$ gives the $i$-th column (or row, depending on whether you prefer to write $\nabla f$ as a column or a row vector) of $\nabla^2 f(x)$, for $1\le i \le n$.
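Taking the limit literally, one column at a time, gives a simple way to assemble the Hessian. This is a minimal sketch in plain Python, assuming a forward difference with a fixed small $h$ in place of the limit and a hypothetical $f(x_1, x_2) = x_1^2 x_2 + x_2^3$ with a hand-written gradient; `hessian_by_columns` is an illustrative name.

```python
# Minimal sketch: assemble the Hessian column by column from the one-sided
# quotient (grad f(x + h e_i) - grad f(x)) / h, for the hypothetical
# f(x1, x2) = x1**2 * x2 + x2**3 with a hand-written gradient.


def grad_f(x):
    x1, x2 = x
    return [2 * x1 * x2, x1 ** 2 + 3 * x2 ** 2]


def hessian_by_columns(grad, x, h=1e-6):
    """Column i is the forward-difference quotient of grad along e_i."""
    n = len(x)
    g0 = grad(x)
    columns = []
    for i in range(n):
        x_shift = list(x)
        x_shift[i] += h  # x + h e_i
        columns.append([(a - b) / h for a, b in zip(grad(x_shift), g0)])
    # Transpose so that H[j][i] holds entry j of column i.
    return [[columns[i][j] for i in range(n)] for j in range(n)]


H = hessian_by_columns(grad_f, [1.0, 2.0])
print(H)  # close to [[4, 2], [2, 12]]; the tiny asymmetry is discretization error
```

Since the Hessian of a twice-differentiable $f$ is symmetric, reading the quotients as rows instead of columns gives the same matrix, which is why the row/column convention does not matter here.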


Let $f: \mathbb{R}^n \to \mathbb{R}$ be a differentiable function.

Then, at a point $p$, the derivative $Df\big|_p: \mathbb{R}^n \to \mathbb{R}$ can be computed by (but is not defined by)

$$ Df\big|_p(v) = \lim_{h \to 0} \frac{f(p+hv)-f(p)}{h} $$

If $f$ is differentiable, then $Df\big|_p$ is a linear function from $\mathbb{R}^n$ to $\mathbb{R}$, and for small $v$ we have $f(p+v) \approx f(p)+Df\big|_p(v)$.
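The difference between "computed by" and "defined by" can be seen numerically. This is a minimal sketch in plain Python, assuming a hypothetical $f(x_1, x_2) = x_1^2 x_2 + x_2^3$ and evaluating the linear map $Df\big|_p(v)$ as the dot product of a hand-computed gradient with $v$; the quotient is then compared against it.

```python
# Minimal sketch: compare the difference quotient (f(p + h v) - f(p)) / h with
# the linear map Df|_p(v) = grad f(p) . v, for the hypothetical
# f(x1, x2) = x1**2 * x2 + x2**3 and its hand-computed gradient.


def f(x):
    x1, x2 = x
    return x1 ** 2 * x2 + x2 ** 3


def Df(p, v):
    """Df|_p(v) as the dot product of the hand-computed gradient with v."""
    x1, x2 = p
    grad = [2 * x1 * x2, x1 ** 2 + 3 * x2 ** 2]
    return sum(g * vi for g, vi in zip(grad, v))


def directional_quotient(p, v, h):
    """(f(p + h v) - f(p)) / h for a fixed step h."""
    return (f([pi + h * vi for pi, vi in zip(p, v)]) - f(p)) / h


p = [1.0, 2.0]
v = [0.5, -1.0]
print(Df(p, v))  # -11.0
for h in (1e-2, 1e-4, 1e-6):
    print(h, directional_quotient(p, v, h))
```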

If $f$ is twice differentiable, then we can think of its second derivative as a bilinear form $Hf\big|_p:\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. It can be computed by (but is not defined by)

$$ Hf\big|_p(v,w) = \lim_{h \to 0} \frac{Df\big|_{p+hv}(w) - Df\big|_p(w)}{h} $$

We have that $Df\big|_{p+v}(w) \approx Df\big|_p(w)+Hf\big|_p(v,w)$.
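The same idea one level up can be sketched as follows, in plain Python, assuming a hypothetical $f(x_1, x_2) = x_1^2 x_2 + x_2^3$. To keep only one limit numerical, $Df\big|_q(w)$ is evaluated exactly from the hand-computed gradient, and the quotient is compared with $w^\top H v$ from the hand-computed Hessian.

```python
# Minimal sketch: approximate Hf|_p(v, w) by the quotient
# (Df|_{p+hv}(w) - Df|_p(w)) / h and compare with w^T H v, for the
# hypothetical f(x1, x2) = x1**2 * x2 + x2**3. Df is taken from the
# hand-computed gradient so that only one limit is approximated numerically.


def Df(p, w):
    """Df|_p(w) = grad f(p) . w."""
    x1, x2 = p
    return (2 * x1 * x2) * w[0] + (x1 ** 2 + 3 * x2 ** 2) * w[1]


def H_quotient(p, v, w, h):
    """(Df|_{p+hv}(w) - Df|_p(w)) / h for a fixed step h."""
    p_shift = [pi + h * vi for pi, vi in zip(p, v)]
    return (Df(p_shift, w) - Df(p, w)) / h


def H_exact(p, v, w):
    """w^T H v with the hand-computed Hessian [[2 x2, 2 x1], [2 x1, 6 x2]]."""
    x1, x2 = p
    Hv = [2 * x2 * v[0] + 2 * x1 * v[1], 2 * x1 * v[0] + 6 * x2 * v[1]]
    return w[0] * Hv[0] + w[1] * Hv[1]


p, v, w = [1.0, 2.0], [0.5, -1.0], [1.0, 3.0]
print(H_exact(p, v, w), H_exact(p, w, v))  # -33.0 -33.0 (symmetric)
print(H_quotient(p, v, w, 1e-6))           # close to -33
```

Swapping $v$ and $w$ leaves the value unchanged, which is the symmetry of the bilinear form mentioned at the end of the answer.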

It also turns out (this is the beginning of the multivariable Taylor theorem) that $f(p+v) \approx f(p)+Df\big|_p(v)+\frac{1}{2!}Hf\big|_p(v,v)$.
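A small experiment shows what the extra term buys. This is a minimal sketch in plain Python, assuming a hypothetical $f(x_1, x_2) = x_1^2 x_2 + x_2^3$ with hand-computed gradient and Hessian at $p = (1, 2)$. It compares the first- and second-order approximations at the step $t v$ as $t$ shrinks.

```python
# Minimal sketch: compare first- and second-order Taylor approximations of the
# hypothetical f(x1, x2) = x1**2 * x2 + x2**3 at p = (1, 2) along t * v.


def f(x):
    x1, x2 = x
    return x1 ** 2 * x2 + x2 ** 3


def taylor_errors(p, v, t):
    """Return (|first-order error|, |second-order error|) at the step t * v."""
    x1, x2 = p
    s = [t * vi for vi in v]
    grad = [2 * x1 * x2, x1 ** 2 + 3 * x2 ** 2]
    H = [[2 * x2, 2 * x1], [2 * x1, 6 * x2]]
    Df_s = grad[0] * s[0] + grad[1] * s[1]
    Hf_ss = sum(s[i] * H[i][j] * s[j] for i in range(2) for j in range(2))
    actual = f([pi + si for pi, si in zip(p, s)])
    first = f(p) + Df_s
    second = first + Hf_ss / 2  # the 1/2! factor
    return abs(actual - first), abs(actual - second)


for t in (1e-1, 1e-2, 1e-3):
    e1, e2 = taylor_errors([1.0, 2.0], [0.5, -1.0], t)
    print(f"t = {t:g}: first-order error {e1:.2e}, second-order error {e2:.2e}")
```

For this cubic $f$, along $v = (0.5, -1)$ one has $f(p + tv) = 10 - 11t + 5.5t^2 - 1.25t^3$, so the first-order error behaves like $t^2$ while the second-order error is exactly $1.25\,t^3$ (up to rounding).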

The pattern continues, with higher derivatives being higher-order symmetric tensors.
