Could someone help me understand how to derive the Hessian matrix of a twice-differentiable function $f$ defined on $\mathbb{R}^n$ using the limit definition of the second derivative? Namely, how does $\lim_{h \to 0}\frac{\nabla f(x+h) - \nabla f(x)}{h}$ result in the Hessian $\nabla^2 f(x)$? If I am wrong about this, could you please point out what I am misunderstanding?
Thank you very much!
$\endgroup$ 104 Answers
$\begingroup$To extend user251257's answer: for any vector $v\in\mathbb{R}^n$, $$\lim_{h\rightarrow 0} \frac{\nabla f (x + hv) - \nabla f(x)}{h} = \nabla^2 f(x)v$$ This follows from that answer and the subsequent comments. As suggested there, $$\lim_{h\rightarrow 0} \frac{\nabla f (x + he_i) - \nabla f(x)}{h} = \nabla^2 f(x)e_i = \begin{bmatrix} \frac{\partial^2 f(x)}{\partial x_i \partial x_j}\end{bmatrix}_j \in \mathbb{R}^n$$ and, because $\nabla f$ is differentiable at $x$, the directional derivative on the left is linear in the direction, so the formula for the basis vectors $e_i$ extends to every $v = \sum_i v_i e_i$.
For more on this topic, I recommend reading about directional derivatives in a multivariate analysis text such as that of Loomis & Sternberg.
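If it helps, here is a small numerical sanity check of the formula above. It is only a sketch: the test function $f(x, y) = x^2 y + \sin y$, the point `x`, the direction `v`, and all the function names are my own choices for illustration, not anything from the question.

```python
import numpy as np

def f_grad(x):
    # Gradient of the illustrative function f(x, y) = x**2 * y + sin(y).
    return np.array([2 * x[0] * x[1], x[0] ** 2 + np.cos(x[1])])

def f_hess(x):
    # Analytic Hessian of the same illustrative function.
    return np.array([[2 * x[1], 2 * x[0]],
                     [2 * x[0], -np.sin(x[1])]])

def directional_quotient(grad, x, v, h):
    # Difference quotient of the gradient along v; h is a scalar step.
    return (grad(x + h * v) - grad(x)) / h

x = np.array([1.0, 2.0])    # arbitrary base point
v = np.array([0.5, -1.5])   # arbitrary direction
approx = directional_quotient(f_grad, x, v, 1e-6)
exact = f_hess(x) @ v
print(approx, exact)
```

For a small step the two printed vectors should agree to several digits, which is what the limit $\nabla^2 f(x)v$ predicts.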
$\endgroup$ 3 $\begingroup$At the end of the day, $\nabla f$ is a function of several variables that produces a vector (or dual vector, depending on your point of view). What we need, then, is a definition of the derivative that applies to vector-valued (or matrix-valued) functions.
One definition that works is as follows: suppose we have the function $$ F(x) = \pmatrix{F_1(x_1,\dots,x_n) & \cdots & F_m(x_1,\dots,x_n)} $$ Then we can define $$ \nabla F(x) = \pmatrix{ -\nabla F_1-\\ -\nabla F_2-\\ \vdots\\ -\nabla F_m- } $$ so that each row is the gradient of one component function. Now, if $F = \nabla f$ (so that $m = n$), then we end up with the Hessian $\nabla^2f$.
On the other hand, another way to extend the definition is to say that the derivative of a function $F(x_1,\dots,x_n)$ at a point $z = (z_1,\dots,z_n)$ is the unique linear function $[F'(z_1,\dots,z_n)](x_1,\dots,x_n)$, which we can write as $A(x_1,\dots,x_n)$, satisfying $$ \lim_{h \to 0} \frac{F(z + h) - F(z) - A(h)}{\|h\|} = 0 $$ Here $h$ is a vector in $\mathbb{R}^n$. This is (in a sense) the most general definition of a derivative, and whenever such an $A$ exists, its matrix is the one given by the definition above.
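To see the limit condition in action, here is a rough numerical sketch. The map `F`, its hand-computed Jacobian, the point `z`, and the direction are all made-up examples, not part of the question. The ratio $\|F(z+h) - F(z) - A(h)\| / \|h\|$ should shrink as $h$ shrinks when $A$ is the matrix of row gradients.

```python
import numpy as np

def F(x):
    # Illustrative vector-valued map F: R^2 -> R^2.
    return np.array([x[0] * x[1], np.sin(x[0]) + x[1] ** 2])

def jacobian(x):
    # Rows are the gradients of the components of F, computed by hand.
    return np.array([[x[1], x[0]],
                     [np.cos(x[0]), 2 * x[1]]])

def remainder_ratio(z, h):
    # ||F(z + h) - F(z) - A h|| / ||h|| with A the Jacobian at z.
    A = jacobian(z)
    return np.linalg.norm(F(z + h) - F(z) - A @ h) / np.linalg.norm(h)

z = np.array([0.3, -1.2])         # arbitrary base point
direction = np.array([1.0, 2.0])  # arbitrary direction for h
ratios = [remainder_ratio(z, t * direction) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

Each tenfold reduction in the step should cut the ratio by roughly a factor of ten, consistent with the limit being zero.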
$\endgroup$ 2 $\begingroup$The quotient $\frac{\nabla f(x + h) - \nabla f(x)}{h}$ isn't properly defined if $n > 1$, because $h$ is then a vector and one cannot divide by a vector.
However, for a scalar $h$ and the $i$-th standard basis vector $e_i$, the limit $$ \lim_{h\to 0 }\frac{\nabla f(x + he_i) - \nabla f(x)}{h} $$ gives the $i$-th column (or row, depending on how you prefer to write $\nabla f$) of $\nabla^2 f(x)$, for $1\le i \le n$.
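As a concrete check, take the illustrative function $f(x_1, x_2) = x_1^2 x_2$ (my own example, not from the question). Then $$ \nabla f(x) = \pmatrix{2x_1x_2 \\ x_1^2}, \qquad \lim_{h\to 0}\frac{\nabla f(x + he_1) - \nabla f(x)}{h} = \lim_{h\to 0}\pmatrix{\frac{2(x_1+h)x_2 - 2x_1x_2}{h} \\ \frac{(x_1+h)^2 - x_1^2}{h}} = \pmatrix{2x_2 \\ 2x_1} $$ which is exactly the first column of $$ \nabla^2 f(x) = \pmatrix{2x_2 & 2x_1 \\ 2x_1 & 0} $$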
$\endgroup$ 4 $\begingroup$Let $f: \mathbb{R}^n \to \mathbb{R}$ be a differentiable function.
Then, at a point $p$, the derivative $Df\big|_p: \mathbb{R}^n \to \mathbb{R}$ can be computed by (but is not defined by)
$$ Df\big|_p(v) = \lim_{h \to 0} \frac{f(p+hv)-f(p)}{h} $$
If $f$ is differentiable, then $Df\big|_p$ is a linear function from $\mathbb{R}^n$ to $\mathbb{R}$. For small $v$ we have that $f(p+v) \approx f(p)+Df\big|_p(v)$.
If $f$ is twice differentiable, then we can think of its second derivative as a bilinear form $Hf\big|_p:\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. It can be computed by (but is not defined by)
$$ Hf\big|_p(v,w) = \lim_{h \to 0} \frac{Df\big|_{p+hv}(w) - Df\big|_p(w)}{h} $$
We have that $Df\big|_{p+v}(w) \approx Df\big|_p(w)+Hf\big|_p(v,w)$.
It also turns out (this is the beginning of the multivariable Taylor's theorem) that $f(p+v) \approx f(p)+Df\big|_p(v)+\frac{1}{2!}Hf\big|_p(v,v)$.
The pattern continues with higher derivatives being higher order symmetric tensors.
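A quick numerical sketch of the second-order approximation, in case it is useful. The function $f(x, y) = e^x \cos y$, the point `p`, and the direction `v` are arbitrary choices of mine, and the gradient and Hessian are written out by hand. If the approximation is right, the error should shrink roughly like the cube of the step size.

```python
import numpy as np

def f(x):
    # Illustrative function f(x, y) = exp(x) * cos(y).
    return np.exp(x[0]) * np.cos(x[1])

def grad(x):
    # Gradient of f, computed by hand; Df|_p(v) is grad(p) @ v.
    return np.array([np.exp(x[0]) * np.cos(x[1]),
                     -np.exp(x[0]) * np.sin(x[1])])

def hess(x):
    # Hessian of f, computed by hand; Hf|_p(v, w) is v @ hess(p) @ w.
    e, c, s = np.exp(x[0]), np.cos(x[1]), np.sin(x[1])
    return np.array([[e * c, -e * s],
                     [-e * s, -e * c]])

def taylor2(p, v):
    # f(p) + Df|_p(v) + (1/2) Hf|_p(v, v)
    return f(p) + grad(p) @ v + 0.5 * v @ hess(p) @ v

p = np.array([0.2, 0.5])   # arbitrary base point
v = np.array([1.0, -1.0])  # arbitrary direction
errors = [abs(f(p + t * v) - taylor2(p, t * v)) for t in (1e-1, 1e-2)]
print(errors)
```

Shrinking the step by a factor of ten should reduce the error by roughly a factor of a thousand, which is the behaviour expected when the first neglected term is cubic in $v$.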
$\endgroup$