Why does the negative gradient give the direction of steepest decrease in the gradient descent algorithm?

$\begingroup$

I understand that the gradient vector gives the direction of maximum increase. What I don't get is why going in the exact opposite direction gives the maximum decrease.

Surely that holds for a single-variable function, because there are only two ways to go. But in a multivariable domain I can imagine that going in the opposite direction may not give the maximum decrease, since there are many other directions I could take and some of them might be better.

Is it the case that this only holds when the function is differentiable?

$\endgroup$

3 Answers

$\begingroup$

This is really related to the very definition of differentiability. I can imagine why you have doubts about this: thinking of real surfaces that occur in nature gives the impression that the directions of steepest increase and steepest decrease need not be opposite to each other.

But recall that a function is called differentiable if it can be approximated locally by a plane. In other words, a differentiable function looks like a plane locally, and for a plane it is pretty clear why the directions of steepest descent and steepest ascent are opposite to each other.

Another thing: Just "existence of partial derivatives" (i.e. the gradient can be computed) does not imply that the negative gradient is the steepest descent direction.
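
To make that last point concrete, here is a small numerical sketch. The function $f(x,y)=x-2\sqrt{|xy|}$ is my own illustrative choice (it is not from the question): both partial derivatives exist at the origin, but $f$ is not differentiable there. The helper names and the step size are likewise just assumptions for the demonstration.

```python
import math

def f(x, y):
    # Illustrative function (an assumption for this sketch): the partial
    # derivatives exist at (0, 0), but f is not differentiable there.
    return x - 2.0 * math.sqrt(abs(x * y))

def partials_at_origin(h=1e-6):
    # Central differences along the two coordinate axes.
    fx = (f(h, 0.0) - f(-h, 0.0)) / (2 * h)
    fy = (f(0.0, h) - f(0.0, -h)) / (2 * h)
    return fx, fy

def one_sided_slope(ux, uy, t=1e-6):
    # Rate of change of f when leaving the origin along the unit vector (ux, uy).
    return (f(t * ux, t * uy) - f(0.0, 0.0)) / t

grad = partials_at_origin()                    # the "gradient" built from partials
slope_neg_grad = one_sided_slope(-1.0, 0.0)    # moving along -grad
r = 1.0 / math.sqrt(2.0)
slope_diag = one_sided_slope(-r, r)            # moving along (-1, 1)/sqrt(2)

print("partials at origin:", grad)
print("slope along -grad: ", slope_neg_grad)
print("slope along diag:  ", slope_diag)
```

The partials come out as $(1, 0)$, so the negative "gradient" direction is $(-1, 0)$ with slope $-1$, yet the diagonal direction $(-1,1)/\sqrt2$ has slope $-3/\sqrt2\approx-2.12$, which is steeper.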

$\endgroup$ $\begingroup$

Let me give you an intuitive, non-mathematical answer for the two-variable case.

Suppose you are standing on a surface. When facing uphill your ankles will be bent with your toes pointing uphill and you will be facing in the compass direction of the gradient. If you turn around $180^\circ$ to face in the opposite direction, your toes will be pointing downhill by the negative of the angle that they were pointing uphill.

$\endgroup$ $\begingroup$

By a first-order Taylor expansion,

$$f(\mathbf x+\mathbf{\delta x})\approx f(\mathbf x)+\nabla f\cdot\mathbf{\delta x}=f(\mathbf x)+\|\nabla f\|\|\mathbf{\delta x}\|\cos\phi.$$

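Here $\phi$ is the angle between $\nabla f$ and $\mathbf{\delta x}$. For a fixed step length $\|\mathbf{\delta x}\|$, the dot product of the two vectors is smallest when they are antiparallel ($\cos\phi=-1$), so to first order $f$ decreases fastest when $\mathbf{\delta x}$ points along $-\nabla f$.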
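
If you want to see this numerically, the following rough sketch sweeps over unit directions at one point and picks the one with the largest decrease for a small fixed step. The test function $f(x,y)=x^2+3y^2$, the point $(1,2)$, the step size and the angular grid are all arbitrary choices made for illustration.

```python
import math

def f(x, y):
    # Smooth test function, chosen only for illustration.
    return x**2 + 3.0 * y**2

def grad_f(x, y):
    # Analytic gradient of the test function above.
    return (2.0 * x, 6.0 * y)

x0, y0 = 1.0, 2.0   # point at which directions are compared
step = 1e-4         # fixed small step length ||delta x||
n = 3600            # number of directions sampled on the unit circle

best_theta, best_change = None, float("inf")
for k in range(n):
    theta = 2.0 * math.pi * k / n
    # Change in f after a step of fixed length in direction theta.
    change = f(x0 + step * math.cos(theta), y0 + step * math.sin(theta)) - f(x0, y0)
    if change < best_change:
        best_theta, best_change = theta, change

gx, gy = grad_f(x0, y0)
# Angle of the negative gradient, mapped into [0, 2*pi).
neg_grad_theta = math.atan2(-gy, -gx) % (2.0 * math.pi)

print("best sampled direction (rad):", best_theta)
print("negative gradient angle (rad):", neg_grad_theta)
```

Up to the grid resolution and a small second-order effect from the finite step, the best sampled direction agrees with the angle of $-\nabla f$.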

$\endgroup$
