Gradient descent

From Wikipedia, the free encyclopedia

(Redirected from Steepest descent)
Jump to: navigation, search

Gradient descent is an optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or the approximate gradient) of the function at the current point. If instead one takes steps proportional to the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent.

Gradient descent is also known as steepest descent, or the method of steepest descent. When known as the latter, gradient descent should not be confused with the method of steepest descent for approximating integrals.

Contents

Gradient descent uses Euler's method for solving ODEs to approximate a gradient flow line. As the goal is to find the minimum, not the flow line, the error in finite methods is less significant.

Gradient descent is based on the observation that if the real-valued function F(\mathbf{x}) is defined and differentiable in a neighborhood of a point \mathbf{a}, then F(\mathbf{x}) decreases fastest if one goes from \mathbf{a} in the direction of the negative gradient of F at \mathbf{a}, -\nabla F(\mathbf{a}). It follows that, if

\mathbf{b}=\mathbf{a}-\gamma\nabla F(\mathbf{a})

for γ > 0 a small enough number, then F(\mathbf{a})\geq F(\mathbf{b}). With this observation in mind, one starts with a guess \mathbf{x}_0 for a local minimum of F, and considers the sequence \mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \dots such that

\mathbf{x}_{n+1}=\mathbf{x}_n-\gamma_n \nabla F(\mathbf{x}_n),\ n \ge 0.

We have

F(\mathbf{x}_0)\ge F(\mathbf{x}_1)\ge F(\mathbf{x}_2)\ge \dots,

so hopefully the sequence (\mathbf{x}_n) converges to the desired local minimum. Note that the value of the step size γ is allowed to change at every iteration.

This process is illustrated in the following picture.

Here F is assumed to be defined on the plane, and that its graph has a bowl shape. The blue curves are the contour lines, that is, the regions on which the value of F is constant. A red arrow originating at a point shows the direction of the negative gradient at that point. Note that the (negative) gradient at a point is orthogonal to the contour line going through that point. We see that gradient descent leads us to the bottom of the bowl, that is, to the point where the value of the function F is minimal.

Gradient descent has problems with pathological functions such as the Rosenbrock function shown here:

The gradient ascent method applied to F(x,y)=\sin\left(\frac{1}{2} x^2 - \frac{1}{4} y^2 + 3 \right) \cos(2 x+1-e^y):

The gradient descent algorithm in action. (1: contour)The gradient descent algorithm in action. (2: surface)

Note that gradient descent works in spaces of any number of dimensions, even in infinite-dimensional ones. In the latter case the search space is typically a function space, and one calculates the Gâteaux derivative of the functional to be minimized to determine the descent direction.

Two weaknesses of gradient descent are:

  1. The algorithm can take many iterations to converge towards a local minimum, if the curvature in different directions is very different.
  2. Finding the optimal γ per step can be time-consuming. Conversely, using a fixed γ can yield poor results. Methods based on Newton's method and inversion of the Hessian using conjugate gradient techniques are often a better alternative.

A more powerful algorithm is given by the BFGS method which consists in calculating on every step a matrix by which the gradient vector is multiplied to go into a "better" direction, combined with a more sophisticated line search algorithm, to find the "best" value of γ.

  • Mordecai Avriel (2003). Nonlinear Programming: Analysis and Methods. Dover Publishing. ISBN 0-486-43227-0.
  • Jan A. Snyman (2005). Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms. Springer Publishing. ISBN 0-387-24348-8
Advanced Search
Included Web Search Engines


Safe Search

close

Top Matching Results

Occasionally Search.com will highlight specialized results that are based on the context of your query. Examples of specialized results include specific links to news, images, or video.

Top Matching Results may highlight information from other Search.com pages, content from the CNET Network of sites, or third party content. The listings are based purely on relevance. Search.com does not receive payment for listings in this section but our partners that provide this data may get paid for listing these products.

Sponsored Links

This section contains paid listings which have been purchased by companies that want to have their sites appear for specific search terms and related content. These listings are administered, sorted and maintained by a third party and are not endorsed by Search.com.

Search Results

Search.com sends your search query to several search engines at one time and integrates the results into one list which has been sorted by relevance using Search.com's proprietary algorithm. You can customize the list of search engines included in your metasearch from the preferences.

The search engines that are used in your metasearch may allow companies to pay to have their Web sites included within the results. To view the Paid Inclusion policy for a specific search engine, please visit their Web site. Search.com does not accept payment or share revenue with any search engine partner for listings in this section.