Newton's method in optimization

From Wikipedia, the free encyclopedia

In mathematics, Newton's method is a well-known algorithm for finding roots of equations in one or more dimensions. It can also be used to find local maxima and local minima of functions by noticing that if a real number x * is a stationary point of a function f(x), then x * is a root of the derivative f'(x), and therefore one can solve for x * by applying Newton's method to f'(x).

Thus, provided that f(x) is a twice-differentiable function and the initial guess x0 is chosen close enough to x * , the sequence (xn) defined by

x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}, \ n \ge 0

will converge towards x * .

This iterative scheme can be generalized to several dimensions by replacing the derivative with the gradient, \nabla f(\mathbf{x}), and the reciprocal of the second derivative with the inverse of the Hessian matrix, H f(\mathbf{x}). One obtains the iterative scheme

\mathbf{x}_{n+1} = \mathbf{x}_n - [H f(\mathbf{x}_n)]^{-1} \nabla f(\mathbf{x}_n), \ n \ge 0.

Usually Newton's method is modified to include a small step size γ > 0 instead of γ = 1

\mathbf{x}_{n+1} = \mathbf{x}_n - \gamma[H f(\mathbf{x}_n)]^{-1} \nabla f(\mathbf{x}_n).

This is often done to ensure that the Wolfe conditions are satisfied at each step \mathbf{x}_n \to \mathbf{x}_{n+1} of the iteration.

The geometric interpretation of Newton's method is that at each iteration one approximates f(\mathbf{x}) by a quadratic function around \mathbf{x}_n, and then takes a step towards the maximum/minimum of that quadratic function. (If f(\mathbf{x}) happens to be a quadratic function, then the exact extremum is found in one step.)

Newton's method converges much faster towards a local maximum or minimum than gradient descent. In fact, every local minimum has a neighborhood N such that, if we start with \mathbf{x}_0 \in N, Newton's method with step size γ = 1 converges quadratically (if the Hessian is invertible in that neighborhood).

Finding the inverse of the Hessian is an expensive operation, so the linear equation

\mathbf{p}_{n} = \mathbf{x}_{n+1}-\mathbf{x}_{n} = [H f(\mathbf{x}_n)]^{-1} \nabla f(\mathbf{x}_n), \ n \ge 0..

is often solved approximately (but to great accuracy) using a method such as conjugate gradient. There also exist various quasi-Newton methods, where an approximation for the Hessian is used instead.

If the Hessian is close to a non-invertible matrix, the inverted Hessian can be numerically unstable and the solution may diverge. In this case, certain workarounds have been tried in the past, which have varied success with certain problems. One can, for example, modify the Hessian by adding a correction matrix Bn so as to make H_f(\mathbf{x}_n) + B_n positive definite. One approach is to diagonalize Hf and choose Bn so that H_f(\mathbf{x}_n) + B_n has the same eigenvectors as Hf, but with each negative eigenvalue replaced by ε > 0.

  • Avriel, Mordecai (2003). Nonlinear Programming: Analysis and Methods. Dover Publishing. ISBN 0-486-43227-0.
  • Nocedal, Jorge & Wright, Stephen J. (1999). Numerical Optimization. Springer-Verlag. ISBN 0-387-98793-2.

Advanced Search
Included Web Search Engines


Safe Search

close

Top Matching Results

Occasionally Search.com will highlight specialized results that are based on the context of your query. Examples of specialized results include specific links to news, images, or video.

Top Matching Results may highlight information from other Search.com pages, content from the CNET Network of sites, or third party content. The listings are based purely on relevance. Search.com does not receive payment for listings in this section but our partners that provide this data may get paid for listing these products.

Sponsored Links

This section contains paid listings which have been purchased by companies that want to have their sites appear for specific search terms and related content. These listings are administered, sorted and maintained by a third party and are not endorsed by Search.com.

Search Results

Search.com sends your search query to several search engines at one time and integrates the results into one list which has been sorted by relevance using Search.com's proprietary algorithm. You can customize the list of search engines included in your metasearch from the preferences.

The search engines that are used in your metasearch may allow companies to pay to have their Web sites included within the results. To view the Paid Inclusion policy for a specific search engine, please visit their Web site. Search.com does not accept payment or share revenue with any search engine partner for listings in this section.