Lagrange multipliers

From Wikipedia, the free encyclopedia

Fig. 1. Drawn in green is the locus (contour) of points satisfying the constraint g(x,y) = c. Drawn in blue are contours of f.  Arrows represent the gradient, which points in a direction normal to the contour.

In mathematical optimization problems, the method of Lagrange multipliers, named after Joseph Louis Lagrange, is a method for finding the extrema of a function of several variables subject to one or more constraints; it is the basic tool in nonlinear constrained optimization.

Lagrange multipliers compute the stationary points of the constrained function. By Fermat's theorem, extrema occur either at these points, or on the boundary, or at points where the function is not differentiable.

The method reduces finding stationary points of a constrained function in n variables with k constraints to finding stationary points of an unconstrained function in n+k variables. It introduces a new unknown scalar variable (called the Lagrange multiplier) for each constraint, and defines a new function (called the Lagrangian) in terms of the original function, the constraints, and the Lagrange multipliers.


Consider a two-dimensional case. Suppose we have a function f(x,y), to maximize, subject to the constraint

g\left( x, y \right) = c,

where c is a constant. We can visualize contours of f given by

f \left( x, y \right)=d_n

for various values of d_n, and the contour of g given by g(x,y) = c.

Suppose we walk along the contour line with g = c. In general the contour lines of f and g are distinct, so the contour line for g = c will typically cross contour lines of f; equivalently, the value of f varies as we move along the contour g = c. Only where the contour line for g = c touches a contour line of f tangentially, that is, where the two contours touch but do not cross, does the value of f neither increase nor decrease to first order.

This occurs exactly when the tangential component of the total derivative vanishes: df_\parallel = 0, which is at the constrained stationary points of f (which include the constrained local extrema, assuming f is differentiable). Computationally, this is when the gradient of f is normal to the constraint(s): when \nabla f = \lambda \nabla g for some scalar λ.

A familiar example can be obtained from weather maps, with their contour lines for temperature and pressure: the constrained extrema will occur where the superposed maps show touching lines (isopleths).

Geometrically we translate the tangency condition to saying that the gradients of f and g are parallel vectors at the maximum, since the gradients are always normal to the contour lines. Thus we want points (x,y) where \nabla_{x,y} f = \lambda \nabla_{x,y} g, and, further, g(x,y) = c. To incorporate both these conditions into one equation, we introduce an unknown scalar, λ, and solve

 \nabla_{x,y,\lambda} F \left( x , y, \lambda \right)=0

with

 F \left( x , y, \lambda \right) = f \left(x, y \right) + \lambda \left(g \left(x, y \right) - c \right),

and

 \nabla_{x,y,\lambda} = \left( \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, \frac{\partial}{\partial \lambda}  \right).

As discussed above, we are looking for stationary points of f seen while traveling on the level set g(x,y) = c. This occurs just when the gradient of f has no component tangential to the level sets of g. This condition is equivalent to \nabla_{x,y} f(x,y) being a scalar multiple of \nabla_{x,y} g(x,y). With F as defined above, setting the derivatives with respect to x and y to zero gives \nabla_{x,y} f = -\lambda \nabla_{x,y} g, which differs from the earlier form only in the sign of the unknown λ. Stationary points (x,y,λ) of F also satisfy g(x,y) = c, as can be seen by considering the derivative with respect to λ.
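The tangency condition can be checked numerically. The sketch below is only an illustration under assumed inputs: the objective f(x, y) = x + 2y and the unit-circle constraint are hypothetical choices, not taken from the text. It walks along the constraint contour, picks the point where f is largest, and evaluates the 2D cross product of the two gradients, which vanishes exactly when they are parallel.

```python
import math

# Hypothetical example: f(x, y) = x + 2y on the unit circle x^2 + y^2 = 1.
def f(x, y):
    return x + 2.0 * y

def grad_f(x, y):
    return (1.0, 2.0)

def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

def cross(u, v):
    """z-component of the 2D cross product; zero iff u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

# Walk along the constraint contour and record where f is largest.
N = 200_000
best_theta = max((2.0 * math.pi * i / N for i in range(N)),
                 key=lambda t: f(math.cos(t), math.sin(t)))
x_best, y_best = math.cos(best_theta), math.sin(best_theta)

cross_at_max = cross(grad_f(x_best, y_best), grad_g(x_best, y_best))
cross_elsewhere = cross(grad_f(1.0, 0.0), grad_g(1.0, 0.0))

print(x_best, y_best)    # close to (1/sqrt(5), 2/sqrt(5))
print(cross_at_max)      # close to 0: the gradients are parallel here
print(cross_elsewhere)   # clearly nonzero at the non-stationary point (1, 0)
```

The grid search is deliberately crude; it stands in for "walking along the contour" and is accurate only to the grid spacing.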

Be aware that the solutions are stationary points of the Lagrangian F and are in general saddle points: they are not necessarily extrema of F. Indeed, F is unbounded: given a point (x,y) that doesn't lie on the constraint, letting \lambda \to \pm \infty makes F arbitrarily large or small. However, under certain stronger assumptions, as we shall see below, the strong Lagrangian principle holds, which states that the maxima of f maximize the Lagrangian globally.

Denote the objective function by f(\mathbf x) and let the constraints be given by g_k(\mathbf x)=0, perhaps by moving constants to the left, as in h_k(\mathbf x)-c_k=g_k(\mathbf x). The domain of f should be an open set containing all points satisfying the constraints. Furthermore, f and the g_k must have continuous first partial derivatives and the gradients of the g_k must not be zero on the domain.[1] Now, define the Lagrangian, Λ, as

\Lambda(\mathbf x, \boldsymbol \lambda) = f + \sum_k \lambda_k g_k.
k is an index for variables and functions associated with a particular constraint, k.
\mathbf \lambda without a subscript indicates the vector with elements \mathbf \lambda_k, which are taken to be independent variables.

Observe that both the optimization criteria and constraints g_k(\mathbf x) are compactly encoded as stationary points of the Lagrangian:

\nabla_{\mathbf x} \Lambda = \mathbf{0} if and only if \nabla_{\mathbf x} f = - \sum_k \lambda_k \nabla_{\mathbf x} g_k,
\nabla_{\mathbf x} means to take the gradient only with respect to each element in the vector \mathbf x, instead of all variables.

and

\nabla_{\mathbf \lambda} \Lambda = \mathbf{0} implies g_k = 0.

Collectively, the stationary points of the Lagrangian,

\nabla \Lambda = \mathbf{0},

give a system of equations whose number equals the length of \mathbf x plus the length of \mathbf \lambda. This often makes it possible to solve for every x_i and λ_k without inverting the g_k.[1] For this reason, the Lagrange multiplier method can be useful in situations where it is easier to find derivatives of the constraint functions than to invert them.

Often the Lagrange multipliers have an interpretation as some salient quantity of interest. To see why this might be the case, observe that:

\frac{\partial \Lambda}{\partial {g_k}} = \lambda_k.

So, λ_k is the rate of change of the quantity being optimized as a function of the constraint variable. As examples, in Lagrangian mechanics the equations of motion are derived by finding stationary points of the action, the time integral of the difference between kinetic and potential energy. Thus, the force on a particle due to a scalar potential, F = −∇V, can be interpreted as a Lagrange multiplier determining the change in action (transfer of potential to kinetic energy) following a variation in the particle's constrained trajectory. In economics, the optimal profit to a player is calculated subject to a constrained space of actions, where a Lagrange multiplier is the value of relaxing a given constraint (e.g. through bribery or other means).
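The sensitivity interpretation can be illustrated on a small problem. The sketch below assumes a hypothetical instance, maximizing x + y subject to x^2 + y^2 = c, and writes the Lagrangian as Λ = x + y + λ(x^2 + y^2 − c). Under that sign convention the derivative of the optimal value with respect to c equals −λ; with the opposite convention Λ = f − λ(…) it would equal +λ. The function name `solve` and the step size `h` are illustrative choices.

```python
import math

def solve(c):
    """Closed-form stationary point for: maximize x + y subject to x^2 + y^2 = c.

    With Lambda = x + y + lam * (x^2 + y^2 - c), stationarity gives
    x = y = sqrt(c / 2) and lam = -1 / (2 x).
    """
    x = math.sqrt(c / 2.0)
    lam = -1.0 / (2.0 * x)
    return x + x, lam  # (optimal value, multiplier)

c = 1.0
h = 1e-6
f_star, lam = solve(c)

# Central finite difference of the optimal value with respect to c.
df_dc = (solve(c + h)[0] - solve(c - h)[0]) / (2.0 * h)

print(f_star)   # optimal value at c = 1
print(lam)      # multiplier at the maximum
print(df_dc)    # approximately -lam under this sign convention
```

Relaxing the constraint (increasing c) raises the attainable maximum at a rate given by the magnitude of the multiplier, which is the sense in which λ measures the value of the constraint.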

The method of Lagrange multipliers is generalized by the Karush-Kuhn-Tucker conditions.

Fig. 2. Illustration of the constrained optimization problem.

Suppose you wish to maximize f(x,y) = x + y subject to the constraint x^2 + y^2 = 1. The constraint is the unit circle, and the level sets of f are diagonal lines (with slope -1), so one can see graphically that the maximum occurs at (\sqrt{2}/2,\sqrt{2}/2) (and the minimum occurs at (-\sqrt{2}/2,-\sqrt{2}/2)).

Formally, set g(x,y) = x^2 + y^2 − 1, and

Λ(x,y,λ) = f(x,y) + λg(x,y) = x + y + λ(x^2 + y^2 − 1)

Set the derivative dΛ = 0, which yields the system of equations:

\begin{align}
\frac{\partial \Lambda}{\partial x}       &= 1 + 2 \lambda x &&= 0, \qquad \text{(i)} \\
\frac{\partial \Lambda}{\partial y}       &= 1 + 2 \lambda y &&= 0, \qquad \text{(ii)} \\
\frac{\partial \Lambda}{\partial \lambda} &= x^2 + y^2 - 1   &&= 0, \qquad \text{(iii)} 
\end{align}

As always, the \partial \lambda equation is the original constraint.

Combining the first two equations yields x = y. Explicitly, x \neq 0 (otherwise (i) would read 1 = 0), so one can solve (i) for λ, obtaining λ = − 1 / (2x), and substitute this into (ii).

Substituting into (iii) yields 2x^2 = 1, so x=\pm \sqrt{2}/2 and the stationary points are (\sqrt{2}/2,\sqrt{2}/2) and (-\sqrt{2}/2,-\sqrt{2}/2). Evaluating the objective function f on these yields

f(\sqrt{2}/2,\sqrt{2}/2)=\sqrt{2}\mbox{ and } f(-\sqrt{2}/2, -\sqrt{2}/2)=-\sqrt{2},

thus the maximum is \sqrt{2}, which is attained at (\sqrt{2}/2,\sqrt{2}/2) and the minimum is -\sqrt{2}, which is attained at (-\sqrt{2}/2,-\sqrt{2}/2).
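When the system (i)-(iii) cannot be solved by hand, it can be attacked numerically. The following is a minimal sketch, not a production solver: it applies Newton's method to the three equations above, with a small hand-written Gaussian elimination so that only the standard library is needed. The starting points, the iteration count, and all function names are assumptions made for illustration.

```python
def grad_lagrangian(x, y, lam):
    """Gradient of Lambda = x + y + lam * (x^2 + y^2 - 1): equations (i)-(iii)."""
    return [1.0 + 2.0 * lam * x,
            1.0 + 2.0 * lam * y,
            x * x + y * y - 1.0]

def jacobian(x, y, lam):
    """Jacobian of grad_lagrangian with respect to (x, y, lam)."""
    return [[2.0 * lam, 0.0, 2.0 * x],
            [0.0, 2.0 * lam, 2.0 * y],
            [2.0 * x, 2.0 * y, 0.0]]

def solve_linear(a, b):
    """Solve a @ d = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    d = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * d[k] for k in range(i + 1, n))
        d[i] = (m[i][n] - s) / m[i][i]
    return d

def newton(x, y, lam, iterations=20):
    """Newton iteration on grad_lagrangian = 0 from the given starting point."""
    for _ in range(iterations):
        g = grad_lagrangian(x, y, lam)
        step = solve_linear(jacobian(x, y, lam), [-v for v in g])
        x, y, lam = x + step[0], y + step[1], lam + step[2]
    return x, y, lam

x_max, y_max, lam_max = newton(1.0, 1.0, -1.0)
x_min, y_min, lam_min = newton(-1.0, -1.0, 1.0)

print(x_max, y_max, x_max + y_max)   # stationary point near (sqrt(2)/2, sqrt(2)/2)
print(x_min, y_min, x_min + y_min)   # stationary point near (-sqrt(2)/2, -sqrt(2)/2)
```

Newton's method only finds stationary points of Λ near its starting guess; deciding which of them is the maximum still requires evaluating f, as done in the text.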

Fig. 3. Illustration of the constrained optimization problem.

Suppose you want to find the maximum values for

 f(x, y) = x^2 y \,

with the condition that the x and y coordinates lie on the circle around the origin with radius √3, that is,

 x^2 + y^2 = 3. \,

As there is just a single condition, we will use only one multiplier, say λ.

Use the constraint to define a function g(x, y):

g (x, y) = x^2 +y^2 -3. \,

The function g is identically zero on the circle of radius √3. So any multiple of g(x,y) may be added to f(x,y) leaving f(x,y) unchanged in the region of interest (on the circle, where our original constraint is satisfied). Let

\Lambda(x, y, \lambda) = f(x,y) + \lambda g(x, y) = x^2y +  \lambda (x^2 + y^2 - 3). \,

The critical points of Λ occur where its gradient is zero. Setting the partial derivatives to zero gives

\begin{align}
\frac{\partial \Lambda}{\partial x}       &= 2 x y + 2 \lambda x &&= 0, \qquad \text{(i)} \\
\frac{\partial \Lambda}{\partial y}       &= x^2 + 2 \lambda y   &&= 0, \qquad \text{(ii)} \\
\frac{\partial \Lambda}{\partial \lambda} &= x^2 + y^2 - 3       &&= 0. \qquad \text{(iii)}
\end{align}

Equation (iii) is just the original constraint. Equation (i) implies x = 0 or λ = −y. In the first case, if x = 0 then y = \pm \sqrt{3} by (iii), and then λ = 0 by (ii). In the second case, substituting λ = −y into equation (ii) gives

x^2 - 2y^2 = 0. \,

Then x^2 = 2y^2. Substituting into equation (iii) and solving for y gives

y = \pm 1. \,

Clearly there are six critical points:

 (\sqrt{2},1); \quad (-\sqrt{2},1); \quad (\sqrt{2},-1); \quad (-\sqrt{2},-1); \quad (0,\sqrt{3}); \quad (0,-\sqrt{3}).

Evaluating the objective at these points, we find

 f(\pm\sqrt{2},1) = 2; \quad f(\pm\sqrt{2},-1) = -2; \quad f(0,\pm \sqrt{3})=0.

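Therefore, the objective function attains its global maximum (subject to the constraint) at

 (\sqrt{2},1) \quad\text{and}\quad (-\sqrt{2},1),

and its global minimum at (\sqrt{2},-1) and (-\sqrt{2},-1). The points (0,\pm\sqrt{3}) are not global extrema: on the circle, f is non-negative near (0,\sqrt{3}), which is therefore a constrained local minimum, and non-positive near (0,-\sqrt{3}), which is therefore a constrained local maximum.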
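The nature of each critical point can be checked by restricting f to the circle. The sketch below assumes the parametrization x = √3 cos θ, y = √3 sin θ (a choice made here, not part of the derivation above) and compares f at each critical point with its values at two nearby points on the circle. The helper names and the step `h` are illustrative.

```python
import math

R = math.sqrt(3.0)

def f_on_circle(theta):
    """Objective f = x^2 * y restricted to the circle x^2 + y^2 = 3."""
    x, y = R * math.cos(theta), R * math.sin(theta)
    return x * x * y

def classify(theta, h=1e-3):
    """Compare f at theta with its two neighbours on the constraint circle."""
    here = f_on_circle(theta)
    left, right = f_on_circle(theta - h), f_on_circle(theta + h)
    if here > left and here > right:
        return "local max"
    if here < left and here < right:
        return "local min"
    return "neither"

critical_points = [
    (math.sqrt(2.0), 1.0), (-math.sqrt(2.0), 1.0),
    (math.sqrt(2.0), -1.0), (-math.sqrt(2.0), -1.0),
    (0.0, R), (0.0, -R),
]

results = {}
for x, y in critical_points:
    theta = math.atan2(y, x)
    results[(x, y)] = (round(f_on_circle(theta), 9), classify(theta))
    print((x, y), results[(x, y)])
```

Along the circle, every one of the six points comes out as a local extremum of f: the four points with y = ±1 carry the values ±2, while (0, √3) is a local minimum and (0, −√3) a local maximum, both with value 0.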

Suppose we wish to find the discrete probability distribution with maximal information entropy. Then

f(p_1,p_2,\ldots,p_n) = -\sum_{k=1}^n p_k\log_2 p_k.

Of course, the sum of these probabilities equals 1, so our constraint is g(p) = 1 with

g(p_1,p_2,\ldots,p_n)=\sum_{k=1}^n p_k.

We can use Lagrange multipliers to find the point of maximum entropy (depending on the probabilities). For all k from 1 to n, we require that

\frac{\partial}{\partial p_k}(f+\lambda (g-1))=0,

which gives

\frac{\partial}{\partial p_k}\left(-\sum_{j=1}^n p_j \log_2 p_j + \lambda \left(\sum_{j=1}^n p_j - 1\right) \right) = 0.

Carrying out the differentiation of these n equations, we get

-\left(\frac{1}{\ln 2}+\log_2 p_k \right)  + \lambda = 0.

This shows that all p_k are equal, because each is determined by λ alone. Using the constraint \sum_k p_k = 1, we find

p_k = \frac{1}{n}.

Hence, the uniform distribution is the distribution with the greatest entropy.
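As a sanity check on this result, one can compare the entropy of the uniform distribution with that of many randomly drawn distributions. The sketch below is only a spot check, not a proof; the choice n = 5, the sample count, the seed, and the sampling scheme are all assumptions made for illustration.

```python
import math
import random

def entropy_bits(p):
    """Shannon entropy in bits; terms with p_k = 0 contribute 0 by convention."""
    return -sum(q * math.log2(q) for q in p if q > 0.0)

def random_distribution(n, rng):
    """Draw a point on the probability simplex by normalising positive weights."""
    w = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]

n = 5
uniform = [1.0 / n] * n
h_uniform = entropy_bits(uniform)

# Multiplier from the stationarity condition: lambda = 1/ln 2 + log2(p_k), p_k = 1/n.
lam = 1.0 / math.log(2.0) + math.log2(1.0 / n)

rng = random.Random(0)
h_best_random = max(entropy_bits(random_distribution(n, rng))
                    for _ in range(10_000))

print(h_uniform)       # equals log2(n)
print(h_best_random)   # does not exceed h_uniform
print(lam)             # value of the multiplier at the stationary point
```

None of the sampled distributions exceeds the entropy of the uniform one, consistent with the stationary point being the maximum.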

Constrained optimization plays a central role in economics. For example, the choice problem for a consumer is represented as one of maximizing a utility function subject to a budget constraint. The Lagrange multiplier has an economic interpretation as the shadow price associated with the constraint, in this case the marginal utility of income.

Given a convex optimization problem in standard form

minimize f_0(x) subject to

f_i(x) \leq 0,\ i \in \left \{1,\dots,m \right \}
h_i(x) = 0,\ i \in \left \{1,\dots,p \right \}

with the domain \mathcal{D} \subset \mathbb{R}^n having non-empty interior, the Lagrangian function L: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R} is defined as

L(x,\lambda,\nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x).

The vectors λ and ν are called the dual variables or Lagrange multiplier vectors associated with the problem. The Lagrange dual function g:\mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R} is defined as

g(\lambda,\nu) = \inf_{x\in\mathcal{D}} L(x,\lambda,\nu) = \inf_{x\in\mathcal{D}} \left ( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right ).

The dual function g is concave, even when the initial problem is not convex. The dual function yields lower bounds on the optimal value p^* of the initial problem; for any \lambda \geq 0 and any ν we have g(\lambda,\nu) \leq p^*. If a constraint qualification such as Slater's condition holds and the original problem is convex, then we have strong duality, i.e. d^* = \max_{\lambda \geq 0, \nu} g(\lambda,\nu) = \inf f_0 = p^*, where the infimum of f_0 is taken over the feasible points.
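A one-dimensional instance makes the bound concrete. The sketch below assumes a hypothetical problem, minimize f_0(x) = x^2 subject to f_1(x) = 1 − x ≤ 0, whose primal optimum is p^* = 1 at x = 1. For this problem the infimum of the Lagrangian over x is attained at x = λ/2, so the dual function can be written in closed form; the grid over λ is an illustrative choice.

```python
def lagrangian(x, lam):
    """L(x, lam) = f0(x) + lam * f1(x) with f0 = x^2 and f1 = 1 - x."""
    return x * x + lam * (1.0 - x)

def dual(lam):
    """g(lam) = inf_x L(x, lam); for this problem the infimum is at x = lam / 2."""
    return lagrangian(lam / 2.0, lam)

p_star = 1.0  # primal optimum: the feasible set is x >= 1, so x = 1 minimises x^2

lams = [i / 100 for i in range(501)]     # grid of multipliers on [0, 5]
dual_values = [dual(l) for l in lams]

# Weak duality: every dual value is a lower bound on p_star.
weak_duality_holds = all(g <= p_star + 1e-12 for g in dual_values)

d_star = max(dual_values)
lam_star = lams[dual_values.index(d_star)]

print(weak_duality_holds)
print(d_star, lam_star)   # best lower bound found on the grid and where it occurs
```

This problem is convex and has a strictly feasible point (for example x = 2), so Slater's condition holds and the best dual bound meets the primal optimum.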

  1. Gluss, David and Weisstein, Eric W., Lagrange Multiplier at MathWorld.

For references to Lagrange's original work and for an account of the terminology see the Lagrange Multiplier entry in

Exposition

For additional text and interactive applets
