Maximization of Quadratic Functions
I. Introduction.
A. Quadratic functions are among the simplest, and generally most useful, functions that we might hope to encounter.
B. An understanding of their maximization is certainly a good first step in the direction of understanding maximization of more general functions.
C. It turns out that the theory of quadratic functions is prototypical, in that the main features are preserved or generalized in the general theory.
II. Quadratic Functions.
A. Polynomials.
1. A polynomial in the variables x_1, . . . , x_n is an expression of the form
q(x) = Σ_{i ∈ I} a_i x_1^{e_i1} · · · x_n^{e_in},
where I is a finite index set and, for each i, a_i is a real number and e_i1, . . . , e_in are nonnegative integers.
2. Of course polynomials can be added and multiplied. In fact, with these operations the set of polynomials is a commutative ring with unit.
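As a quick illustration (a minimal sketch using Python's sympy library, which is not part of these notes; the particular polynomials are made up), addition and multiplication of polynomials can be carried out symbolically:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')

    p = x1**2 + 3*x1*x2 + 2*x1 + 5   # a polynomial of degree 2
    r = x2 - 1                       # a polynomial of degree 1

    print(sp.expand(p + r))          # the sum is again a polynomial
    print(sp.expand(p * r))          # so is the product (here of degree 3)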
B. Degree
1. The degree of a monomial a_i x_1^{e_i1} · · · x_n^{e_in} is the sum e_i1 + · · · + e_in of its exponents, and the degree of the polynomial q above is, by definition, the maximal degree of any of its monomials.
2. A polynomial is said to be homogeneous of degree d if all its monomials have degree d.
3. Any polynomial p(x) of degree d can be written as a sum p(x) = p_d(x) + p_{d-1}(x) + · · · + p_0(x), where each p_i is homogeneous of degree i.
4. A quadratic function is a polynomial of degree 2.
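For example, p(x_1, x_2) = x_1^2 + 3x_1x_2 + 2x_1 + 5 is a quadratic function; it decomposes as p = p_2 + p_1 + p_0 with homogeneous parts p_2 = x_1^2 + 3x_1x_2, p_1 = 2x_1, and p_0 = 5.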
C. A Matrix Representation
1. The general form of a quadratic function is
q(x) = x'Mx + b'x + c,
where M is an n × n matrix, b ∈ IRn, and c ∈ IR.
2. An n × n matrix M with entries m_ij is symmetric if it is equal to its own transpose: m_ij = m_ji for all 1 ≤ i, j ≤ n. For any quadratic function as above, there is a unique symmetric matrix representing the homogeneous part of degree 2: replacing M with (M + M')/2 leaves x'Mx unchanged, since the 1 × 1 matrix x'Mx equals its transpose x'M'x. There are other ways of singling out one particular matrix from the infinite collection that all represent the same homogeneous polynomial of degree 2. The advantages of the symmetric representation will emerge as we go along.
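To make the representation concrete, here is a minimal numerical sketch (using numpy; the matrices A, M and the vector b are illustrative choices for the quadratic example above, not taken from the notes) showing that the symmetric matrix (A + A')/2 represents the same quadratic as a non-symmetric A:

    import numpy as np

    # q(x) = x'Mx + b'x + c for the example q(x1, x2) = x1^2 + 3*x1*x2 + 2*x1 + 5
    A = np.array([[1.0, 3.0],
                  [0.0, 0.0]])     # one non-symmetric matrix representing the degree-2 part
    M = (A + A.T) / 2              # the unique symmetric representative
    b = np.array([2.0, 0.0])
    c = 5.0

    def q(x, H):
        return x @ H @ x + b @ x + c

    x = np.array([1.5, -2.0])
    print(q(x, A), q(x, M))        # equal: A and M represent the same quadratic function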
D. Differentiation.
1. We define the derivative Dq(x) of q at x to be the function that assigns to each v ∈ IRn the number
Dq(x)v = lim_{ε→0} (q(x + εv) - q(x))/ε.
2. We evaluate the difference quotient by computing:
q(x + εv) - q(x) = ε(x'Mv + v'Mx + b'v) + ε^2 v'Mv.
a. The transpose of a 1 × 1 matrix is itself, so v'Mx = (v'Mx)' = x'M'v, and since M is symmetric we have v'Mx = x'Mv. We now see that
Dq(x)v = 2x'Mv + b'v.
b. Note that Dq(x) is a linear function from IRn to IR.
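A quick numerical check of the formula Dq(x)v = 2x'Mv + b'v (a sketch continuing the illustrative M, b, c above; the points x, v and the step size eps are arbitrary):

    import numpy as np

    M = np.array([[1.0, 1.5],
                  [1.5, 0.0]])
    b = np.array([2.0, 0.0])
    c = 5.0

    def q(x):
        return x @ M @ x + b @ x + c

    x = np.array([1.5, -2.0])
    v = np.array([0.3, 1.0])
    eps = 1e-6

    print(2 * x @ M @ v + b @ v)             # Dq(x)v = 2x'Mv + b'v
    print((q(x + eps * v) - q(x)) / eps)     # difference quotient, approximately equal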
III. Maximization of Quadratic Functions.
A. The first order condition for maximization states that if x* is a maximizer of q(x), then x* is a critical point of q, which means that Dq(x*)v = 0 for all v ∈ IRn.
1. If Dq(x*)v ≠ 0 for some v, then we could easily show that either q(x* + εv) > q(x*) or q(x* - εv) > q(x*) for sufficiently small ε > 0, contradicting the assumption that x* is a maximum.
2. If M is invertible, then Dq(x)v = (2Mx + b)'v vanishes for all v exactly when 2Mx + b = 0, so the only critical point is x* = -(1/2)M^{-1}b.
3. If M is not invertible, then there are either no critical points or infinitely
many critical points.
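In the invertible case the critical point can be computed directly by solving 2Mx + b = 0. A minimal numpy sketch with the illustrative M and b from above (note that this particular M is indefinite, so its critical point is a saddle rather than a maximum):

    import numpy as np

    M = np.array([[1.0, 1.5],
                  [1.5, 0.0]])
    b = np.array([2.0, 0.0])

    x_star = np.linalg.solve(2 * M, -b)    # the unique critical point -(1/2) M^{-1} b
    print(x_star)
    print(2 * M @ x_star + b)              # the gradient 2Mx + b vanishes at x_star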
B. Assuming that the origin is a critical point, we now want to investigate conditions under which it is a maximum, and when it is the only maximum.
1. The matrix M is positive definite if x'Mx > 0 for all x ≠ 0.
2. The matrix M is positive semidefinite if x'Mx ≥ 0 for all x.
3. The matrix M is negative definite if x'Mx < 0 for all x ≠ 0.
4. The matrix M is negative semidefinite if x'Mx ≤ 0 for all x.
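One crude way to probe these definitions numerically is to sample x'Mx at many points; a sketch (random sampling can suggest, but of course not prove, definiteness; the matrices are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    M_nd = np.array([[-2.0,  1.0],
                     [ 1.0, -2.0]])        # negative definite: x'Mx < 0 for all x != 0
    M_ind = np.array([[1.0, 1.5],
                      [1.5, 0.0]])         # indefinite: x'Mx takes both signs

    for M in (M_nd, M_ind):
        values = [x @ M @ x for x in rng.standard_normal((1000, 2))]
        print(min(values), max(values))    # signs of the sampled values of x'Mx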
C. We now want to characterize symmetric definite matrices. We will show that,
by changing coordinate systems, we can display such a matrix as a diagonal
matrix, with the signs of the entries on the diagonal corresponding to the
version of definiteness being characterized.
IV. Orthogonal Transformations
A. Let P be an n × n matrix, which we think of as representing a change of
coordinates.
1. We ask for the conditions under which P preserves the inner product.
a. The inner product can be represented as a product of matrices: the inner product of x and y is the 1 × 1 matrix x'y.
b. For P to preserve the inner product means that (Px)'(Py) = x'y for all x, y ∈ IRn.
c. This is the case if and only if P'P = I, i.e., P' = P^{-1}.
2. Such a matrix P is said to be an orthogonal matrix. The set of such matrices is closed under multiplication and inversion, and is therefore a group, called the orthogonal group, and typically denoted by O(n).
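A small numpy sketch (the rotation angles are arbitrary) illustrating that rotation matrices are orthogonal, preserve inner products, and are closed under multiplication:

    import numpy as np

    def rotation(theta):
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    P = rotation(0.7)
    Q = rotation(-1.3)
    x = np.array([1.0, 2.0])
    y = np.array([-3.0, 0.5])

    print(np.allclose(P.T @ P, np.eye(2)))              # P'P = I
    print(np.isclose((P @ x) @ (P @ y), x @ y))         # the inner product is preserved
    print(np.allclose((P @ Q).T @ (P @ Q), np.eye(2)))  # PQ is again orthogonal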
B. Diagonalization of Symmetric Matrices.
1. If x'Mx is a quadratic form with M a symmetric matrix, it is intuitive that we would not say that the "geometry" of the function had changed if we changed the coordinates used to represent the point x in a way that preserved the inner product. Let y denote the same point in the new system of coordinates, and let the two systems of coordinates be related by the formula x = Py, where P is an n × n orthogonal matrix. We have
x'Mx = (Py)'M(Py) = y'(P'MP)y,
so in the new coordinate system the quadratic form is represented by the matrix P'MP.
Theorem: If M is a symmetric matrix, there is an orthogonal matrix P such that P'MP is a diagonal matrix.
Corollary: A symmetric n × n matrix is positive definite (positive semidefinite, negative definite, negative semidefinite) if and only if, for any orthogonal matrix P such that P'MP is diagonal, all the diagonal entries of P'MP are positive (nonnegative, negative, nonpositive).
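Numerically, an orthogonal P as in the theorem can be obtained from numpy's eigh, whose columns are orthonormal eigenvectors of a symmetric matrix; a sketch with an illustrative negative definite M:

    import numpy as np

    M = np.array([[-2.0,  1.0,  0.0],
                  [ 1.0, -2.0,  1.0],
                  [ 0.0,  1.0, -2.0]])     # symmetric

    eigenvalues, P = np.linalg.eigh(M)     # the columns of P are orthonormal eigenvectors

    D = P.T @ M @ P                        # P'MP, diagonal up to rounding error
    print(np.round(D, 10))
    print(np.all(np.diag(D) < 0))          # all diagonal entries negative, so M is negative definite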
V. The Characteristic Polynomial.
A. For an n × n matrix M, if Mv = λv for some v ≠ 0 and some scalar λ, we say that v is an eigenvector of M while λ is an eigenvalue of M.
1. If λ is an eigenvalue, then M-λI is a singular matrix, and its determinant
is 0.
2. Conversely, if M - λ I is singular, then (M - λI)v = 0 for some v ≠ 0,
and λ is an eigenvalue.
B. The characteristic polynomial of M is the expression det(M - λI), which is a polynomial of degree n in λ.
C. The characteristic polynomial is invariant under coordinate changes: if Q is any nonsingular matrix then, since det(AB) = det(A) · det(B) for any n × n matrices A and B,
det(Q^{-1}MQ - λI) = det(Q^{-1}(M - λI)Q) = det(Q^{-1}) det(M - λI) det(Q) = det(M - λI),
so that M and Q^{-1}MQ have the same characteristic polynomial.
D. From this we can conclude that, if M is symmetric, all its eigenvalues are real: choose an orthogonal P with P'MP diagonal; since P' = P^{-1}, M and P'MP have the same characteristic polynomial, whose roots are the (real) diagonal entries of P'MP.
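A numerical illustration of C and D (numpy's poly returns the coefficients of the monic characteristic polynomial det(λI - M), which agrees with det(M - λI) up to sign; the matrices M and Q are illustrative):

    import numpy as np

    M = np.array([[2.0, 1.0],
                  [1.0, 3.0]])             # symmetric
    Q = np.array([[1.0, 2.0],
                  [0.0, 1.0]])             # an arbitrary nonsingular matrix

    print(np.poly(M))                            # characteristic polynomial of M
    print(np.poly(np.linalg.inv(Q) @ M @ Q))     # the same coefficients for Q^{-1}MQ
    print(np.linalg.eigvals(M))                  # the eigenvalues of the symmetric M are real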