Least Squares and Curve Fitting
These notes cover a portion of the material in Section 5.4 of our text; my hope is that a somewhat less thorough treatment may be easier to grasp. You may wish to fill in some of the details by reading Section 5.4.
The problem we will use for motivation is to find the
straight line that lies closest to a set of data points
in the plane. We will see that this leads directly to a question about solving
linear equations. To set this up,
consider a set of points (a_1, b_1), ..., (a_n, b_n) in the plane. Maybe these
are measurements from an experiment;
for instance we might be measuring the movement of some particle, and the first
coordinate is time, while
the second coordinate is position. We observe in the figures below that the data seem to lie near a line, but not all on a line. We’d like to find an equation of a line that approximates the data set, as drawn on the left. To keep calculations to a minimum, we’ll work with the smaller data set consisting of the points
(0, 1), (2, 4), (4, 5), (6, 7).
First, let’s observe why finding the straight line has anything to do with solving linear equations. We are looking for a line of the form

    y = c_0 + c_1 t

that goes through all of the points. For each point (a_i, b_i), we must have

    c_0 + c_1 a_i = b_i,

which is a linear equation for the unknowns c_0 and c_1. So if there were a line going through all four points on the right-hand graph, the coefficients c_0 and c_1 would satisfy the system

    c_0 + 0 c_1 = 1
    c_0 + 2 c_1 = 4
    c_0 + 4 c_1 = 5
    c_0 + 6 c_1 = 7

or, in matrix form, Ax = b, where

    A = [ 1  0 ]
        [ 1  2 ]
        [ 1  4 ]
        [ 1  6 ],    x = (c_0, c_1)^T,    b = (1, 4, 5, 7)^T.
A quick reduction of the augmented matrix [A | b] to rref shows that there is no solution to this system, which is pretty clear if you examine the picture: the points just don’t lie on a line. In other terms, we can say that the issue is that the vector b is not in the image of A.
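(If you want to check this numerically, here is a minimal sketch using NumPy; the notes don’t use any software, so the setup here is mine. The rank of the augmented matrix exceeds the rank of A, so b is not in the image of A.)

```python
import numpy as np

A = np.array([[1, 0],
              [1, 2],
              [1, 4],
              [1, 6]], dtype=float)
b = np.array([1, 4, 5, 7], dtype=float)

# rank A = 2, but rank [A | b] = 3, so the system is inconsistent
print(np.linalg.matrix_rank(A))                        # 2
print(np.linalg.matrix_rank(np.column_stack([A, b])))  # 3
```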
This motivates the more general problem: given an m × n matrix A (giving a linear transformation T: R^n → R^m) and a vector b in R^m, find the vector x* in R^n that is closest to being a solution to the equation Ax = b.
What do we mean by ‘closest to being a solution’? If x* is actually a solution (i.e. Ax* = b) then that would certainly qualify as closest. But, as in the curve-fitting problem, there may not be an actual solution.
The answer is provided by the following picture, in which we have written V = Image(A). We are looking for the closest vector in V to b; the closest vector is given by the projection proj_V(b) of b onto V.
This looks pretty clear in the picture; the book suggests a geometric proof on pages 213-214. Since proj_V(b) lies in V = Image(A), we can write it as A times some vector. That vector is x*. So we have, in principle, achieved our goal: we find our best approximation to a solution to Ax = b by solving (for x*) the equation

    A x* = proj_V(b).

We know that there is a solution, because the right-hand side proj_V(b) is by definition an element of V. The vector x* is called a least-squares solution to the equation Ax = b.
We can develop this a little further, and there are two reasons for doing so. One is that projecting onto V is kind of a pain in the neck, since we’d have to find an orthogonal basis for V in order to use our formula. Also, it would be nice to have some kind of formula for x*. We’ll take care of both of these at the same time. (Obligatory disclaimer: in practice, especially with large systems, it’s often better to just find an orthonormal basis for V and then solve for x* by Gaussian elimination. But fortunately we are not worrying about such nasty details!)
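For the curious, here is a sketch of that ‘pain in the neck’ route in NumPy. A QR factorization is one standard way to get an orthonormal basis for V = Image(A); the notes don’t prescribe this particular method, and the variable names are mine.

```python
import numpy as np

A = np.array([[1, 0], [1, 2], [1, 4], [1, 6]], dtype=float)
b = np.array([1, 4, 5, 7], dtype=float)

Q, R = np.linalg.qr(A)       # columns of Q: an orthonormal basis for Image(A)
proj = Q @ (Q.T @ b)         # proj_V(b), the closest vector in V to b
x_star = np.linalg.solve(R, Q.T @ b)  # solves A x* = proj_V(b), since A = QR
print(proj)                  # the projection of b onto V
print(x_star)                # the least-squares solution x*
```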
The main fact that leads us to a new solution is the one proved in Monday’s class: a vector w is perpendicular to V = Image(A) if and only if A^T w = 0. How does this help? Well, since Ax* lies in V, to say that Ax* = proj_V(b) is equivalent to saying that b − Ax* is perpendicular to V, which is (by what we just observed) equivalent to

    A^T (b − Ax*) = 0.

The conclusion of all this is that x* is closest to a solution of Ax = b if and only if it is a solution to the equation

    A^T A x* = A^T b.

This equation (that we solve to get x*) is called the normal equation of Ax = b. Note: we can solve it by multiplying both sides by (A^T A)^{-1}, when that inverse exists, although in practice this may not be very easy to compute.
Let’s carry this out for the example we started with. We have

    A = [ 1  0 ]
        [ 1  2 ]
        [ 1  4 ]
        [ 1  6 ],    b = (1, 4, 5, 7)^T,

which gives

    A^T A = [  4  12 ]
            [ 12  56 ].

We also need

    A^T b = (17, 70)^T.

After some mildly messy arithmetic, we solve the system

     4 c_0 + 12 c_1 = 17
    12 c_0 + 56 c_1 = 70

to get c_0 = 1.4, c_1 = 0.95.
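As a sanity check on that arithmetic, here is a short NumPy sketch (again, the setup is mine) that forms A^T A and A^T b and solves the normal equation:

```python
import numpy as np

A = np.array([[1, 0], [1, 2], [1, 4], [1, 6]], dtype=float)
b = np.array([1, 4, 5, 7], dtype=float)

AtA = A.T @ A                     # [[ 4, 12], [12, 56]]
Atb = A.T @ b                     # [17, 70]
print(np.linalg.solve(AtA, Atb))  # [1.4, 0.95]
```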
Going back to the original problem, the line that best fits the data is given by y = 1.4 + 0.95t. If there were more data points, the problem would be pretty much the same, except that the matrix A, which was 4 × 2 in this example, would be k × 2 if there are k data points. If you like formulas, you can find the general solution to the normal equation at the end of Section 5.4. You can also find another worked example of how to find a least-squares solution to an inconsistent equation in Example 1 of Section 5.4. In that example, the book uses the fact that we can invert A^T A to solve the normal equation.
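If you would rather not set up the normal equation by hand, NumPy’s built-in least-squares routine does the same job for any k × 2 system. A small sketch follows; the helper fit_line is my own name, not anything from the text or the book:

```python
import numpy as np

def fit_line(ts, ys):
    """Least-squares line y = c0 + c1*t through the points (ts[i], ys[i])."""
    A = np.column_stack([np.ones_like(ts), ts])   # the k x 2 matrix
    (c0, c1), *_ = np.linalg.lstsq(A, ys, rcond=None)
    return c0, c1

print(fit_line(np.array([0., 2., 4., 6.]),
               np.array([1., 4., 5., 7.])))       # (1.4, 0.95)
```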
There are some other variations on this theme. Suppose, for instance, that a plot of our data looks like a parabola, suggesting that it might fit a quadratic equation. Thus, we would look for a function of the form

    y = c_0 + c_1 t + c_2 t^2,

and so x = (c_0, c_1, c_2)^T has 3 variables. For example, the four data points below look like they might fit a parabola. Plugging in the given values gives us equations

    c_0 + c_1 a_i + c_2 a_i^2 = b_i,

etc. Thus we look for a good approximation to solutions to Ax = b, where A is the 4 × 3 matrix whose i-th row is (1, a_i, a_i^2) and b = (b_1, b_2, b_3, b_4)^T. As above, we can’t solve the equation directly, so instead we solve the normal equation

    A^T A x* = A^T b,

which is now a system of three equations in the three unknowns c_0, c_1, c_2. After some work (I confess, I used a computer), I get the values of c_0, c_1, and c_2, or, put another way, the best-fitting parabola. The corresponding parabola is drawn below (again, done on the computer).
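Here is the same machinery for the quadratic fit, as a sketch only: the data points below are hypothetical stand-ins (the actual points appear in the figure, not in the text), so the output is not the parabola from the notes.

```python
import numpy as np

# hypothetical data points, purely for illustration
ts = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 0.5, 1.5, 4.0])

A = np.column_stack([np.ones_like(ts), ts, ts**2])  # rows (1, a_i, a_i^2)
coeffs = np.linalg.solve(A.T @ A, A.T @ ys)         # normal equation A^T A x* = A^T b
print(coeffs)   # c_0, c_1, c_2 of the best-fitting parabola
```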
Homework problem 32 in this section leads to a more reasonable normal equation, one where you can do the work by hand.