Usually there will be no set of $x_j$ which exactly satisfies (1). Let us define an error vector $e_i$ by
\[ e_i = c_i - \sum_{j=1}^{N} a_{ij} x_j \tag{2} \]
It simplifies the development to rewrite this equation as follows (a trick I learned from John P. Burg):
\[ e_i = \sum_{j=0}^{N} b_{ij} x_j, \qquad x_0 = 1 \tag{3} \]
where $b_{i0} = c_i$ and $b_{ij} = -a_{ij}$ for $j \geq 1$. We may abbreviate this equation as
\[ \mathbf{e} = \mathbf{B}\,\mathbf{x} \tag{4} \]
where $\mathbf{B}$ is the matrix containing $\mathbf{c}$ and $-\mathbf{a}$.
The $i$th error $e_i$ may thus be written as a dot product of the $i$th row of $\mathbf{B}$ with the column vector $\mathbf{x}$.
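As a small numerical sketch of (3) and (4) (not from the original text; the arrays and sample values below are invented for illustration), one may adjoin $\mathbf{c}$ to $-\mathbf{a}$ and form the error vector:

    import numpy as np

    # Hypothetical overdetermined system: five equations, two unknowns
    # (all values invented for illustration).
    a = np.array([[1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 5.0],
                  [1.0, 7.0],
                  [1.0, 8.0]])
    c = np.array([3.1, 4.2, 6.1, 7.9, 9.2])

    # Burg's trick (3): adjoin c as column zero and negate a, so that
    # e = B @ x_aug with x_aug = (1, x_1, ..., x_N).
    B = np.column_stack([c, -a])
    x_aug = np.array([1.0, 1.0, 1.0])   # trial values x_1 = x_2 = 1
    e = B @ x_aug                        # error vector of (4)
    print(e)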
Now we will minimize the sum squared error $E$ defined as
\[ E = \sum_i e_i e_i = \sum_i \Bigl( \sum_j x_j b_{ij} \Bigr) \Bigl( \sum_k b_{ik} x_k \Bigr) \tag{5} \]
The summation over $i$ may be brought inside the constants $x_j$ and $x_k$:
\[ E = \sum_j \sum_k x_j \Bigl( \sum_i b_{ij} b_{ik} \Bigr) x_k \tag{6} \]
The matrix in the center, call it $r_{jk}$, is symmetrical. It is a positive (more strictly, nonnegative) definite matrix because you will never be able to find an $\mathbf{x}$ for which $E$ is negative, since $E$ is a sum of squared $e_i$.
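Continuing the numerical sketch above, the center matrix of (6) is simply $\mathbf{B}^T \mathbf{B}$, and its symmetry and nonnegative definiteness can be checked directly:

    # r_jk = sum_i b_ij b_ik, i.e. R = B^T B.
    R = B.T @ B
    print(np.allclose(R, R.T))                        # symmetrical
    print(np.all(np.linalg.eigvalsh(R) >= -1e-12))    # no negative eigenvalues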
We find the $\mathbf{x}$ with minimum $E$ by requiring
\[ \frac{\partial E}{\partial x_j} = 0, \qquad j = 1, 2, \ldots, N \]
Notice that this will give us exactly one equation for each unknown. In order to clarify the presentation we will specialize (6) to two unknowns:
\[ E = (1, x_1, x_2) \begin{pmatrix} r_{00} & r_{01} & r_{02} \\ r_{10} & r_{11} & r_{12} \\ r_{20} & r_{21} & r_{22} \end{pmatrix} \begin{pmatrix} 1 \\ x_1 \\ x_2 \end{pmatrix} \tag{7} \]
Setting to zero the derivative with respect to $x_1$, we get
\[ 0 = \frac{\partial E}{\partial x_1} = (0, 1, 0)\, \mathbf{R} \begin{pmatrix} 1 \\ x_1 \\ x_2 \end{pmatrix} + (1, x_1, x_2)\, \mathbf{R} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \tag{8} \]
where $\mathbf{R}$ denotes the matrix of (7). Since $r_{jk} = r_{kj}$, both terms on the right are equal. Thus (8) may be written
\[ 0 = (0, 1, 0)\, \mathbf{R} \begin{pmatrix} 1 \\ x_1 \\ x_2 \end{pmatrix} = r_{10} + r_{11} x_1 + r_{12} x_2 \tag{9} \]
Likewise, differentiating with respect to $x_2$ gives
\[ 0 = r_{20} + r_{21} x_1 + r_{22} x_2 \tag{10} \]
Equations (9) and (10) may be combined as
\[ \begin{pmatrix} r_{10} & r_{11} & r_{12} \\ r_{20} & r_{21} & r_{22} \end{pmatrix} \begin{pmatrix} 1 \\ x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \tag{11} \]
This form is two equations in two unknowns. One might write it in the more conventional form
\[ \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = -\begin{pmatrix} r_{10} \\ r_{20} \end{pmatrix} \tag{12} \]
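In the running sketch, the conventional form (12) is a small square system, solved in one line (the index slices pick out the blocks of $\mathbf{R}$ shown above):

    # Solve (12): the lower-right 2x2 block of R against minus its first column.
    x_opt = np.linalg.solve(R[1:, 1:], -R[1:, 0])
    print(x_opt)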
The matrix of (11) lacks
only a top row to be equal to the matrix
of (7).
To give it that row, we may augment (11) by
\[ v = (r_{00}, r_{01}, r_{02}) \begin{pmatrix} 1 \\ x_1 \\ x_2 \end{pmatrix} = r_{00} + r_{01} x_1 + r_{02} x_2 \tag{13} \]
where (13) may be regarded as a definition of a new variable $v$. Putting (13) on top of (11), we get
\[ \begin{pmatrix} r_{00} & r_{01} & r_{02} \\ r_{10} & r_{11} & r_{12} \\ r_{20} & r_{21} & r_{22} \end{pmatrix} \begin{pmatrix} 1 \\ x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} v \\ 0 \\ 0 \end{pmatrix} \tag{14} \]
The solution $\mathbf{x}$ of (12) or (14) is that set of $x_k$ for which $E$ is a minimum. To get an interpretation of $v$, we may multiply both sides of (14) by the row vector $(1, x_1, x_2)$, getting
\[ (1, x_1, x_2) \begin{pmatrix} r_{00} & r_{01} & r_{02} \\ r_{10} & r_{11} & r_{12} \\ r_{20} & r_{21} & r_{22} \end{pmatrix} \begin{pmatrix} 1 \\ x_1 \\ x_2 \end{pmatrix} = (1, x_1, x_2) \begin{pmatrix} v \\ 0 \\ 0 \end{pmatrix} = v \tag{15} \]
Comparing (15) with (7), we see that $v$ is the minimum value of $E$.
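This identity is easy to confirm in the numerical sketch:

    # v from the top row of (14) should equal the minimum E of (5).
    x_full = np.concatenate([[1.0], x_opt])
    v = (R @ x_full)[0]
    E_min = np.sum((B @ x_full) ** 2)
    print(np.isclose(v, E_min))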
Occasionally, it is more convenient to have the essential equations in partitioned matrix form. In partitioned matrix form (writing $\mathbf{A}$ for the matrix of coefficients $a_{ij}$), we have for the error (6)
\[ E = (1, \mathbf{x}^T) \begin{pmatrix} \mathbf{c}^T \mathbf{c} & -\mathbf{c}^T \mathbf{A} \\ -\mathbf{A}^T \mathbf{c} & \mathbf{A}^T \mathbf{A} \end{pmatrix} \begin{pmatrix} 1 \\ \mathbf{x} \end{pmatrix} \tag{16} \]
The final equation (14) splits into
\[ v = \mathbf{c}^T \mathbf{c} - \mathbf{c}^T \mathbf{A}\, \mathbf{x} \tag{17} \]
\[ \mathbf{A}^T \mathbf{A}\, \mathbf{x} = \mathbf{A}^T \mathbf{c} \tag{18} \]
where (18) represents the simultaneous equations to be solved for $\mathbf{x}$. Equation (18) is what you have to set up in a computer.
It is easily remembered by a quick and dirty
(very dirty) derivation.
That is, we began with the overdetermined equations $\mathbf{A}\,\mathbf{x} \approx \mathbf{c}$; premultiplying by $\mathbf{A}^T$ gives
\[ \mathbf{A}^T \mathbf{A}\, \mathbf{x} = \mathbf{A}^T \mathbf{c} \]
which is (18).
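A sketch of setting (18) up in a computer (reusing the invented arrays from above; a library least-squares routine minimizes the same $E$ and should agree):

    # Normal equations (18): form A^T A and A^T c, then solve.
    x_normal = np.linalg.solve(a.T @ a, a.T @ c)

    # Cross-check against a library least-squares solver.
    x_lstsq, *_ = np.linalg.lstsq(a, c, rcond=None)
    print(np.allclose(x_normal, x_lstsq))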
In physical science applications, the variable $z_j$ is frequently a complex variable, say $z_j = x_j + i y_j$. It is always possible to go through the foregoing analyses, treating the problem as though $x_j$ and $y_j$ were real independent variables.
There is a considerable gain in simplicity and a saving in computational effort by treating $z_j$ as a single complex variable.
The error $E$ may be regarded as a function of either $x_j$ and $y_j$ or $z_j$ and $\bar z_j$. In general $E = E(z_1, \ldots, z_N, \bar z_1, \ldots, \bar z_N)$, but we will treat the case $N = 1$ here and leave the general case for the Exercises.
The minimum is found where
\[ 0 = \frac{\partial E}{\partial x} = \frac{\partial E}{\partial z}\frac{\partial z}{\partial x} + \frac{\partial E}{\partial \bar z}\frac{\partial \bar z}{\partial x} = \frac{\partial E}{\partial z} + \frac{\partial E}{\partial \bar z} \tag{19} \]
\[ 0 = \frac{\partial E}{\partial y} = \frac{\partial E}{\partial z}\frac{\partial z}{\partial y} + \frac{\partial E}{\partial \bar z}\frac{\partial \bar z}{\partial y} = i\left(\frac{\partial E}{\partial z} - \frac{\partial E}{\partial \bar z}\right) \tag{20} \]
Multiplying (20) by $i$ and adding and subtracting these equations, we may express the minimum condition more simply as
\[ \frac{\partial E}{\partial z} = 0 \tag{21} \]
\[ \frac{\partial E}{\partial \bar z} = 0 \tag{22} \]
However, the usual case is that $E$ is a positive real quadratic function of $z$ and $\bar z$ and that $\partial E / \partial z$ is merely the complex conjugate of $\partial E / \partial \bar z$. Then the two conditions (21) and (22) may be replaced by either one of them.
Usually, when working with complex variables we are minimizing a positive quadratic form like
\[ E = (\mathbf{c} - \mathbf{A}\,\mathbf{z})^* (\mathbf{c} - \mathbf{A}\,\mathbf{z}) \tag{23} \]
where $*$ denotes complex-conjugate transpose. Now (22) gives
\[ \mathbf{A}^* \mathbf{A}\, \mathbf{z} = \mathbf{A}^* \mathbf{c} \tag{24} \]
which is just the complex form of (18).
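A sketch with invented complex data (fresh array names, not from the text) shows the same recipe working, with the conjugate transpose in place of the transpose:

    # Complex normal equations (24): A* A z = A* c.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(6, 2)) + 1j * rng.normal(size=(6, 2))
    cz = rng.normal(size=6) + 1j * rng.normal(size=6)

    z = np.linalg.solve(A.conj().T @ A, A.conj().T @ cz)
    z_lstsq, *_ = np.linalg.lstsq(A, cz, rcond=None)
    print(np.allclose(z, z_lstsq))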
Let us consider an example. Suppose a set of wave arrival times $t_i$ is measured at sensors located on the $x$ axis at points $x_i$. Suppose the wavefront is to be fitted to a parabola $t = a + b x + c x^2$. Here, the $x_i$ are knowns and $a$, $b$, and $c$ are unknowns. For each sensor $i$ we have an equation
\[ t_i \approx a + b x_i + c x_i^2 \tag{25} \]
When $i$ ranges over more than three values, we have more equations than unknowns. In this example, (14) takes the form
\[ \begin{pmatrix} \sum t_i t_i & -\sum t_i & -\sum t_i x_i & -\sum t_i x_i^2 \\ -\sum t_i & \sum 1 & \sum x_i & \sum x_i^2 \\ -\sum t_i x_i & \sum x_i & \sum x_i^2 & \sum x_i^3 \\ -\sum t_i x_i^2 & \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{pmatrix} \begin{pmatrix} 1 \\ a \\ b \\ c \end{pmatrix} = \begin{pmatrix} v \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{26} \]
This may be solved by standard methods for $a$, $b$, and $c$.
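In code, the reduced form (18) of this system is quickest to set up (sensor positions and arrival times below are invented for illustration):

    # Fit t ~ a + b x + c x^2 at five sensors.
    xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # sensor positions (assumed)
    ts = np.array([4.1, 2.9, 2.5, 3.1, 4.0])     # arrival times (invented)

    # One row of (25) per sensor: columns multiply a, b, c.
    G = np.column_stack([np.ones_like(xs), xs, xs ** 2])

    # Normal equations (18), with G playing the role of A.
    a_, b_, c_ = np.linalg.solve(G.T @ G, G.T @ ts)
    print(a_, b_, c_)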