ERROR AND SENSITIVITY ANALYSIS FOR SYSTEMS OF LINEAR EQUATIONS
• Read Sections 2.7, 3.3, 3.4, 3.5.
• Conditioning of linear systems
• Estimating accuracy
• Error analysis
Perturbation analysis

Consider a linear system Ax = b. The question addressed by perturbation analysis is to determine the variation of the solution x when the data, namely A and b, undergo small variations. A problem is ill-conditioned if small variations in the data lead to very large variations in the solution.

Let E be an n × n matrix and e an n-vector. "Perturb" A into A(ε) = A + εE and b into b(ε) = b + εe. Note: A + εE is nonsingular for ε small enough. Why? The solution x(ε) of the perturbed system satisfies

(A + εE) x(ε) = b + εe.
Let x = x(0), the solution of the unperturbed system. Then x(ε) is differentiable at ε = 0 and its derivative is

x′(0) = A^{-1} (e − Ex).

A small variation [εE, εe] will cause the solution to vary by roughly ε x′(0) = ε A^{-1}(e − Ex). Since ‖b‖ ≤ ‖A‖ ‖x‖:

‖x(ε) − x‖ / ‖x‖ ≤ ε ‖A‖ ‖A^{-1}‖ ( ‖e‖/‖b‖ + ‖E‖/‖A‖ ) + O(ε²)
The quantity

κ(A) = ‖A‖ ‖A^{-1}‖

is called the condition number of the linear system with respect to the norm ‖·‖. When using the standard norms ‖·‖_p, p = 1, 2, ∞, we label κ(A) with the same label as the associated norm: κ_p(A) = ‖A‖_p ‖A^{-1}‖_p.

• Note: κ_2(A) = σ_max(A)/σ_min(A), the ratio of largest to smallest singular values of A. This allows κ_2 to be defined when A is not square.
• The determinant *is not* a good indication of sensitivity.
• Small eigenvalues *do not* always give a good indication of poor conditioning.
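These quantities are easy to inspect numerically. A minimal MATLAB sketch (the 3 × 3 matrix is an arbitrary illustration):

  % Condition numbers of a small test matrix in the standard norms.
  A = [2 0 1; 1 3 0; 0 1 4];
  k1   = cond(A,1);        % kappa_1(A)   = ||A||_1 * ||inv(A)||_1
  k2   = cond(A,2);        % kappa_2(A)   = sigma_max / sigma_min
  kinf = cond(A,inf);      % kappa_inf(A) = ||A||_inf * ||inv(A)||_inf
  s = svd(A);
  disp([k2, s(1)/s(end)])  % identical: kappa_2 is the singular value ratio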
Example: Consider matrices of the form

A(α) = [ 1  α ]
       [ 0  1 ]

for large α. The inverse of A(α) is

A(α)^{-1} = [ 1  −α ]
            [ 0   1 ]

and for the ∞-norm we have ‖A(α)‖_∞ = ‖A(α)^{-1}‖_∞ = 1 + |α|, so that

κ_∞(A(α)) = (1 + |α|)².

For a large α, this gives a very large condition number, whereas all the eigenvalues of A(α) are equal to unity.
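The example is easy to check numerically; a small MATLAB verification (α = 10^6 is an arbitrary choice):

  % Large kappa_inf despite unit eigenvalues and unit determinant.
  alpha = 1e6;
  A = [1 alpha; 0 1];
  disp(cond(A,inf))  % (1+alpha)^2, about 1e12
  disp(eig(A)')      % both eigenvalues equal to 1
  disp(det(A))       % determinant equals 1: no warning of ill-conditioning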
Rigorous norm-based error bounds

We first need to show that A + E is nonsingular if A is nonsingular and E is small:

LEMMA: If A is nonsingular and ‖A^{-1}E‖ < 1, then A + E is nonsingular and

‖(A + E)^{-1}‖ ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}E‖).

The proof is based on the expansion (I + A^{-1}E)^{-1} = Σ_{k≥0} (−A^{-1}E)^k.

THEOREM 1: Assume that (A + E)y = b + e and Ax = b, and that ‖A^{-1}E‖ < 1. Then A + E is nonsingular and

‖y − x‖ / ‖x‖ ≤ (‖A^{-1}‖ / (1 − ‖A^{-1}E‖)) ( ‖E‖ + ‖e‖/‖x‖ ).
Proof: From (A + E)y = b + e and Ax = b we get A(y − x) = e − Ey = e − Ex − E(y − x). Hence

(A + E)(y − x) = e − Ex,

so y − x = (A + E)^{-1}(e − Ex), and by the Lemma

‖y − x‖ ≤ (‖A^{-1}‖ / (1 − ‖A^{-1}E‖)) ( ‖e‖ + ‖E‖ ‖x‖ ).

Dividing by ‖x‖ gives the result. QED. Note: stated in a slightly weaker form in the text.

Another common form:

THEOREM 2: Let (A + ΔA)y = b + Δb and Ax = b, where ‖ΔA‖ ≤ ε‖A‖ and ‖Δb‖ ≤ ε‖b‖, and assume that ε κ(A) < 1. Then

‖y − x‖ / ‖x‖ ≤ 2ε κ(A) / (1 − ε κ(A)).
Normwise backward error

Question: How much do we need to perturb the data for an approximate solution y to be the exact solution of the perturbed system?

The normwise backward error for y is defined as the smallest ε for which

(A + ΔA) y = b + Δb  with  ‖ΔA‖ ≤ ε‖E‖, ‖Δb‖ ≤ ε‖e‖.

It is denoted by η_{E,e}(y). Here y is given (some computed solution); E and e are to be selected (most likely 'directions of perturbation' for A and b). Typical choice: E = A, e = b. Explain why this is not unreasonable.
Let r = b − Ay. Then we have:

THEOREM 3: The normwise backward error for the case E = A, e = b is

η_{A,b}(y) = ‖r‖ / ( ‖A‖ ‖y‖ + ‖b‖ ).

Show how this can be used in practice as a means to stop an iterative method which computes a sequence of approximate solutions to Ax = b.

Consider a 6×6 Vandermonde system Ax = b. We perturb A by E, with |E| ≤ 10^{-10}|A|, and b similarly, and solve the perturbed system. Evaluate the backward error for this case. Evaluate the forward bound provided by Theorem 2. Comment on the results.
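A sketch of this experiment in MATLAB. The nodes 1, ..., 6 defining the Vandermonde matrix and the right-hand side are assumptions made here for illustration; the data in the notes may differ:

  % Backward error (Theorem 3) vs. forward error and the Theorem 2 bound.
  n = 6;  eps0 = 1e-10;
  A = vander(1:n);                        % entries of A are positive
  x = ones(n,1);  b = A*x;                % exact solution: all ones
  E = eps0*abs(A).*(2*rand(n)-1);         % |E| <= 1e-10 |A|
  e = eps0*abs(b).*(2*rand(n,1)-1);       % |e| <= 1e-10 |b|
  y = (A+E)\(b+e);                        % solve the perturbed system
  r = b - A*y;
  eta = norm(r)/(norm(A)*norm(y)+norm(b));   % backward error, 2-norm
  err = norm(y-x)/norm(x);                   % actual forward error
  bnd = 2*eps0*cond(A)/(1-eps0*cond(A));     % forward bound (Theorem 2)
  disp([eta err bnd])

Typically eta stays near 10^{-10} while the forward bound is larger by a factor of order κ(A): the computed y is backward stable even though it may be far from x.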
Componentwise backward error

A few more definitions on norms... A norm is absolute if ‖ |x| ‖ = ‖x‖ for all x (satisfied by all p-norms). A norm is monotone if |x| ≤ |y| implies ‖x‖ ≤ ‖y‖. It can be shown that these two properties are equivalent.

... and some notation:

ω_{E,e}(y) = min{ ε : (A + ΔA)y = b + Δb, |ΔA| ≤ εE, |Δb| ≤ εe }

(where E ≥ 0, e ≥ 0) is the componentwise backward error.
Show: a function φ which satisfies the first 2 requirements of vector norms (1. φ(x) ≥ 0, with φ(x) = 0 iff x = 0, and 2. φ(αx) = |α| φ(x)) satisfies the triangle inequality iff its unit ball is convex. Use the above to construct a norm in R² that is *not* absolute.

Define absolute *matrix* norms in the same way. Which of the usual norms ‖·‖_1, ‖·‖_2, and ‖·‖_∞ are absolute? Recall that for any matrix, fl(A) = A + E with |E| ≤ u |A| (u = unit roundoff). For an absolute matrix norm this gives ‖E‖ ≤ u ‖A‖. What does this imply?
THEOREM 4 [Oettli–Prager]: Let r = b − Ay (the residual). Then the componentwise backward error is equal to

ω_{E,e}(y) = max_i |r_i| / (E|y| + e)_i.

Analogue of Theorem 2: if |ΔA| ≤ εE and |Δb| ≤ εe with ε ‖ |A^{-1}| E ‖_∞ < 1, then

‖y − x‖_∞ / ‖x‖_∞ ≤ (ε / (1 − ε ‖ |A^{-1}| E ‖_∞)) · ‖ |A^{-1}| (E|x| + e) ‖_∞ / ‖x‖_∞.
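With the residual in hand, ω is a one-liner. A minimal MATLAB sketch for the typical choice E = |A|, e = |b| (the data is arbitrary):

  % Componentwise backward error (Oettli-Prager) with E = |A|, e = |b|.
  A = [3 1; 1 2];  b = [5; 5];   % illustrative data
  y = A\b;                       % computed solution
  r = b - A*y;
  omega = max(abs(r) ./ (abs(A)*abs(y) + abs(b)))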
Example of ill-conditioning: The Hilbert Matrix

A notorious example of ill-conditioning. For each n, the n × n Hilbert matrix H_n has entries

(H_n)_{ij} = 1 / (i + j − 1),  i, j = 1, ..., n.

Its condition number grows extremely fast with n: for example κ_2(H_5) ≈ 4.8 × 10^5, while κ_2(H_10) ≈ 1.6 × 10^13.
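MATLAB builds Hilbert matrices with hilb, so the growth of κ_2(H_n) is easy to observe (a minimal sketch):

  % Condition number of the Hilbert matrix grows exponentially with n.
  for n = [5 8 10 12]
     fprintf('n = %2d   cond(H_n) = %.2e\n', n, cond(hilb(n)))
  end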
Estimating condition numbers

Let A, B be two n × n matrices with A nonsingular and B singular. Then

1/κ(A) ≤ ‖A − B‖ / ‖A‖.

Proof: B singular → there is an x ≠ 0 such that Bx = 0. Then Ax = (A − B)x, so x = A^{-1}(A − B)x and ‖x‖ ≤ ‖A^{-1}‖ ‖A − B‖ ‖x‖. Divide both sides by ‖x‖ and rearrange to get the result. QED.

Example (an illustration): let A = [1 1; 1 1.0001] and B = [1 1; 1 1], which is singular. Then ‖A − B‖_∞ / ‖A‖_∞ = 10^{-4} / 2.0001, so κ_∞(A) ≥ 2.0001 × 10^4.
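A quick numerical check of the bound, using the 2 × 2 example above:

  % Any nearby singular B certifies a lower bound on kappa(A):
  %   kappa(A) >= ||A|| / ||A - B||.
  A = [1 1; 1 1.0001];
  B = [1 1; 1 1];                     % singular
  lower = norm(A,inf)/norm(A-B,inf)   % about 2.0e4
  kappa = cond(A,inf)                 % about 4.0e4: bound holds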
Estimating errors from residual norms

Let x̃ be an approximate solution to the system Ax = b (e.g., computed from an iterative process). We can compute the residual norm ‖r‖ = ‖b − Ax̃‖. Question: how can we estimate the error ‖x − x̃‖ from ‖r‖? One option is to use the inequality

‖x − x̃‖ / ‖x‖ ≤ κ(A) ‖r‖ / ‖b‖.

We must have an estimate of κ(A).

Proof of the inequality. First, note that x − x̃ = A^{-1} r, so ‖x − x̃‖ ≤ ‖A^{-1}‖ ‖r‖. Also note that from the relation b = Ax we get ‖b‖ ≤ ‖A‖ ‖x‖, i.e., 1/‖x‖ ≤ ‖A‖/‖b‖. Therefore

‖x − x̃‖ / ‖x‖ ≤ ‖A^{-1}‖ ‖A‖ ‖r‖ / ‖b‖ = κ(A) ‖r‖ / ‖b‖.

Show that the matching lower bound (1/κ(A)) ‖r‖/‖b‖ ≤ ‖x − x̃‖/‖x‖ also holds.
THEOREM 6: Let A be a nonsingular matrix, x̃ an approximate solution to Ax = b, and r = b − Ax̃. Then for any norm ‖·‖,

(1/κ(A)) ‖r‖/‖b‖ ≤ ‖x − x̃‖/‖x‖ ≤ κ(A) ‖r‖/‖b‖,

in which κ(A) is the condition number of A associated with the norm ‖·‖. It can be shown that both bounds are sharp, i.e., attainable for suitable data.
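A small MATLAB check of both bounds (random data and the 2-norm; the perturbation size 10^{-8} is arbitrary):

  % (1/kappa)*||r||/||b|| <= ||x - xt||/||x|| <= kappa*||r||/||b||
  rng(0);  n = 50;
  A = randn(n);  x = randn(n,1);  b = A*x;
  xt = x + 1e-8*randn(n,1);          % a hypothetical approximate solution
  r = b - A*xt;
  k = cond(A);
  disp([norm(r)/(k*norm(b)), norm(x-xt)/norm(x), k*norm(r)/norm(b)])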
Iterative refinement

Define the residual vector r = b − Ax̃. We have seen that A(x − x̃) = r, i.e., we have x = x̃ + A^{-1} r. Idea: compute r accurately (in double precision), then solve Ad = r, ... and correct x̃ by x̃ := x̃ + d, ... repeat if needed.
ALGORITHM 1: Iterative refinement

Do {
  1. Compute r = b − Ax̃
  2. Solve Ad = r
  3. Compute x̃ := x̃ + d
} while ( ‖d‖/‖x̃‖ > tol )

Why does this work? Model: each solve gets at most m digits right because of the conditioning, for example m = 3 digits. At the first iteration, the relative error is roughly 10^{-3}. Second iteration: the relative error in d is again roughly 10^{-3}, but now ‖d‖ is much smaller than ‖x̃‖, so the error in x̃ drops to roughly 10^{-6}. Etc.
Iterative refinement - Analysis

Assume the residual is computed exactly. Backward error analysis: each solve is carried out with a slightly perturbed matrix,

(A + E_k) d_k = r_k = b − Ax̃_k.

So:

x̃_{k+1} − x = x̃_k + d_k − x = (A + E_k)^{-1} E_k (x̃_k − x).

A previous result showed that if ‖A^{-1}E_k‖ < 1 then

‖(A + E_k)^{-1} E_k‖ ≤ ‖A^{-1}E_k‖ / (1 − ‖A^{-1}E_k‖).

So: the process will converge if ‖A^{-1}E_k‖ < 1/2 (sufficient condition).
Important: Iterative refinement won't work when the residual r consists only of noise. When b − Ax̃ is already very small (≈ ε) it is likely to be just noise, so not much can be done, because the rounding errors made in forming b − Ax̃ are of the same order as r itself.

Heuristic: If ε = 10^{-d} and κ(A) ≈ 10^q, then each iterative refinement step will gain about d − q digits.

A MATLAB experiment [do the operations indicated; a sketch along these lines is given below]
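The assumptions in this sketch are mine: a Hilbert system plays the ill-conditioned role and a single-precision LU factorization plays the role of the inaccurate solver; the number of digits gained per step depends on ε and κ(A):

  % Iterative refinement: cheap single-precision LU, residuals in double.
  n = 5;  A = hilb(n);  x = ones(n,1);  b = A*x;
  [L,U,p] = lu(single(A),'vector');   % low-precision factorization
  xt = double(U\(L\b(p)));            % initial solve
  for k = 1:4
     r  = b - A*xt;                   % residual, computed in double
     d  = double(U\(L\r(p)));         % correction from the cheap factors
     xt = xt + d;
     fprintf('step %d   rel. error = %.2e\n', k, norm(xt-x)/norm(x))
  end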
Repeat a couple of times. Observation: we gain about 3 digits per iteration. Read Section 3.5.4 on estimating accuracy.