53  Numerical linear algebra

Solving systems that can’t be solved by hand

You have a system of equations. In principle, Gaussian elimination gives the answer. In practice, you type it into a computer, and the answer is wrong — not obviously wrong, but subtly, fatally wrong.

The issue isn’t the algorithm. The issue is that computers don’t do exact arithmetic. Every floating-point operation introduces a tiny rounding error. Tiny errors compound. In a badly conditioned system, a rounding error in the twelfth decimal place can shift the answer by a factor of a thousand.

This chapter is about how to do it right: pivoting to avoid catastrophic cancellation, LU decomposition to separate the expensive part from the cheap part, and the condition number to know in advance whether the answer you’re computing is worth trusting.

53.1 What the notation is saying

A linear system \(A\mathbf{x} = \mathbf{b}\) means: find the vector \(\mathbf{x}\) — read as “the solution vector” — such that the matrix \(A\), read as “the coefficient matrix”, transforms \(\mathbf{x}\) into the right-hand side \(\mathbf{b}\).

In component form for a \(3 \times 3\) system:

\[\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}\]

The goal is to reduce \(A\) to upper-triangular form so that back-substitution can extract \(x_3\), then \(x_2\), then \(x_1\) in sequence.

LU decomposition writes \(A = LU\), where \(L\) is a lower-triangular matrix (ones on the diagonal, entries only below) and \(U\) is the upper-triangular matrix produced by elimination. This notation should be read as: “\(A\) factors into \(L\) times \(U\)”.

The condition number \(\kappa(A)\), read “kappa of \(A\)”, quantifies how much the solution can change relative to a small change in the data. It is defined as

\[\kappa(A) = \|A\|_\infty \cdot \|A^{-1}\|_\infty\]

where \(\|A\|_\infty\), read “the infinity-norm of \(A\)”, is the largest row sum of absolute values:

\[\|A\|_\infty = \max_i \sum_j |a_{ij}|\]

A rough rule: if \(\kappa(A) \approx 10^k\), the solution loses approximately \(k\) digits of accuracy to rounding errors. A condition number of \(10^{12}\) on a machine with 15-digit floating-point precision leaves you with about 3 reliable digits.

For iterative methods, \(x^{(k)}\) denotes the iterate at step \(k\) — the approximate solution after \(k\) steps, converging toward the true \(\mathbf{x}\) as \(k\) grows.

53.2 Gaussian elimination with partial pivoting

53.2.1 Why naive elimination fails

The exact solution of the system below is \(x_1 \approx 1.0001\), \(x_2 \approx 0.9999\) — essentially \(x_1 = 1\), \(x_2 = 1\), as you can check: \(0.0001(1) + 1(1) = 1.0001 \approx 1\) and \(1(1) + 1(1) = 2\). Now watch what happens when a computer solves it.

Consider this \(2 \times 2\) system in floating-point arithmetic with 3 significant figures:

\[\begin{pmatrix} 0.0001 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}\]

Naive elimination: subtract \((1 / 0.0001) = 10000\) times row 1 from row 2.

Row 2 becomes: \[[1 - 10000 \times 0.0001,\quad 1 - 10000 \times 1 \;|\; 2 - 10000 \times 1] = [0,\quad -9999 \;|\; -9998]\]

In 3-significant-figure arithmetic, \(-9999\) rounds to \(-1.00 \times 10^4\) and \(-9998\) also rounds to \(-1.00 \times 10^4\): the original entries \(1\) and \(2\) have been swamped by the huge multiplier. Back-substituting: \(x_2 = (-1.00 \times 10^4)/(-1.00 \times 10^4) = 1.00\). Then \(x_1 = (1 - 1 \times 1.00)/0.0001 = 0\).

The exact answer is \(x_1 \approx 1\), \(x_2 \approx 1\). Naive elimination gives \(x_1 = 0\). That is not a small rounding error — the larger unknown has vanished entirely. The problem: dividing by the tiny pivot \(0.0001\) produced the multiplier \(10000\), which wiped out the information in row 2; the resulting rounding error in \(x_2\), of order \(10^{-4}\), was then amplified by \(1/0.0001 = 10^4\) in the back-substitution for \(x_1\).

This is catastrophic cancellation: subtracting two nearly equal numbers, leaving a difference that is almost entirely rounding error — here, \(1 - 1 \times x_2\) in the back-substitution. It is not a pathological edge case. Any system whose pivot is small relative to the other entries is vulnerable — and such systems arise routinely from physical models with variables at very different scales.
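The same failure can be reproduced in ordinary double precision by shrinking the pivot until the multiplier swamps row 2 completely — a minimal sketch, where the pivot \(10^{-20}\) is an illustrative value chosen to defeat 15-digit arithmetic, not a number from the example above:

```python
# Naive elimination on [[eps, 1], [1, 1]] x = (1, 2) with a tiny pivot.
# The true solution is x1 ~ 1, x2 ~ 1.
eps = 1e-20                    # illustrative pivot, small enough to break doubles
m = 1.0 / eps                  # multiplier: 1e20
a22 = 1.0 - m * 1.0            # 1 - 1e20 rounds to -1e20 (the 1 is lost)
b2  = 2.0 - m * 1.0            # 2 - 1e20 rounds to -1e20 (the 2 is lost)
x2 = b2 / a22                  # exactly 1.0 -- close to the true x2
x1 = (1.0 - 1.0 * x2) / eps    # 0.0 -- the true x1 is ~1: total loss
print(x1, x2)                  # 0.0 1.0
```

Swapping the rows first — the partial pivoting of the next subsection — makes the multiplier \(10^{-20}\) instead of \(10^{20}\), and the same arithmetic returns \(x_1 = x_2 = 1.0\).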

53.2.2 Partial pivoting: the fix

Before eliminating column \(j\), scan the entries in column \(j\) from row \(j\) downward. Swap the row with the largest absolute value in that column into the pivot position. This ensures the multiplier \(m_{ij} = a_{ij}/a_{jj}\) satisfies \(|m_{ij}| \leq 1\), bounding the growth of rounding errors.

Algorithm (partial pivoting GE on \(n \times n\) augmented matrix \([A|\mathbf{b}]\)):

For \(j = 1, 2, \ldots, n-1\):

  1. Find \(p = \arg\max_{i \geq j} |a_{ij}|\) (index of largest entry in column \(j\), rows \(j\) to \(n\))
  2. If \(p \neq j\): swap rows \(p\) and \(j\) in \([A|\mathbf{b}]\)
  3. For \(i = j+1, \ldots, n\):
    • Compute multiplier \(m_{ij} = a_{ij}/a_{jj}\)
    • Row \(i \leftarrow\) Row \(i\) \(- m_{ij} \times\) Row \(j\)

After \(n-1\) passes, \(A\) is upper triangular. Recover \(\mathbf{x}\) by back-substitution:

\[x_n = \frac{b_n}{a_{nn}}, \qquad x_i = \frac{1}{a_{ii}}\left(b_i - \sum_{k=i+1}^n a_{ik} x_k\right), \quad i = n-1, \ldots, 1\]

53.2.3 Worked example: partial pivoting on a 3×3 system

Solve:

\[\begin{pmatrix} 1 & 2 & 1 \\ 3 & 2 & 4 \\ 2 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 20 \\ 12 \end{pmatrix}\]

Write the augmented matrix:

\[[A|\mathbf{b}] = \left[\begin{array}{ccc|c} 1 & 2 & 1 & 8 \\ 3 & 2 & 4 & 20 \\ 2 & 1 & 2 & 12 \end{array}\right]\]

Pass 1 (column 1): The largest entry in column 1 is \(|3|\) in row 2. Swap rows 1 and 2:

\[\left[\begin{array}{ccc|c} 3 & 2 & 4 & 20 \\ 1 & 2 & 1 & 8 \\ 2 & 1 & 2 & 12 \end{array}\right]\]

Eliminate below pivot \(a_{11} = 3\):

  • Row 2 \(\leftarrow\) Row 2 \(- (1/3)\) Row 1: \([1 - 1, \; 2 - 2/3, \; 1 - 4/3 \;|\; 8 - 20/3] = [0, \; 4/3, \; -1/3 \;|\; 4/3]\)
  • Row 3 \(\leftarrow\) Row 3 \(- (2/3)\) Row 1: \([2 - 2, \; 1 - 4/3, \; 2 - 8/3 \;|\; 12 - 40/3] = [0, \; -1/3, \; -2/3 \;|\; -4/3]\)

\[\left[\begin{array}{ccc|c} 3 & 2 & 4 & 20 \\ 0 & 4/3 & -1/3 & 4/3 \\ 0 & -1/3 & -2/3 & -4/3 \end{array}\right]\]

Pass 2 (column 2): Largest in column 2, rows 2–3: \(|4/3| > |-1/3|\). No swap needed. Pivot is \(4/3\).

  • Row 3 \(\leftarrow\) Row 3 \(- (-1/3)/(4/3) \times\) Row 2 \(=\) Row 3 \(+ (1/4)\) Row 2: \([0, \; -1/3 + 1/3, \; -2/3 - 1/12 \;|\; -4/3 + 1/3] = [0, \; 0, \; -3/4 \;|\; -1]\)

\[\left[\begin{array}{ccc|c} 3 & 2 & 4 & 20 \\ 0 & 4/3 & -1/3 & 4/3 \\ 0 & 0 & -3/4 & -1 \end{array}\right]\]

Back-substitution:

\[x_3 = \frac{-1}{-3/4} = \frac{4}{3}\]

\[x_2 = \frac{4/3 - (-1/3)(4/3)}{4/3} = \frac{4/3 + 4/9}{4/3} = \frac{16/9}{4/3} = \frac{16}{9} \cdot \frac{3}{4} = \frac{4}{3}\]

\[x_1 = \frac{20 - 2(4/3) - 4(4/3)}{3} = \frac{20 - 8/3 - 16/3}{3} = \frac{20 - 8}{3} = \frac{12}{3} = 4\]

Solution: \(x_1 = 4\), \(x_2 = 4/3\), \(x_3 = 4/3\).

Check: Row 1: \(1(4) + 2(4/3) + 1(4/3) = 4 + 8/3 + 4/3 = 4 + 4 = 8\) ✓ Row 2: \(3(4) + 2(4/3) + 4(4/3) = 12 + 8/3 + 16/3 = 12 + 8 = 20\) ✓ Row 3: \(2(4) + 1(4/3) + 2(4/3) = 8 + 4/3 + 8/3 = 8 + 4 = 12\) ✓
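The hand procedure translates directly into code. A minimal sketch of the algorithm from 53.2.2 in Python — pivot search, row swap, elimination, back-substitution — checked against this worked example:

```python
def solve_pivoted(A, b):
    """Gaussian elimination with partial pivoting, then back-substitution."""
    n = len(A)
    # Work on an augmented copy [A | b].
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for j in range(n - 1):
        # Pivot: the row with the largest |entry| in column j, rows j..n-1.
        p = max(range(j, n), key=lambda i: abs(M[i][j]))
        M[j], M[p] = M[p], M[j]
        for i in range(j + 1, n):
            m = M[i][j] / M[j][j]          # |m| <= 1 thanks to the swap
            for k in range(j, n + 1):
                M[i][k] -= m * M[j][k]
    # Back-substitution on the upper-triangular result.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

A = [[1.0, 2.0, 1.0], [3.0, 2.0, 4.0], [2.0, 1.0, 2.0]]
b = [8.0, 20.0, 12.0]
print(solve_pivoted(A, b))    # close to [4, 4/3, 4/3]
```

This reproduces the hand result \(x = (4, 4/3, 4/3)\) to machine precision.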

53.3 LU decomposition

Gaussian elimination — without pivoting, for clarity — can be read as computing a factorisation \(A = LU\). The matrix \(L\) records the multipliers used during elimination; \(U\) is the upper-triangular result.

53.3.1 Doolittle form

In the Doolittle convention, \(L\) has ones on its diagonal:

\[L = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & m_{32} & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix}\]

where \(m_{ij}\) is the multiplier used to zero \(a_{ij}\) during elimination.

The \(U\) matrix is exactly the upper-triangular matrix at the end of Gaussian elimination. The \(L\) matrix costs nothing extra — the multipliers are computed anyway and simply stored in the lower portion of the working array.

53.3.2 Why LU earns its keep

Solving \(A\mathbf{x} = \mathbf{b}\) once costs \(O(n^3)\) operations — roughly \(n^3/3\) floating-point multiplications and as many additions — because each elimination step touches every remaining row-column pair. This is unavoidable. But if you need to solve the same \(A\) for multiple right-hand sides \(\mathbf{b}_1, \mathbf{b}_2, \ldots\), the LU factorisation pays back:

  • Build \(A = LU\): \(O(n^3)\) — done once.
  • Solve \(L\mathbf{y} = \mathbf{b}\) (forward substitution): \(O(n^2)\) per right-hand side.
  • Solve \(U\mathbf{x} = \mathbf{y}\) (back substitution): \(O(n^2)\) per right-hand side.

For \(k\) right-hand sides the total cost is \(O(n^3) + k \cdot O(n^2)\), versus \(k \cdot O(n^3)\) if you re-eliminate each time. For a \(1000 \times 1000\) system solved for 50 right-hand sides, that’s more than \(40\times\) faster.
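A quick operation count makes the claim concrete (leading-order terms only — the constants vary by implementation, so treat this as a sketch):

```python
# Cost of k solves sharing one LU vs re-eliminating every time.
n, k = 1000, 50
factor_once   = n**3 / 3 + k * n**2   # one factorisation + k triangular solves
refactor_each = k * n**3 / 3          # k full eliminations
print(refactor_each / factor_once)    # a bit over 43
```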

This matters in FEM analysis, where the stiffness matrix \(K\) is fixed while the load vector \(\mathbf{f}\) changes with each load case.

53.3.3 Worked example: Doolittle LU on a 3×3 system

Factorise and solve:

\[A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix}, \quad \mathbf{b}_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad \mathbf{b}_2 = \begin{pmatrix} 2 \\ 0 \\ 4 \end{pmatrix}\]

Step 1: Build \(U\) and record multipliers.

Column 1, pivot \(u_{11} = 2\):

  • \(m_{21} = 4/2 = 2\). Row 2 \(\leftarrow\) Row 2 \(-\) 2 Row 1: \([0, 1, 1 \;|\; \cdots]\)
  • \(m_{31} = 8/2 = 4\). Row 3 \(\leftarrow\) Row 3 \(-\) 4 Row 1: \([0, 3, 5 \;|\; \cdots]\)

Column 2, pivot \(u_{22} = 1\):

  • \(m_{32} = 3/1 = 3\). Row 3 \(\leftarrow\) Row 3 \(-\) 3 Row 2: \([0, 0, 2 \;|\; \cdots]\)

\[U = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{pmatrix}, \qquad L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 4 & 3 & 1 \end{pmatrix}\]

Verify: \(LU\) should equal \(A\).

\[LU = \begin{pmatrix}1&0&0\\2&1&0\\4&3&1\end{pmatrix}\begin{pmatrix}2&1&1\\0&1&1\\0&0&2\end{pmatrix} = \begin{pmatrix}2&1&1\\4&3&3\\8&7&9\end{pmatrix} = A \checkmark\]

Step 2: Solve \(L\mathbf{y} = \mathbf{b}_1\) (forward substitution).

\[\begin{pmatrix}1&0&0\\2&1&0\\4&3&1\end{pmatrix}\begin{pmatrix}y_1\\y_2\\y_3\end{pmatrix} = \begin{pmatrix}1\\1\\1\end{pmatrix}\]

\(y_1 = 1\), \(\quad y_2 = 1 - 2(1) = -1\), \(\quad y_3 = 1 - 4(1) - 3(-1) = 0\)

Step 3: Solve \(U\mathbf{x} = \mathbf{y}\) (back substitution).

\[\begin{pmatrix}2&1&1\\0&1&1\\0&0&2\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \begin{pmatrix}1\\-1\\0\end{pmatrix}\]

\(x_3 = 0/2 = 0\), \(\quad x_2 = (-1 - 1 \cdot 0)/1 = -1\), \(\quad x_1 = (1 - 1(-1) - 1(0))/2 = 1\)

Solution for \(\mathbf{b}_1\): \(\mathbf{x} = (1, -1, 0)^T\)

Check: \(2(1) + 1(-1) + 1(0) = 1\) ✓, \(\quad 4(1) + 3(-1) + 3(0) = 1\) ✓, \(\quad 8(1) + 7(-1) + 9(0) = 1\) ✓

Step 4: Reuse \(L\), \(U\) for \(\mathbf{b}_2 = (2, 0, 4)^T\).

Forward substitution \(L\mathbf{y} = \mathbf{b}_2\):

\(y_1 = 2\), \(\quad y_2 = 0 - 2(2) = -4\), \(\quad y_3 = 4 - 4(2) - 3(-4) = 4 - 8 + 12 = 8\)

Back substitution \(U\mathbf{x} = \mathbf{y}\):

\(x_3 = 8/2 = 4\), \(\quad x_2 = (-4 - 1 \cdot 4)/1 = -8\), \(\quad x_1 = (2 - 1(-8) - 1(4))/2 = (2 + 8 - 4)/2 = 3\)

Solution for \(\mathbf{b}_2\): \(\mathbf{x} = (3, -8, 4)^T\)

Check: \(2(3) + 1(-8) + 1(4) = 6 - 8 + 4 = 2\) ✓, \(\quad 4(3) + 3(-8) + 3(4) = 12 - 24 + 12 = 0\) ✓, \(\quad 8(3) + 7(-8) + 9(4) = 24 - 56 + 36 = 4\) ✓

Why LU works

Solving \(A\mathbf{x} = \mathbf{b}\) is equivalent to solving \(LU\mathbf{x} = \mathbf{b}\). Let \(\mathbf{y} = U\mathbf{x}\). Then \(L\mathbf{y} = \mathbf{b}\) — a lower-triangular system, solved in one forward pass. Then \(U\mathbf{x} = \mathbf{y}\) — an upper-triangular system, solved in one backward pass. The original \(O(n^3)\) work goes into building \(L\) and \(U\). Once those are in hand, each new right-hand side costs only \(O(n^2)\).
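The whole factorise-once, solve-many pipeline fits in a short sketch. This follows the Doolittle scheme of 53.3.1, without pivoting (fine for this example; production code would pivot), and reuses the factors for both right-hand sides of the worked example:

```python
def lu_doolittle(A):
    """Factor A = LU with unit diagonal on L (Doolittle form)."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n - 1):
        for i in range(j + 1, n):
            m = U[i][j] / U[j][j]
            L[i][j] = m                      # record the multiplier
            for k in range(j, n):
                U[i][k] -= m * U[j][k]
    return L, U

def lu_solve(L, U, b):
    """Solve L y = b forward, then U x = y backward."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
L, U = lu_doolittle(A)                    # O(n^3), done once
print(lu_solve(L, U, [1.0, 1.0, 1.0]))   # [1.0, -1.0, 0.0]
print(lu_solve(L, U, [2.0, 0.0, 4.0]))   # [3.0, -8.0, 4.0]
```

Both solves reuse the same \(L\) and \(U\) and cost only the two triangular sweeps each.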

53.4 Condition number

53.4.1 Definition

The condition number \(\kappa(A)\) measures how much relative error in \(\mathbf{b}\) is amplified into relative error in \(\mathbf{x}\).

Using the \(\infty\)-norm:

\[\kappa(A) = \|A\|_\infty \cdot \|A^{-1}\|_\infty\]

where \(\|A\|_\infty = \max_i \sum_j |a_{ij}|\) is the maximum absolute row sum.

A system is well-conditioned if \(\kappa(A)\) is small (close to 1). It is ill-conditioned if \(\kappa(A) \gg 1\).

The digits-of-accuracy rule: with \(t\)-digit floating-point arithmetic, you can expect roughly \(t - \log_{10}\kappa(A)\) correct digits in the solution.

53.4.2 Ill-conditioning in practice: the 3×3 Hilbert matrix

The Hilbert matrix \(H\) with entries \(h_{ij} = 1/(i+j-1)\) is the classic ill-conditioning example:

\[H_3 = \begin{pmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{pmatrix}\]

Step 1: \(\|H_3\|_\infty\).

Row sums: Row 1: \(1 + 0.5 + 0.333 = 1.833\); Row 2: \(0.5 + 0.333 + 0.25 = 1.083\); Row 3: \(0.333 + 0.25 + 0.2 = 0.783\).

\[\|H_3\|_\infty = 1.833\]

Step 2: \(\|H_3^{-1}\|_\infty\).

The exact inverse of \(H_3\) is:

\[H_3^{-1} = \begin{pmatrix} 9 & -36 & 30 \\ -36 & 192 & -180 \\ 30 & -180 & 180 \end{pmatrix}\]

Row sums of \(|H_3^{-1}|\): Row 1: \(9 + 36 + 30 = 75\); Row 2: \(36 + 192 + 180 = 408\); Row 3: \(30 + 180 + 180 = 390\).

\[\|H_3^{-1}\|_\infty = 408\]

Step 3: \(\kappa_\infty(H_3)\).

\[\kappa_\infty(H_3) = 1.833 \times 408 \approx 748\]

So solving \(H_3 \mathbf{x} = \mathbf{b}\) loses about \(\log_{10}(748) \approx 2.9\) digits of accuracy. On a 15-digit machine you get about 12 reliable digits — acceptable. For \(H_{10}\), \(\kappa \approx 10^{13}\), leaving roughly 2 reliable digits. That system cannot be usefully solved by direct methods in standard double precision.
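The Hilbert computation can be repeated exactly in rational arithmetic — a sketch that reuses the exact inverse quoted above (in practice \(\|A^{-1}\|_\infty\) is estimated without forming the inverse at all):

```python
from fractions import Fraction

# 3x3 Hilbert matrix with exact rational entries h_ij = 1/(i+j-1).
H = [[Fraction(1, i + j - 1) for j in range(1, 4)] for i in range(1, 4)]
# Exact inverse, as quoted in the text.
H_inv = [[9, -36, 30], [-36, 192, -180], [30, -180, 180]]

def norm_inf(M):
    """Infinity norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)

kappa = norm_inf(H) * norm_inf(H_inv)    # (11/6) * 408
print(kappa)                             # 748
```

Working in `Fraction`s shows the condition number is exactly \(748\), not merely approximately.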

When to worry

If \(\kappa(A) > 10^6\) in double precision, inspect the problem formulation before trusting any solution. The issue usually lies in the model (ill-scaled variables, near-dependent equations) rather than the solver. Rescaling rows and columns (equilibration) often reduces \(\kappa\) by several orders of magnitude.

53.5 Iterative methods

Direct methods (GE, LU) compute the exact answer (up to rounding) in a fixed number of operations. Iterative methods start from an initial guess and update it repeatedly. They’re preferred when \(A\) is large and sparse — a \(10^6 \times 10^6\) FEM stiffness matrix, for example, where storing \(L\) and \(U\) in full is impractical.

53.5.1 Jacobi iteration

Rewrite each equation for its diagonal variable, holding all other variables fixed at their current values.

For row \(i\) of \(A\mathbf{x} = \mathbf{b}\):

\[x_i^{(k+1)} = \frac{1}{a_{ii}}\left(b_i - \sum_{j \neq i} a_{ij} x_j^{(k)}\right)\]

All updates use values from the previous iterate \(x^{(k)}\). This makes Jacobi naturally parallelisable — each new \(x_i^{(k+1)}\) is independent of the others being computed in the same step.

53.5.2 Gauss-Seidel iteration

The same formula, but use the latest available values — if \(x_j^{(k+1)}\) has already been computed in this sweep, use it immediately:

\[x_i^{(k+1)} = \frac{1}{a_{ii}}\left(b_i - \sum_{j < i} a_{ij} x_j^{(k+1)} - \sum_{j > i} a_{ij} x_j^{(k)}\right)\]

Gauss-Seidel typically converges in roughly half the iterations of Jacobi because it incorporates fresh information sooner. The trade-off: it cannot be parallelised as straightforwardly.

53.5.3 Convergence criterion: diagonal dominance

Both methods are guaranteed to converge if \(A\) is strictly diagonally dominant:

\[|a_{ii}| > \sum_{j \neq i} |a_{ij}| \quad \text{for all } i\]

This means each diagonal entry exceeds, in absolute value, the sum of the off-diagonal entries in its row. Discretisations of diffusion-type problems (heat conduction, network flow) naturally satisfy this condition. FEM stiffness matrices are not always strictly dominant, but they are symmetric positive definite, which also guarantees Gauss-Seidel convergence.

When diagonal dominance holds, the error is guaranteed to shrink at each step and the iteration converges from any starting point. (For those familiar with eigenvalues: the convergence criterion is that the spectral radius of the iteration matrix — the largest absolute eigenvalue of the matrix that multiplies the error — is less than 1.)
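Checking strict diagonal dominance before committing to an iterative solver is a one-liner. A sketch, with two illustrative matrices (neither is from the text):

```python
def strictly_diagonally_dominant(A):
    """True if |a_ii| > sum of |a_ij| over j != i, for every row i."""
    n = len(A)
    return all(
        abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )

# A diffusion-style matrix: strictly dominant.
print(strictly_diagonally_dominant([[4, -1, 0], [-1, 4, -1], [0, -1, 4]]))    # True
# Borderline case: each row sum equals the diagonal, so the guarantee fails.
print(strictly_diagonally_dominant([[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]))  # False
```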

53.5.4 Worked example: Jacobi and Gauss-Seidel on a 3×3 system

Solve from \(\mathbf{x}^{(0)} = (0, 0, 0)^T\):

\[\begin{pmatrix} 10 & -1 & 2 \\ -1 & 11 & -1 \\ 2 & -1 & 10 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 6 \\ 25 \\ -11 \end{pmatrix}\]

Check diagonal dominance: Row 1: \(10 > |-1| + |2| = 3\) ✓; Row 2: \(11 > |-1| + |-1| = 2\) ✓; Row 3: \(10 > |2| + |-1| = 3\) ✓. Convergence guaranteed.

The exact solution is \(\mathbf{x} = \left(\tfrac{217}{208},\; \tfrac{59}{26},\; -\tfrac{225}{208}\right)^T \approx (1.0433,\; 2.2692,\; -1.0817)^T\).

53.5.4.1 Jacobi, 3 iterations

Iteration formulas:

\[x_1^{(k+1)} = \frac{1}{10}(6 + x_2^{(k)} - 2x_3^{(k)})\]

\[x_2^{(k+1)} = \frac{1}{11}(25 + x_1^{(k)} + x_3^{(k)})\]

\[x_3^{(k+1)} = \frac{1}{10}(-11 - 2x_1^{(k)} + x_2^{(k)})\]

\(k=0 \to k=1\): Using \(\mathbf{x}^{(0)} = (0, 0, 0)^T\):

\(x_1^{(1)} = (6 + 0 - 0)/10 = 0.6\)

\(x_2^{(1)} = (25 + 0 + 0)/11 \approx 2.2727\)

\(x_3^{(1)} = (-11 - 0 + 0)/10 = -1.1\)

\(k=1 \to k=2\):

\(x_1^{(2)} = (6 + 2.2727 - 2(-1.1))/10 = (6 + 2.2727 + 2.2)/10 = 10.4727/10 = 1.0473\)

\(x_2^{(2)} = (25 + 0.6 + (-1.1))/11 = 24.5/11 = 2.2273\)

\(x_3^{(2)} = (-11 - 2(0.6) + 2.2727)/10 = (-11 - 1.2 + 2.2727)/10 = -9.9273/10 = -0.9927\)

\(k=2 \to k=3\):

\(x_1^{(3)} = (6 + 2.2273 - 2(-0.9927))/10 = (6 + 2.2273 + 1.9854)/10 = 10.2127/10 = 1.0213\)

\(x_2^{(3)} = (25 + 1.0473 + (-0.9927))/11 = (25 + 0.0546)/11 = 25.0546/11 \approx 2.278\)

\(x_3^{(3)} = (-11 - 2(1.0473) + 2.2273)/10 = (-11 - 2.0946 + 2.2273)/10 = -10.8673/10 = -1.087\)

After 3 iterations: \(\mathbf{x}^{(3)} \approx (1.021, 2.278, -1.087)^T\). The errors are shrinking — the iterates are heading toward the exact solution \((1.0433, 2.2692, -1.0817)^T\) — but Jacobi is taking its time: the first component is still off by about \(0.02\).

53.5.4.2 Gauss-Seidel, 3 iterations

\(k=0 \to k=1\): Using updated values as soon as available:

\(x_1^{(1)} = (6 + 0 - 0)/10 = 0.6\) (uses \(x_2^{(0)} = 0\), \(x_3^{(0)} = 0\))

\(x_2^{(1)} = (25 + \mathbf{0.6} + 0)/11 = 25.6/11 \approx 2.3273\) (uses new \(x_1^{(1)} = 0.6\))

\(x_3^{(1)} = (-11 - 2(\mathbf{0.6}) + \mathbf{2.3273})/10 = (-11 - 1.2 + 2.3273)/10 = -9.8727/10 = -0.9873\) (uses new \(x_1^{(1)}\) and \(x_2^{(1)}\))

\(k=1 \to k=2\):

\(x_1^{(2)} = (6 + 2.3273 - 2(-0.9873))/10 = (6 + 2.3273 + 1.9746)/10 = 10.3019/10 = 1.0302\)

\(x_2^{(2)} = (25 + 1.0302 + (-0.9873))/11 = 25.0429/11 \approx 2.2766\)

\(x_3^{(2)} = (-11 - 2(1.0302) + 2.2766)/10 = (-11 - 2.0604 + 2.2766)/10 = -10.7838/10 = -1.0784\)

\(k=2 \to k=3\):

\(x_1^{(3)} = (6 + 2.2766 - 2(-1.0784))/10 = (6 + 2.2766 + 2.1568)/10 = 10.4334/10 = 1.0433\)

\(x_2^{(3)} = (25 + 1.0433 + (-1.0784))/11 = 24.9649/11 \approx 2.2695\)

\(x_3^{(3)} = (-11 - 2(1.0433) + 2.2695)/10 = (-11 - 2.0866 + 2.2695)/10 = -10.8171/10 = -1.0817\)

After 3 iterations, Gauss-Seidel is at \(\mathbf{x}^{(3)} \approx (1.0433, 2.2695, -1.0817)^T\) — agreeing with the exact solution \((1.0433, 2.2692, -1.0817)^T\) to about three decimal places, while Jacobi’s third iterate is still off by about \(0.02\) in its first component. This is the typical pattern: because Gauss-Seidel feeds fresh values into each sweep, it converges in roughly half the iterations Jacobi needs.
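Both schedules are easy to reproduce in code — a sketch hard-wired to this \(3 \times 3\) system, carrying full double precision rather than the 4-decimal rounding of the hand computation:

```python
A = [[10.0, -1.0, 2.0], [-1.0, 11.0, -1.0], [2.0, -1.0, 10.0]]
b = [6.0, 25.0, -11.0]
n = len(b)

def jacobi_step(x):
    # Every update uses only the previous iterate.
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def gauss_seidel_step(x):
    # Updates overwrite x in order, so later rows see fresh values.
    x = x[:]
    for i in range(n):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x

xj = [0.0] * n
xg = [0.0] * n
for _ in range(3):
    xj = jacobi_step(xj)
    xg = gauss_seidel_step(xg)
print([round(v, 4) for v in xj])   # [1.0213, 2.2777, -1.0867]
print([round(v, 4) for v in xg])   # [1.0433, 2.2695, -1.0817]
```

The only code difference is whether the sweep reads from the old iterate or overwrites in place — exactly the difference between the two update formulas above.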

When to use iterative vs direct

| Criterion | Direct (LU) | Iterative (GS/Jacobi) |
|---|---|---|
| Matrix size | Any, up to memory | Large sparse matrices |
| Sparsity | Not exploited | Natural fit |
| Multiple RHS | Fast after factorisation | Re-iterate each time |
| Accuracy | Machine precision | Stops at tolerance |
| Convergence guarantee | Always (if non-singular) | Diagonal dominance |

For FEM systems with \(n > 10^5\), iterative methods are standard. For \(n < 10^3\) with multiple right-hand sides, LU with pivoting is preferable.

53.6 Where this goes

This chapter handles \(A\mathbf{x} = \mathbf{b}\) for fixed \(A\) and \(\mathbf{b}\). The next chapter in this section extends the same ideas to numerical ODEs and PDEs: discretising a differential equation produces a linear system at every time step or on every mesh — the same pivoted LU or iterative machinery applies, just called millions of times. The structure of the system (banded, sparse, symmetric) determines which variant to use, and the condition number of the discretised operator determines the mesh resolution needed to stay in the accurate regime.

The conditioning ideas carry directly into optimisation (Vol 7, optimisation section): the Hessian matrix of an objective function plays the role of \(A\), and ill-conditioning of the Hessian is the primary reason gradient-descent methods stall or diverge. Knowing how to estimate \(\kappa\) and how to precondition a matrix (rescaling so that \(\kappa\) drops) is practical currency in large-scale numerical optimisation.

Where this shows up

  • Structural engineering: Every finite element analysis assembles a stiffness matrix \(K\) and solves \(K\mathbf{u} = \mathbf{f}\) for displacements \(\mathbf{u}\). LU factorisation with multiple load vectors is standard.
  • Computational fluid dynamics: The pressure-correction step in incompressible flow solvers is a large sparse Poisson system — iterative solvers (conjugate gradient, a Krylov subspace method that exploits symmetry and positive-definiteness) are universal.
  • Machine learning: Linear regression solves the normal equations \((X^TX)\boldsymbol{\beta} = X^T\mathbf{y}\); the condition number of \(X^TX\) is why multicollinearity ruins regression estimates.
  • Finance: Estimating a covariance matrix from historical returns produces an ill-conditioned system when assets are correlated — the condition number flags when the inverse is unreliable.

53.7 Exercises

These are engineering-layer puzzles. Each has a specific numerical answer. Work the full procedure — don’t skip steps — because the errors that matter in practice happen in the steps that get skipped in rough work.

Exercise 1. Factorise the following matrix using Doolittle LU decomposition, then use the factorisation to solve for two right-hand sides.

\[A = \begin{pmatrix} 2 & 4 & 2 \\ 1 & 5 & 2 \\ 4 & 6 & 8 \end{pmatrix}, \quad \mathbf{b}_1 = \begin{pmatrix} 2 \\ 3 \\ 6 \end{pmatrix}, \quad \mathbf{b}_2 = \begin{pmatrix} 0 \\ 1 \\ -2 \end{pmatrix}\]

Exercise 2. Apply partial pivoting to the following near-singular system. Identify which row swap is required, perform the swap, and carry out elimination. Show what would go wrong without the swap.

\[\begin{pmatrix} 0.002 & 1 & 1 \\ 1 & 2 & 5 \\ 1 & -1 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 8 \\ 5 \end{pmatrix}\]

Exercise 3. Compute \(\kappa_\infty(A)\) for the matrix below, then interpret the result: if you solve \(A\mathbf{x} = \mathbf{b}\) on a machine with 15 significant digits, how many reliable digits do you expect in the solution?

\[A = \begin{pmatrix} 3 & 1 \\ 2 & 4 \end{pmatrix}\]

Exercise 4. Apply 3 iterations of Jacobi iteration to the following system, starting from \(\mathbf{x}^{(0)} = (0, 0, 0)^T\). Show the iterate after each step.

\[\begin{pmatrix} 5 & 1 & 1 \\ 1 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 7 \\ 8 \\ 6 \end{pmatrix}\]

The exact solution is \(\mathbf{x} = (1, 1, 1)^T\).

Exercise 5. Apply 3 iterations of Gauss-Seidel to the same system as Exercise 4, from the same starting point \(\mathbf{x}^{(0)} = (0, 0, 0)^T\). Compare the convergence rate after 3 iterations.

Exercise 6 (Engineering context). A four-node plane truss is modelled by finite element analysis. After assembly and boundary conditions, the reduced stiffness system is:

\[K = \begin{pmatrix} 4 & -1 & 0 & 0 \\ -1 & 4 & -1 & 0 \\ 0 & -1 & 4 & -1 \\ 0 & 0 & -1 & 3 \end{pmatrix}, \quad \mathbf{f} = \begin{pmatrix} 0 \\ 0 \\ 10 \\ 5 \end{pmatrix}\]

  1. Verify that \(K\) is strictly diagonally dominant, which guarantees that Gauss-Seidel converges for any load vector \(\mathbf{f}\).

  2. Identify the structural feature of \(K\) (bandwidth, sparsity pattern) that explains why iterative methods are preferred over LU decomposition for large truss problems of this type.

  3. Apply 2 iterations of Gauss-Seidel from \(\mathbf{u}^{(0)} = (0,0,0,0)^T\).

  • op: “Check convergence direction” eq: “\text{Residual }\|K\mathbf{u}^{(2)}-\mathbf{f}\|\infty:\quad K\mathbf{u}{(2)}\approx(-0.625,\;-0.781,\;9.739,\;4.999)T,\quad\text{vs }\mathbf{f}=(0,0,10,5)^T” note: ”The residual is shrinking; continue until ||Ku − f||∞ < tolerance (e.g. 10⁻⁶).” –>