Linear Systems and Gaussian Elimination#

The Elimination Method#

It is common to first encounter matrices as a tool for solving linear systems of equations by hand. At a basic level, matrices serve as a shorthand notation that simplify the process of adding or subtracting scalar multiples of equations to eliminate variables, a process called the elimination method of solving linear systems. The following example illustrates that approach and how we can use matrices to track the relevant data.

Example: Solve the linear system below:

\[\begin{split} \begin{alignat} 2x + y &= 4 \\ 4x -2y &= 8. \end{alignat} \end{split}\]

The elimination method is defined by three ‘legal’ operations that can be performed with the equations in a linear system, with the goal of producing a simpler equation than any currently in the system that nevertheless shares the solution. The three operations are given below.

Elimination Operations

The following three operations on a linear system of equations dd not change the solution to the system.

  • Changing the order in which the equations are written down,

  • Multiplying any equation in the system by a nonzero scalar,

  • Adding or subtracting any two equations in the system.

Combining the last two operations in particular is key, because we can use them to produce simpler equations; specifically equations with fewer unknowns than any equation currently in the system. Here, for instance, we can take the first equation in the system and multiply it by 2:

\[\begin{split} \begin{align} 2\cdot(2x + y) &= 2\cdot 4 \\ 4x -2y &= 8, \end{align} \end{split}\]
\[\begin{split} \begin{align} 4x +2y &= 8 \\ 4x -2y &= 8. \end{align} \end{split}\]

Then we can add the result to the second equation to produce:

\[ 8x = 16, \]

so \(x = 2\). Then if we go back to the first equation and substitute \(x=2\) into it, we see \(2\cdot2 + y = 4\) or \(4 + y = 4\), which implies that \(y=0\).

Gaussian Elimination#

All of the arithmetic in the steps above is performed on the coefficients and constants in the system. The variables serve only as placeholders, and as such we can organize the data that defines the system into an array that we can manipulate more efficiently and effectively, particularly when the system is larger than this toy example:

\[\begin{split} \begin{bmatrix} 2 & \hfill1 & | & 4 \\ 4 & -2 & | & 8 \end{bmatrix} \end{split}\]

This is called the augmented matrix of the linear system. Each row represents an equation, and each column contains the coefficients of one of the variables, aside from the last column, which contains the constants in the sytem. The vertical bar is not necessary, but is helpful to visually separate the column of constants from the columns of coefficients.

Now let’s translate the ‘legal’ steps of the elimination method to operations on the augmented matrix of the system. These operations are called row operations:

Row Operations on an Augmented Matrix

The following operations on the rows of an augmented matrix do not change the solution of the underlying linear system.

  • Any two rows of the matrix can be interchanged,

  • Any row can be multiplied by a nonzero constant, and

  • Any two rows can be added or subtracted.

Note that these are just the elimination operations expressed in terms of the augmented matrix that represents the underlying linear system.

Example: Let’s solve the linear system

\[\begin{split} \begin{align} 2x +y &= 6 \\ 4x -2y &= 8 \end{align} \end{split}\]

using row operations on the augmented matrix of the system. First, form the matrix as above:

\[\begin{split} \begin{bmatrix} 2 & \hfill1 & | & 6 \\ 4 & -2 & | & 8 \end{bmatrix}. \end{split}\]

Now, let’s combine two row operations into one: we will multiply the first row by 2 and then add the result to row 2. We will put the row that results from these two operations into the augmented matrix in place of row 2, since it is just a ‘legally’ altered version of row 2:

\[\begin{split} \begin{bmatrix} 2 & \hfill1 & | & 6 \\ 4 & -2 & | & 8 \end{bmatrix} \overset{R_2 + 2R_1\to R_2}{\sim} \begin{bmatrix} 2 & \hfill1 & | & 6 \\ 8 & 0 & | & 20 \end{bmatrix}. \end{split}\]

Now consider the second row of the augmented matrix: it represents the equation \(8x + 0y = 20\) or just \(8x=20\), so we can see that \(x=20/8=5/2\). Suppose we now multiply the second equation by \(1/4\) and subtract the result from row 1:

\[\begin{split} \begin{bmatrix} 2 & \hfill1 & | & 6 \\ 4 & -2 & | & 8 \end{bmatrix} \overset{R_2 + 2R_1\to R_2}{\sim} \begin{bmatrix} 2 & \hfill1 & | & 6 \\ 8 & 0 & | & 20 \end{bmatrix} \overset{R_1 - 1/4R_2 \to R_1}{\sim} \begin{bmatrix} 0 & \hfill1 & | & 1 \\ 8 & 0 & | & 20 \end{bmatrix}, \end{split}\]

and now the first row represents the equation \(0x + y = 1\), or simply \(y=1\). Let’s add two more row operations: a row swap, and a scalar multiply:

\[\begin{split} \begin{bmatrix} 2 & \hfill1 & | & 6 \\ 4 & -2 & | & 8 \end{bmatrix} \overset{R_2 + 2R_1\to R_2}{\sim} \begin{bmatrix} 2 & \hfill1 & | & 6 \\ 8 & 0 & | & 20 \end{bmatrix} \overset{R_1 - 1/4R_2 \to R_1}{\sim} \begin{bmatrix} 0 & \hfill1 & | & 1 \\ 8 & 0 & | & 20 \end{bmatrix} \overset{R_1 \leftrightarrow R_2}{\sim} \begin{bmatrix} 8 & 0 & | & 20 \\ 0 & 1 & | & 1 \end{bmatrix} \overset{1/8 R_1 \to R_1}{\sim} \begin{bmatrix} 1 & 0 & | & 5/2 \\ 0 & 1 & | & 1 \end{bmatrix}. \end{split}\]

With those last two row operations, we’ve produced an augmented matrix corresponding to the much simpler linear system

\[\begin{split} \begin{align} x &= 5/2 \\ y &= 1 \end{align} \end{split}\]

and this is the solution to our original system.

Notation

We use the \(\sim\) symbol to indicate row operations because we are changing the augmented matrix, and the augmented matrix before the row operation is not equal to the augmented matrix after the row operation. It does, however, contain a linear system with the same solution. The symbols above the \(\sim\) indicate precisely what row operations are taking place; for example, \(R_1 + 2R_2 \to R_1\) can be read ‘multiply row 2 by 2, add the result to row 1, and store the result in row 1.

Using an augmented matrix to solve a system like the one in the previous example is overkill; the value of the augmented matrix doesn’t appear until we start working with systems of at least three equations in three or more unknowns.

Definition

The main diagonal of an \(m\times n\) matrix \(A\) consists of the entries \(a_{11}, a_{22},\dots,a_{mm}\).

Row Operation Rules of Thumb

  • Make all entries below the main diagonal zero before making entries above the main diagonal zero.

  • When working below the main diagonal, work left to right; that is, make all entries (below the main diagonal) in column \(j\) 0 before moving on to column \(j+1\).

  • When working below the main diagonal in column \(j\), work down; that is make entry \(a_{ij}\) 0 before moving on to entry \(a_{i+1j}\).

  • When working above the main diagonal, work right to left; that is, make all entries in column \(j\) 0 before moving on to column \(j-1\).

  • When working above the main diagonal in column \(j\), work up; that is make entry \(a_{ij}\) 0 before moving on to entry \(a_{i-1j}\).

  • It’s nice to have ones on the main diagonal (this will make row operations easier); swap rows or use scalar multiplication when convenient to produce these.

Taken together, these heuristics suggest working in a counterclockwise around the augmented matrix performing row operations to produce zeros off of the main diagonal. This is called Gaussian Elimination and is illustrated in the following example.

Example: Use Gaussian elimination to solve the linear system

\[\begin{split} \begin{align} 2x + 2y + z &= 3 \\ x + y + z &= 1 \\ 3x -y -z &= 3 \end{align} \end{split}\]

First, form the augmented matrix:

\[\begin{split} \begin{bmatrix} 2 & \hfill2 & \hfill1 & | & 3 \\ 1 & \hfill1 & \hfill1 & | & 1 \\ 3 & -1 & -1 & | & 3 \end{bmatrix} \end{split}\]

Then, swap rows one and two to get a \(1\) in position \(1, 1\):

\[\begin{split} \begin{bmatrix} 2 & \hfill2 & \hfill1 & | & 3 \\ 1 & \hfill1 & \hfill1 & | & 1 \\ 3 & -1 & -1 & | & 3 \end{bmatrix} \overset{R_1\leftrightarrow R_2}{\sim} \begin{bmatrix} 1 & \hfill1 & \hfill1 & | & 1 \\ 2 & \hfill2 & \hfill1 & | & 3 \\ 3 & -1 & -1 & | & 3 \end{bmatrix}. \end{split}\]

We will work down the first column constructing row operations to produce zeros:

\[\begin{split} \begin{bmatrix} 1 & \hfill1 & \hfill1 & | & 1 \\ 2 & \hfill2 & \hfill1 & | & 3 \\ 3 & -1 & -1 & | & 3 \end{bmatrix} \overset{R_2 -2R_1\to R_2}{\sim} \begin{bmatrix} 1 & \hfill1 & \hfill1 & | & 1 \\ 0 & \hfill0 & -1 & | & 1 \\ 3 & -1 & -1 & | & 3 \end{bmatrix} \overset{R_3 - 3R_1\to R_3}{\sim} \begin{bmatrix} 1 & \hfill1 & \hfill1 & | & 1 \\ 0 & \hfill0 & -1 & | & 1 \\ 0 & -4 & -4 & | & 0 \end{bmatrix}. \end{split}\]

At this point, we want to move to the second column and produce zeros below the main diagonal - that’s really just in position \(3, 2\). We only have two possible ways we could do this: a row operation that combines row 1 with row 3 and stores the result in row 3, or swapping rows 2 and 3. Aside from the fact that the second option is much simpler than the first, note also that if I construct any row operation that combines row 1 with row 3 the first entry of the result will necessarily be nonzero, because the first entry of row 1 is nonzero. This means that if I did this I would be destroying work that I already did. We don’t want to do that! Let’s swap the rows:

\[\begin{split} \begin{bmatrix} 1 & \hfill1 & \hfill1 & | & 1 \\ 0 & \hfill0 & -1 & | & 1 \\ 0 & -4 & -4 & | & 0 \end{bmatrix} \overset{R_2\leftrightarrow R_1}{\sim} \begin{bmatrix} 1 & \hfill1 & \hfill1 & | & 1 \\ 0 & -4 & -4 & | & 0 \\ 0 & \hfill0 & -1 & | & 1 \end{bmatrix}, \end{split}\]

and then multiply row 3 by \(-1\), because having 1’s on the main diagonal is preferable.

\[\begin{split} \begin{bmatrix} 1 & \hfill1 & \hfill1 & | & 1 \\ 0 & -4 & -4 & | & 0 \\ 0 & \hfill0 & -1 & | & 1 \end{bmatrix} \overset{-R_3\to R_3}{\sim} \begin{bmatrix} 1 & \hfill1 & \hfill1 & | & 1 \\ 0 & -4 & -4 & | & 0 \\ 0 & \hfill0 & 1 & | & -1 \end{bmatrix}. \end{split}\]

Now we work above the main diagonal, bottom to top, right to left:

\[\begin{split} \begin{bmatrix} 1 & \hfill1 & \hfill1 & | & 1 \\ 0 & -4 & -4 & | & 0 \\ 0 & \hfill0 & 1 & | & -1 \end{bmatrix} \overset{R_2 + 4R_3\to R_2}{\sim} \begin{bmatrix} 1 & \hfill1 & \hfill1 & | & 1 \\ 0 & -4 & 0 & | & -4 \\ 0 & \hfill0 & 1 & | & -1 \end{bmatrix} \overset{R_1 - R_3\to R_1}{\sim} \begin{bmatrix} 1 & \hfill1 & 0 & | & 2 \\ 0 & -4 & 0 & | & -4 \\ 0 & \hfill0 & 1 & | & -1 \end{bmatrix} \overset{-1/4R_2 \to R_2}{\sim} \begin{bmatrix} 1 & 1 & 0 & | & \hfill2 \\ 0 & 1 & 0 & | & \hfill1 \\ 0 & 0 & 1 & | & -1 \end{bmatrix} \overset{R_1 - R_2 \to R_1}{\sim} \begin{bmatrix} 1 & 0 & 0 & | & \hfill1 \\ 0 & 1 & 0 & | & \hfill1 \\ 0 & 0 & 1 & | & -1 \end{bmatrix}. \end{split}\]

At this point, the rows of the matrix give three simple equations that provide the solution to our system:

\[\begin{split} \begin{align} x &= \hfill1 \\ y &= \hfill1 \\ z &= -1 \end{align} \end{split}\]

Computational Cost of Gaussian Elimination#

Gaussian Elimination is expensive. Assuming the coefficient matrix is \(n\times n\); following the algorithm as outlined above will result in \(n^3/3 + n^2 - n/3\) multiplications/divisions and \(n^3/3 + n^2/2 - 5n/6\) additions/subtractions (we will justify these numbers in a subsequent chapter). For large \(n\) that’s approximately \(n^3/3\) or \(\mathcal{O}(n^3)\) operations. Working by hand on a \(3\times3\) example probably already feels a bit tedious; now consider how this scales:

\(n\)

# Multiplications/Divisions

# Additions/Subtractions

Total Operations

3

17

11

28

10

430

375

805

100

343,300

338,250

681,550

1,000

334,333,000

333,832,500

668,165,500

What this means is that we need to take steps to obtain efficiencies wherever possible. The remainder of this section will be devoted to that task.

Row Reduction as Matrix Multiplication#

The reason that Gaussian elimination works to solve linear systems is that it is really a convenient way to express matrix multiplications. As we saw in the last section, a linear system can typically be expressed as a matrix equation \(A\mathbf{x} = \mathbf{b}\); for example, the first system above

\[\begin{split} \begin{align} 2x + \hfill y &= 4 \\ 4x -2y &= 8. \end{align} \end{split}\]

can be written as a matrix equation

\[\begin{split} \begin{bmatrix} 2 & \hfill1 \\ 4 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 4 \\ 8 \end{bmatrix}, \end{split}\]

where the coefficient matrix is the square matrix on the left, we are solving for a vector of unknowns \(\mathbf{x}\) whose components are \(x\) and \(y\), and the constants from the right-hand side of the system are the result of multiplying \(A\) with \(\mathbf{x}\) according to the way we defined matrix-vector multiplication previously.

Now as a general rule, if I have an equation \(A\mathbf{x} = \mathbf{b}\) and I multiply both sides of this equation by some matrix \(E_{21}\), assuming that \(E_{21}\) has the right dimensions for the multiplication to be performed, then equality must still hold: \(E_{21}A\mathbf{x} = E_{21}\mathbf{b}\). Let’s look at this with a specific \(E_{21}\):

\[\begin{split} \begin{bmatrix} \hfill1 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 2 & \hfill1 \\ 4 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \hfill1 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ 8 \end{bmatrix} \end{split}\]
\[\begin{split} \begin{bmatrix} 2 & \hfill1 \\ 0 & -4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 4 \\ 0 \end{bmatrix}. \end{split}\]

Multiplying by this particular \(E_{21}\) did the same thing as the specific row operation we would take to eliminate the entry in position \(2,1\) of the augmented matrix of the system:

\[\begin{split} \begin{bmatrix} 2 & \hfill1 & | & 4\\ 4 & -2 & | & 8 \end{bmatrix} \overset{R_2 - 2R_1 \to R_2}{\sim} \begin{bmatrix} 2 & \hfill1 & | & 4\\ 0 & -4 & | & 0 \end{bmatrix}. \end{split}\]

The matrix \(E_{21}\) above is called an elimination matrix.

Definition

An elimination matrix is a square matrix with ones on the main diagonal and a single nonzero entry off of the main diagonal. We use the notation \(E_{ij}\) to indicate an elimination matrix with a nonzero entry in position \(i, j\).

Elimination matrices are so called because as was demonstrated above, with the right choice of nonzero value they can be used to eliminate entries from other matrices. Row swaps come from permutation matrices.

Definition

A permutation matrix is a square matrix in which each row and column contains only one nonzero entry, and that entry is 1. We use the notation \(P_{ij}\) to denote a permutation matrix such that \(P_{ij}A\) swaps rows \(i\) and \(j\) of \(A\).

For example,

\[\begin{split} P_{13} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \end{split}\]

can be used to swap rows 1 and 3 of a matrix; for example,

\[\begin{split} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 3 & -1 & 5 & \hfill4 \\ 0 & \hfill2 & -2 & 1 \\ 1 & \hfill4 & \hfill7 & 0 \end{bmatrix} = \begin{bmatrix} 1 & \hfill4 & \hfill7 & 0 \\ 0 & \hfill2 & -2 & 1 \\ 3 & -1 & \hfill5 & 4 \end{bmatrix}. \end{split}\]

An alternative way to define a permutation matrix is as a matrix whose rows can be rearranged to produce the identity matrix.

We can reverse the row operation performed by an elimination matrix by changing the sign of the off-diagonal entry; for instance, take the row-reduced coefficient matrix from the linear system above:

\[\begin{split} \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 2 & \hfill1 \\ 0 & -4 \end{bmatrix} = \begin{bmatrix} 2 & \hfill1 \\ 4 & -2 \end{bmatrix}. \end{split}\]

The matrix on the left is almost the same \(E_{21}\) from above, but with the sign of the entry in position \(2, 1\) changed from positive to negative. Because this matrix performs the inverse of the operation that \(E_{21}\) performs, we call it the inverse of \(E_{21}\) and denote it \(E_{21}^{-1}\).

Inverse matrices in general are important theoretical concepts but are computationally expensive to construct, so they aren’t often used in practice. However, inverses for elimination matrices are an exception: they can be quickly constructed as just demonstrated. Note that \(E_{21}^{-1}E_{21} = I\) and \(E_{21}E_{21}^{-1} = I\); this is the key property of inverse matrices: if a matrix \(A\) has an inverse \(A^{-1}\), then \(AA^{-1} = A^{-1}A = I\).

Moreover, inverses of elimination matrices serve an important use: they lead to a factorization of the coefficient matrix of a linear system called the \(LU\)-factorization that is widely used. Again, let’s refer to the example above, where we have

\[\begin{split} \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 2 & \hfill1 \\ 0 & -4 \end{bmatrix} = \begin{bmatrix} 2 & \hfill1 \\ 4 & -2 \end{bmatrix}. \end{split}\]

In the product on the left side of the equals sign, the matrix on the left is a lower-triangular matrix (\(L\)), while the matrix on the right is an upper-triangular matrix (\(U\)). Multiplying them together produces the coefficient matrix of the linear system: \(A=LU\).

Important

Although above we are using the coefficient matrix of the linear system in the example above for convenience, we could have taken any matrix, multiplied by an elimination matrix, inverted the elimination matrix, and obtained this factorization.

The \(A=LU\) Factorization#

Definition

Let \(A\) be a square matrix. An \(LU\)-factorization of \(A\) is an expression of \(A\) as a product of two square matrices \(L\) and \(U\) such that \(A=LU\), \(L\) is lower triangular, and \(U\) is upper triangular.

Not all square matrices have an \(LU\)-factorization, and when the factorization exist it is not unique.

Example: Suppose that

\[\begin{split} A = \begin{bmatrix} \hfill1 & 2 & 1 \\ \hfill2 & 2 & 0 \\ -1 & 1 & 1 \end{bmatrix}. \end{split}\]

We will use elimination matrices to construct an \(LU\)-factorization of \(A\). First, to eliminate the entry in position \(2, 1\), we would multiply \(A\) by an elimination matrix \(E_{21}\):

\[\begin{split} \begin{bmatrix} \hfill 1 & 0 & 0 \\ -2 & 1 & 0 \\ \hfill 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \hfill1 & 2 & 1 \\ \hfill2 & 2 & 0 \\ -1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} \hfill1 & \hfill2 & \hfill1 \\ \hfill0 & -2 & -2 \\ -1 & \hfill1 & \hfill1 \end{bmatrix}. \end{split}\]

Now eliminate the entry in position \(3, 1\) by multiplying with an appropriate \(E_{31}\):

\[\begin{split} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} \hfill 1 & 0 & 0 \\ -2 & 1 & 0 \\ \hfill 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \hfill1 & 2 & 1 \\ \hfill2 & 2 & 0 \\ -1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} \hfill1 & \hfill2 & \hfill1 \\ \hfill0 & -2 & -2 \\ -1 & \hfill1 & \hfill1 \end{bmatrix} = \begin{bmatrix} \hfill1 & \hfill2 & \hfill1 \\ \hfill0 & -2 & -2 \\ 0 & \hfill3 & \hfill2 \end{bmatrix}. \end{split}\]

And finally, the entry in position \(3, 2\):

\[\begin{split} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3/2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} \hfill 1 & 0 & 0 \\ -2 & 1 & 0 \\ \hfill 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \hfill1 & 2 & 1 \\ \hfill2 & 2 & 0 \\ -1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3/2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} \hfill1 & \hfill2 & \hfill1 \\ \hfill0 & -2 & -2 \\ -1 & \hfill1 & \hfill1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3/2 & 1 \end{bmatrix} \begin{bmatrix} \hfill1 & \hfill2 & \hfill1 \\ \hfill0 & -2 & -2 \\ 0 & \hfill3 & \hfill2 \end{bmatrix} = \begin{bmatrix} \hfill1 & \hfill2 & \hfill1 \\ \hfill0 & -2 & -2 \\ 0 & \hfill0 & -1 \end{bmatrix}. \end{split}\]

The line above spells out the following computation: \(E_{32}E_{31}E_{21}A = U\) (note that the matrix on the right is upper triangular). If we now invert each of the three row operations in reverse order, we will have our factorization: \(A=E_{21}^{-1}E_{31}^{-1}E_{32}^{-1}U = LU\). The inversions are straightforward:

\[\begin{split} E_{32}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3/2 & 1 \end{bmatrix}, E_{31}^{-1} = \begin{bmatrix} \hfill1 & 0 & 0 \\ \hfill0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}, E_{21}^{-1} = \begin{bmatrix} \hfill 1 & 0 & 0 \\ 2 & 1 & 0 \\ \hfill 0 & 0 & 1 \end{bmatrix}, \end{split}\]

and the multiplication of elimination matrices is very quick:

\[\begin{split} E_{21}^{-1}E_{31}^{-1}E_{32}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -3/2 & 1 \end{bmatrix}, \end{split}\]

which is clearly lower triangular. Finally, we can check that \(A = LU\) for this \(L\) and \(U\):

\[\begin{split} LU = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -3/2 & 1 \end{bmatrix} \begin{bmatrix} \hfill1 & \hfill2 & \hfill1 \\ \hfill0 & -2 & -2 \\ 0 & \hfill0 & -1 \end{bmatrix} = \begin{bmatrix} \hfill1 & 2 & 1 \\ \hfill2 & 2 & 0 \\ -1 & 1 & 1 \end{bmatrix} = A. \end{split}\]

The Computational Efficiency of \(A=LU\)#

Note: This section uses big-O notation, which hasn’t been introduced yet. If you aren’t familiar with big-O notation, skip ahead to the section on computation and come back here after you’ve read it.

Calculating an \(LU\) factorization is not generally faster than solving a system using elimination; as the \(LU\) factorization is in effect just the elimination method expressed through matrices the computational cost is the same \(\mathcal{O}(n^3)\) (though there are some practical implementation tricks that are used). The efficiency we gain from it shows up in a specific situation that comes up often in applications: when we have to solve \(A\mathbf{x} = \mathbf{b}\) for one matrix \(A\) but a large number of distinct vectors \(\mathbf{b}\). This is a quite common real-world scenario. Then, if we have a factorization \(A=LU\), we can use the following steps to solve the systems:

  1. Set \(U\mathbf{x} = \mathbf{y}\). This is a placeholder operation, so no computational cost.

  2. Solve \(L\mathbf{y} = \mathbf{b}\) for \(\mathbf{y}\). The cost is \(\mathcal{O}(n^2)\), because \(L\) is triangular.

  3. Solve \(U\mathbf{x} = \mathbf{y}\) for \(\mathbf{x}\). The cost is \(\mathcal{O}(n^2)\), because \(U\) is triangular.

The computational cost of all of three steps taken together is \(\mathcal{O}(n^2)\). For small \(n\), \(\mathcal{O}(n^3)\) and \(\mathcal{O}(n^2)\) are often similar in magnitude, but as \(n\) grows the difference becomes dramatic, so it can be vastly more efficient to first obtain the \(A=LU\) factorization at the \(\mathcal{O}(n^3)\) computational cost and then use it to solve \(A\mathbf{x} = \mathbf{b}\) over and over using the steps above at the dramatically reduced \(\mathcal{O}(n^2)\) cost.

A real-world example where this occurs is in structural engineering analysis of a building under different loading conditions. Consider analyzing a large building framework where:

  • the matrix \(A\) represents the structural stiffness: each component is determined basaed on the geometry, material properties, and connections of the building,

  • \(\mathbf{x}\) represents the displacements/deflections at various nodes, and

  • \(\mathbf{b}\) represents the external forces applied to the structure.

The engineer needs to solve the same structural system - so, the same \(A\) - under many different loading scenarios:

  • Dead loads (permanent weight of structure)

  • Live loads (occupancy, furniture, equipment)

  • Wind loads from different directions

  • Snow loads

  • Seismic loads

  • Various combinations required by building codes

Each loading case gives a different \(\mathbf{b}\), but the structural stiffness matrix A remains the same since the building geometry and materials haven’t changed.

Another real-world example that we will look at later comes up in marketing, where an analyst wants to predict a variety of different consumer behaviours which are captured in distinct vectors \(\mathbf{b}\) based on a fixed set of data about consumers which is captured in a single fixed matrix \(A\). Again obtaining \(A=LU\) is expensive, but once obtained can be used over and over for far less computational cost.

An \(A=LU\) Shortcut#

There is a shortcut one can use to obtain the factorization \(A=LU\) without computing each of the individual elimination matrices, doing the eliminations to obtain \(U\), and then inverting them and computing their product to finally obtain \(L\). Instead, perform the eliminations as if we were doing row operations, and keep track of the below-diagonal pivot column data: this (with a scalar correction) will form \(L\).

Example:

Let

\[\begin{split} A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 0 \\ -1 & 1 & 2 \end{bmatrix}. \end{split}\]

Now let’s step through a couple of row operations. The first two are done using \(a_{11}\) as the pivot and \(A_1\) is the pivot column:

\[\begin{split} \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 0 \\ -1 & 1 & 2 \end{bmatrix} \sim \begin{bmatrix} 1 & 2 & -1 \\ 0 & -3 & 2 \\ 0 & 3 & 1 \end{bmatrix}. \end{split}\]

The first pivot column \(A_1\) is the first column of \(L\):

\[\begin{split} L = \begin{bmatrix} 1 & & \\ 2 & & \\ -1 & & \end{bmatrix}. \end{split}\]

Now, the next row operation on \(A\) is to zero out \(a_{32}\):

\[\begin{split} \begin{bmatrix} 1 & 2 & -1 \\ 0 & -3 & 2 \\ 0 & 3 & 1 \end{bmatrix}\sim \begin{bmatrix} 1 & 2 & -1 \\ 0 & -3 & 2 \\ 0 & 0 & 3 \end{bmatrix} = U, \end{split}\]

and the result is an upper triangular matrix \(U\) that is the desired \(U\) in our factorization \(A=LU\). The pivot column for the last row operation was \(A_2\), and the pivot was \(a_{22}\). We want to take \(A_2\) data from the pivot down and use it in \(L\), but we need to normalize it so that the pivot is 1; that is, we take column 2 of the partially reduced \(A\), keep the pivot entry and entries below the pivot (-3 and 3) and divide them by the pivot value (-3) to produce the next column in \(L\):

\[\begin{split} L = \begin{bmatrix} 1 & 0 & \\ 2 & 1 & \\ -1 & -1 & \end{bmatrix}. \end{split}\]

For the remaining column in \(L\), the first two entries must be 0, and the third must be 1:

\[\begin{split} L = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -1 & 1 \end{bmatrix}. \end{split}\]

You can verify now that

\[\begin{split} A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 0 \\ -1 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & -1 \\ 0 & -3 & 2 \\ 0 & 0 & 3 \end{bmatrix} = LU \end{split}\]

as desired.

\(A=PLU\)#

What if we want to factor \(A=LU\) but the pivots of \(A\) are in the wrong place(s)? For example, suppose we want to factor

\[\begin{split} A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 2 & 2 \\ 2 & 2 & 3 \end{bmatrix}. \end{split}\]

To get a pivot in \(a_{11}\), I need to swap rows. We can use a permutation matrix to do this:

\[\begin{split} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 & 1 \\ 1 & 2 & 2 \\ 2 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 2 \\ 0 & 1 & 1 \\ 2 & 2 & 3 \end{bmatrix}, \end{split}\]

and we can then proceed to factor the matrix \(PA\) into \(LU\). Now, we typically want a strict factorization of \(A\), so if \(PA=LU\), we can multiply both sides of the equation by \(P\) again to produce \(PPA=PLU\). But applying \(P\) twice is equivalent to doing nothing at all; that is, \(P\) is its own inverse: \(PP = I\). So \(PPA = A = PLU\), and from a computational standpoint adding \(P\) contributes very little additional overhead.