MultiPhysicsVault/.raw/FiniteElementProcedures/FiniteElementProcedures_077.md

<!-- source-page: 761 -->

8.9. Consider the equations

$$
\left[ \begin{array}{r r r} 2 & - 1 & 0 \\ - 1 & 2 & - 1 \\ 0 & - 1 & 2 \end{array} \right] \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \end{array} \right] = \left[ \begin{array}{l} 1 \\ 0 \\ 0 \end{array} \right]
$$

Establish the $LDL^{T}$ factorization of the coefficient matrix and then solve the equations as given in (8.19) and (8.20). Finally, also calculate the Cholesky factor $\tilde{L}$ of the coefficient matrix.

8.10. Consider the equations

$$
\left[ \begin{array}{r r r} 2 & - 1 & 0 \\ - 1 & 2 & - 1 \\ 0 & - 1 & 2 + k \end{array} \right] \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \end{array} \right] = \left[ \begin{array}{l} 1 \\ 0 \\ 0 \end{array} \right]
$$

Establish for which value(s) of $k$ (a) the coefficient matrix is indefinite and (b) the equations cannot be solved.

8.11. Use the basic steps of Gauss elimination given in Section 8.2.1 to solve the following nonsymmetric set of equations. Then show that these solution steps can be written in the form (8.10) to (8.20), except that we need to replace $L^{T}$ in these equations by an upper unit triangular matrix $L_{u}$ . Establish $L_{u}$ .

$$
\left[ \begin{array}{r r r} 3 & - 1 & 0 \\ - 2 & 4 & - 1 \\ 0 & - 2 & 3 \end{array} \right] \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \end{array} \right] = \left[ \begin{array}{l} 0 \\ 1 \\ 0 \end{array} \right]
$$

8.12. Carry out Exercise 8.11 but use the following set of equations:

$$
\left[ \begin{array}{c c c} 4 & - 1 & 0 \\ 2 & 6 & - 2 \\ 0 & - 1 & 4 \end{array} \right] \left[ \begin{array}{c} U _ {1} \\ U _ {2} \\ U _ {3} \end{array} \right] = \left[ \begin{array}{c} 1 \\ 0 \\ 0 \end{array} \right]
$$

8.13. Establish the $\mathbf{LDL}^T$ factorization of the following set of equations:

$$
\left[ \begin{array}{r r r r} 2 & - 1 & 0 & - 1 \\ - 1 & 2 & - 1 & 0 \\ 0 & - 1 & 2 & 1 \\ - 1 & 0 & 1 & 0 \end{array} \right] \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \\ \lambda \end{array} \right] = \left[ \begin{array}{l} 0 \\ 1 \\ 0 \\ - 0 \end{array} \right]
$$

Here $U_{1}, U_{2}$ , and $U_{3}$ are displacements and $\lambda$ is a Lagrange multiplier (force) (see Section 3.4.1).

Also establish a simple finite element model whose response is governed by these equations.

8.14. Consider the following set of equations

$$
\left[ \begin{array}{c c} \mathbf {K} & \mathbf {K} _ {\lambda} ^ {T} \\ \mathbf {K} _ {\lambda} & \mathbf {0} \end{array} \right] \left[ \begin{array}{c} \mathbf {U} \\ \boldsymbol {\lambda} \end{array} \right] = \left[ \begin{array}{c} \mathbf {R} \\ \mathbf {R} _ {\lambda} \end{array} \right]
$$

where K is a symmetric positive definite matrix of order n (K corresponds to a finite element model properly supported to not contain rigid body modes), and the $K_{\lambda}$ matrix and $R_{\lambda}$ vector correspond to p constraint equations (as, for example, are encountered in contact analysis; see Section 6.7). The vector $\lambda$ contains the Lagrange multipliers.

Show that as long as the constraint equations are linearly independent and p < n, we can use the solution procedure in (8.10) to (8.20) to solve for the unknown displacements and Lagrange multipliers.

<!-- source-page: 762 -->

8.15. Consider the structural model in Example 8.9 and assume that a stiffness of $k_{i} = (EA_{1} / L) \times i$ is added to the degree of freedom $u_{i}, i = 1, 2, \ldots, 5$ . Hence, these stiffness values are only an addition to the diagonal elements of the original stiffness matrix given in Example 8.9. Use substructuring to solve for the stiffness matrix of the resulting structure defined by the degrees of freedom $U_{1}, U_{3}$ , and $U_{5}$ only.
8.16. Use substructuring to solve for the $3 \times 3$ stiffness matrix and corresponding force vector of the bar structure in Fig. E8.10 corresponding to the degrees of freedom $U_{1}$ , $U_{7}$ , and $U_{9}$ .
8.17. Consider the cantilever beam in Example 8.1 and its stiffness matrix (which is given in the result of the first experiment; see Exercise 8.5). Calculate the eigenvalues of the original problem and of the associated constraint problems and thus show that the Sturm sequence property holds in this case. (Hint: See Fig. 8.9.)
8.18. Consider the stiffness matrix in Exercise 8.4. Calculate the eigenvalues of the original problem and of the associated constraint problems and thus show that the Sturm sequence property holds in this case. (Hint: See Figure 8.9.)
8.19. Consider the following matrix:

$$
\left[ \begin{array}{c c c} 1 0 & - 1 0 & 0 \\ - 1 0 & 1 0 + k & - k \\ 0 & - k & 1 0 + k \end{array} \right]
$$

Evaluate the value of k such that the condition number of the matrix is about $10^{8}$ .

8.20. Calculate the exact condition number of the simply supported beam stiffness matrix in Fig. 8.1(a) and an approximation thereof using a norm. [Hint: See Fig. 8.9 and (8.63).]

# 8.3 ITERATIVE SOLUTION METHODS

In many analyses some form of direct solution based on Gauss elimination to solve the equilibrium equations $KU = R$ is very efficient (see Section 8.2). It is interesting to note, however, that during the initial developments of the finite element method, iterative solution algorithms have been employed (see R. W. Clough and E. L. Wilson [A]).

A basic disadvantage of an iterative solution is that the time of solution can be estimated only very approximately because the number of iterations required for convergence depends on the condition number of the matrix K and whether the acceleration schemes used are effective for the particular case considered. It is primarily for this reason that the use of iterative methods in finite element analysis was largely abandoned during the 1960s and 1970s, while the direct methods of solution have been refined and rendered extremely effective (see Section 8.2).

However, when considering very large finite element systems, a direct method of solution can require a large amount of storage and computer time. The basic reason is that the required storage is proportional to $nm_{K}$ , where n = number of equations, $m_{K} =$ half-bandwidth, and a measure of the number of operations is $\frac{1}{2}nm_{K}^{2}$ . Since the half-bandwidth is (roughly) proportional to $\sqrt{n}$ , we recognize that as n increases, the demands on storage and computation time can become very large. In practice, the available storage on a computer frequently limits the size of finite element system that can be solved.

On the other hand, in an iterative solution the required storage is much less because we need to store only the actually nonzero matrix elements under the skyline of the matrix, a pointer array that indicates the location of each nonzero element, and some arrays, also

<!-- source-page: 763 -->

of small size measured on the value of $nm_{K}$ , for example, for the preconditioner and iteration vectors. The nonzero matrix elements under the skyline are only a small fraction of all the elements under the skyline, as we demonstrate in the following example.

EXAMPLE 8.21: Consider the finite element model of three-dimensional elements shown in Fig. E8.21. Estimate the number of matrix elements under the skyline (to be stored in a direct solution) and the number of actually nonzero matrix elements (to be stored in an iterative solution).

![](images/page-763_63be4b311c90284ed079a6f9e71ed7f2632c4a79afa74f2001cac4381d86523a.jpg)

<details>
<summary>other</summary>

| Row | Column | Value |
|-----|--------|-------|
| 1   | 1      | 38    |
| 2   | 2      | 8     |
| 3   | 3      | 9     |
| 4   | 4      | 10    |
| 5   | 5      | 11    |
| 6   | 6      |       |
| 7   | 7      | 7     |
| 8   | 8      |       |
| 9   | 9      |       |
| 10  | 10     |       |
| 11  | 11     |       |
| 12  | 12     |       |
| 13  | 13     |       |
| 14  | 14     |       |
| 15  | 15     |       |
| 16  | 16     |       |
| 17  | 17     |       |
| 18  | 18     |       |
| 19  | 19     |       |
| 20  | 20     |       |
| 21  | 21     |       |
| 22  | 22     |       |
| 23  | 23     |       |
| 24  | 24     |       |
| 25  | 25     |       |
| 26  | 26     |       |
| 27  | 27     |       |
| 28  | 28     |       |
| 29  | 29     |       |
| 30  | 30     |       |
| 31  | 31     |       |
| 32  | 32     |       |
| 33  | 33     |       |
| 34  | 34     |       |
| 35  | 35     |       |
| 36  | 36     |       |
| 37  | 37     |       |
| 38  | 38     |       |
| 39  | 39     |       |
| 40  | 40     |       |
| 41  | 41     |       |
| 42  | 42     |       |
| 43  | 43     |       |
| 44  | 44     |       |
| 45  | 45     |       |
| 46  | 46     |       |
| 47  | 47     |       |
| 48  | 48     |       |
| 49  | 49     |       |
| 50  | 50     |       |
| 51  | 51     |       |
| 52  | 52     |       |
| 53  | 53     |       |
| 54  | 54     |       |
| 55  | 55     |       |
| 56  | 56     |       |
| 57  | 57     |       |
| 58  | 58     |       |
| 59  | 59     |       |
| 60  | 60     |       |
| 61  | 61     |       |
| 62  | 62     |       |
| 63  | 63     |       |
| 64  | 64     |       |
| 65  | 65     |       |
| 66  | 66     |       |
| 67  | 67     |       |
| 68  | 68     |       |
| 69  | 69     |       |
| 70  | 70     |       |
| 71  | 71     |       |
| 72  | 72     |       |
| 73  | 73     |       |
| 74  | 74     |       |
| 75  | 75     |       |
| 76  | 76     |       |
| 77  | 77     |       |
| 78  | 78     |       |
| 79  | 79     |       |
| 80  | 80     |       |
| 81  | 81     |       |
| 82  | 82     |       |
| 83  | 83     |       |
| 84  | 84     |       |
| 85  | 85     |       |
| 86  | 86     |       |
| 87  | 87     |       |
| 88  | 88     |       |
| 89  | 89     |       |
| 90  | 90     |       |
| 91  | 91     |       |
| 92  | 92     |       |
| 93  | 93     |       |
| 94  | 94     |       |
| 95  | 95     |       |
| 96  | 96     |       |
| 97  | 97     |       |
| 98  | 98     |       |
| 99  | 99     |       |
| 100 | 100    |       |
</details>

q = number of element layers in each direction
Figure E8.21 Assemblage of eight-node three-dimensional elements, 3 degrees of freedom per node

The half-bandwidth is given by the maximum difference in nodal point numbers in an element. Here we have for q element layers in each direction,

$$
m _ {K} = [ (q + 1) ^ {2} + q + 3 ] \times 3 - 1
$$

or when $q$ is large

$$
m _ {\mathbf {K}} \doteq 3 q ^ {2} \tag {a}
$$

For the problem considered, the column heights are practically constant, and hence the number of elements under the skyline is about

$$
n m _ {K} \doteq (3 q ^ {3}) (3 q ^ {2}) = 9 q ^ {5}
$$

On the other hand, the number of actually nonzero elements in the skyline is determined by the fact that each nodal point i actually couples to only the directly surrounding nodal points. For an interior nodal point and eight-node elements the coupling pertains to 27 nodal points, hence the "compressed" half-bandwidth is

$$
m _ {\mathrm{K}} \mid_ {\text { compressed }} = \frac {2 7}{2} \times 3 \sim 4 0 \tag {b}
$$

and we note that this result is independent of the number of elements and the number of nodal points used in the model. Namely, the result in (b) depends only on how many elements couple into a typical nodal point. Comparing (a) and (b) we observe that the number of nonzero elements under the skyline increases only linearly with n, and the percentage measured on all elements under the skyline is very small when q is large.

<!-- source-page: 764 -->

The fact that considerable storage can be saved in an iterative solution has prompted a large amount of research effort to develop increasingly effective iterative schemes. The key to effectiveness is of course to reach convergence within a reasonable number of iterations.

As we shall see, of major importance in an iterative scheme is therefore a procedure to accelerate convergence when slow convergence is observed. The fact that effective acceleration procedures have become available for many applications has rendered iterative methods very attractive.

In the next sections we first consider the classical Gauss-Seidel iterative procedure and then the conjugate gradient method. The Gauss-Seidel method was used in the early applications of the finite element method (see R. W. Clough and E. L. Wilson [A]) and continues to find use. However, the conjugate gradient method presented here is particularly attractive.

# 8.3.1 The Gauss-Seidel Method

Our objective is to calculate iteratively the solution to the equations $\mathbf{KU} = \mathbf{R}$ . Let $\mathbf{U}^{(1)}$ be an initial estimate for the displacements $\mathbf{U}$ . If no better value is known, $\mathbf{U}^{(1)}$ may be the null vector.

In the Gauss-Seidel iteration (see L. Seidel [A] and R. S. Varga [A]), we then evaluate for $s = 1, 2, \ldots$ :

$$
U _ {i} ^ {(s + 1)} = k _ {i i} ^ {- 1} \left(R _ {i} - \sum_ {j = 1} ^ {i - 1} k _ {i j} U _ {j} ^ {(s + 1)} - \sum_ {j = i + 1} ^ {n} k _ {i j} U _ {j} ^ {(s)}\right) \tag {8.65}
$$

where $U_{i}^{(s)}$ and $R_{i}$ are the $i$ th component of $\mathbf{U}$ and $\mathbf{R}$ and $s$ indicates the cycle of iteration. Alternatively, we may write in matrix form,

$$
\mathbf {U} ^ {(s + 1)} = \mathbf {K} _ {D} ^ {- 1} (\mathbf {R} - \mathbf {K} _ {L} \mathbf {U} ^ {(s + 1)} - \mathbf {K} _ {L} ^ {T} \mathbf {U} ^ {(s)}) \tag {8.66}
$$

where $\mathbf{K}_D$ is a diagonal matrix, $\mathbf{K}_D = \mathrm{diag}(k_{ii})$ , and $\mathbf{K}_L$ is a lower triangular matrix with the elements $k_{ij}$ such that

$$
\mathbf {K} = \mathbf {K} _ {L} + \mathbf {K} _ {D} + \mathbf {K} _ {L} ^ {T} \tag {8.67}
$$

The iteration is continued until the change in the current estimate of the displacement vector is small enough, i.e., until

$$
\frac {\left\| \mathbf {U} ^ {(s + 1)} - \mathbf {U} ^ {(s)} \right\| _ {2}}{\left\| \mathbf {U} ^ {(s + 1)} \right\| _ {2}} <   \epsilon \tag {8.68}
$$

where $\epsilon$ is the convergence tolerance. The number of iterations depends on the “quality” of the starting vector $\mathbf{U}^{(1)}$ and on the conditioning of the matrix K. But it is important to note that the iteration will always converge, provided that K is positive definite. Furthermore, the rate of convergence can be increased using overrelaxation, in which case the iteration is as follows:

$$
\mathbf {U} ^ {(s + 1)} = \mathbf {U} ^ {(s)} + \beta \mathbf {K} _ {D} ^ {- 1} (\mathbf {R} - \mathbf {K} _ {L} \mathbf {U} ^ {(s + 1)} - \mathbf {K} _ {D} \mathbf {U} ^ {(s)} - \mathbf {K} _ {L} ^ {T} \mathbf {U} ^ {(s)}) \tag {8.69}
$$

<!-- source-page: 765 -->

where $\beta$ is the overrelaxation factor. The optimum value of $\beta$ depends on the matrix K but is usually between 1.3 and 1.9.

EXAMPLE 8.22: Use the Gauss-Seidel iteration to solve the system of equations considered in Section 8.2.1.

The equations to be solved are

$$
\left[ \begin{array}{r r r r} 5 & - 4 & 1 & 0 \\ - 4 & 6 & - 4 & 1 \\ 1 & - 4 & 6 & - 4 \\ 0 & 1 & - 4 & 5 \end{array} \right] \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \\ U _ {4} \end{array} \right] = \left[ \begin{array}{l} 0 \\ 1 \\ 0 \\ 0 \end{array} \right]
$$

In the solution we use (8.69), which in this case becomes

$$
\begin{array}{l} \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \\ U _ {4} \end{array} \right] ^ {(s + 1)} = \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \\ U _ {4} \end{array} \right] ^ {(s)} + \beta \left[ \begin{array}{c c c c} \frac {1}{5} & & & \\ & \frac {1}{6} & & \\ & & \frac {1}{6} & \\ & & & \frac {1}{5} \end{array} \right] \\ \times \left\{\left[ \begin{array}{l} 0 \\ 1 \\ 0 \\ 0 \end{array} \right] - \left[ \begin{array}{c c c c} 0 & & & \\ - 4 & 0 & & \\ 1 & - 4 & 0 & \\ 0 & 1 & - 4 & 0 \end{array} \right] \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \\ U _ {4} \end{array} \right] ^ {(s + 1)} - \left[ \begin{array}{c c c c} 5 & - 4 & 1 & 0 \\ & 6 & - 4 & 1 \\ & & 6 & - 4 \\ & & & 5 \end{array} \right] \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \\ U _ {4} \end{array} \right] ^ {(s)} \right\} \\ \end{array}
$$

We use as an initial guess

$$
\mathbf {U} ^ {(1)} = \left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ 0 \end{array} \right]
$$

Consider first the solution without overrelaxation, i.e., $\beta = 1$ . We obtain

$$
\left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \\ U _ {4} \end{array} \right] ^ {(2)} = \left[ \begin{array}{l} 0 \\ 0. 1 6 7 \\ 0. 1 1 1 \\ 0. 0 5 5 6 \end{array} \right]; \quad \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \\ U _ {4} \end{array} \right] ^ {(3)} = \left[ \begin{array}{l} 0. 1 1 1 \\ 0. 3 0 5 \\ 0. 2 2 2 \\ 0. 1 1 6 \end{array} \right]
$$

Using the convergence limit in (8.68) with $\epsilon = 0.001$ , we have convergence after 104 iterations and

$$
\left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \\ U _ {4} \end{array} \right] ^ {(1 0 4)} = \left[ \begin{array}{l} 1. 5 9 \\ 2. 5 9 \\ 2. 3 9 \\ 1. 3 9 \end{array} \right]
$$

with the exact solution being

$$
\left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \\ U _ {4} \end{array} \right] = \left[ \begin{array}{l} 1. 6 0 \\ 2. 6 0 \\ 2. 4 0 \\ 1. 4 0 \end{array} \right]
$$

<!-- source-page: 766 -->

We now vary $\beta$ and recalculate the solution. The following table gives the number of iterations required for convergence with $\epsilon = 0.001$ as a function of $\beta$ :

<table><tr><td>β</td><td>1.0</td><td>1.1</td><td>1.2</td><td>1.3</td><td>1.4</td><td>1.5</td><td>1.6</td><td>1.7</td><td>1.8</td><td>1.9</td></tr><tr><td>Number of iterations</td><td>104</td><td>88</td><td>74</td><td>61</td><td>49</td><td>37</td><td>23</td><td>30</td><td>43</td><td>82</td></tr></table>

Hence, for this example, we find that the minimum number of iterations is required at $\beta = 1.6$ .

It is instructive to identify the physical process that is followed in the solution procedure. For this purpose we note that on the right-hand side of (8.69) we evaluate an out-of-balance force corresponding to degree of freedom i,

$$
Q _ {i} ^ {(s)} = R _ {i} - \sum_ {j = 1} ^ {i - 1} k _ {i j} U _ {j} ^ {(s + 1)} - \sum_ {j = i} ^ {n} k _ {i j} U _ {j} ^ {(s)} \tag {8.70}
$$

and then calculate an improved value for the corresponding displacement component $U_{i}^{(s+1)}$ using

$$
U _ {i} ^ {(s + 1)} = U _ {i} ^ {(s)} + \beta k _ {i i} ^ {- 1} Q _ {i} ^ {(s)} \tag {8.71}
$$

where $i = 1, \ldots, n$ . Assuming that $\beta = 1$ , the correction to $U_{i}^{(s)}$ in (8.71) is calculated by applying the out-of-balance force $Q_{i}^{(s)}$ to the ith nodal degree of freedom, with all other nodal displacements kept fixed. The process is therefore identical to the moment-distribution procedure, which has been used extensively in hand calculation analysis of frames (see E. Lightfoot [A]). However, faster convergence is achieved if the acceleration factor $\beta$ is used.

In the above equations, we summed over all off-diagonal elements [see (8.65) and (8.70)]. However, in practice we would of course include in the summation only those degrees of freedom that correspond to nonzero entries in the stiffness matrix. As we pointed out earlier, only the nonzero matrix elements would be stored and be operated on.

# 8.3.2 Conjugate Gradient Method with Preconditioning

One of the most effective and simple iterative methods (when used with preconditioning) for solving $\mathbf{KU} = \mathbf{R}$ is the conjugate gradient algorithm of M. R. Hestenes and E. Stiefel [A] (see also J. K. Reid [A] and G. H. Golub and C. F. van Loan [A]).

The algorithm is based on the idea that the solution of $\mathbf{KU} = \mathbf{R}$ minimizes the total potential $\Pi = \frac{1}{2}\mathbf{U}^T\mathbf{KU} - \mathbf{U}^T\mathbf{R}$ [see (4.96) to (4.98)]. Hence, the task in the iteration is, given an approximation $\mathbf{U}^{(s)}$ to $\mathbf{U}$ for which the total potential is $\Pi^{(s)}$ , to find an improved approximation $\mathbf{U}^{(s + 1)}$ for which $\Pi^{(s + 1)} < \Pi^{(s)}$ . However, not only do we want the total potential to decrease in each iteration but we also want $\mathbf{U}^{(s + 1)}$ to be calculated efficiently and the decrease in the total potential to occur rapidly. Then the iteration will converge fast.

In the conjugate gradient method, we use in the $s$ th iteration the linearly independent vectors $\mathbf{p}^{(1)}, \mathbf{p}^{(2)}, \mathbf{p}^{(3)}, \ldots, \mathbf{p}^{(s)}$ and calculate the minimum of the total potential in the space spanned by these vectors. This gives $\mathbf{U}^{(s+1)}$ (see Exercise 8.23). Also, we establish the additional basis vector $\mathbf{p}^{(s+1)}$ used in the subsequent iteration.

<!-- source-page: 767 -->

The algorithm can be summarized as follows.

Choose the starting iteration vector $\mathbf{U}^{(1)}$ (frequently $\mathbf{U}^{(1)}$ is the null vector).

Calculate the residual $\mathbf{r}^{(1)} = \mathbf{R} - \mathbf{KU}^{(1)}$ . If $\mathbf{r}^{(1)} = \mathbf{0}$ , quit.

Else:

Set $\mathbf{p}^{(1)} = \mathbf{r}^{(1)}$ .

Calculate for $s = 1, 2, \ldots$ ,

$$
\alpha_ {s} = \frac {\mathbf {r} ^ {(s) ^ {T}} \mathbf {r} ^ {(s)}}{\mathbf {p} ^ {(s) ^ {T}} \mathbf {K p} ^ {(s)}}
$$

$$
\mathbf {U} ^ {(s + 1)} = \mathbf {U} ^ {(s)} + \alpha_ {s} \mathbf {p} ^ {(s)}
$$

$$
\mathbf {r} ^ {(s + 1)} = \mathbf {r} ^ {(s)} - \alpha_ {s} \mathbf {K p} ^ {(s)} \tag {8.72}
$$

$$
\beta_ {s} = \frac {\mathbf {r} ^ {(s + 1) ^ {T}} \mathbf {r} ^ {(s + 1)}}{\mathbf {r} ^ {(s) ^ {T}} \mathbf {r} ^ {(s)}}
$$

$$
\mathbf {p} ^ {(s + 1)} = \mathbf {r} ^ {(s + 1)} + \beta_ {s} \mathbf {p} ^ {(s)}
$$

We continue iterating until $\| \mathbf{r}^{(s)}\| \leq \epsilon$ , where $\epsilon$ is the convergence tolerance. A convergence criterion on $\| \mathbf{U}^{(s)}\|$ could also be used.

The conjugate gradient algorithm satisfies two important orthogonality properties regarding the direction vectors $\mathbf{p}^{(i)}$ and the residuals $\mathbf{r}^{(i)}$ , namely, we have (see Exercise 8.22)

$$
\mathbf {p} ^ {(i) ^ {T}} \mathbf {K} \mathbf {p} ^ {(j)} = 0 \quad \text { all   } i, j \text {   but   } i \neq j \tag {8.73}
$$

and $\mathbf{P}^{(j)^T}\mathbf{r}^{(j + 1)} = \mathbf{0}$ (8.74)

where $\mathbf{P}^{(j)} = [\mathbf{p}^{(1)},\dots ,\mathbf{p}^{(j)}]$ (8.75)

The orthogonality property in (8.73) means that the minimum of the total potential over the space spanned by $\mathbf{p}^{(1)},\ldots ,\mathbf{p}^{(s)}$ is obtained by using the minimum over the space spanned by $\mathbf{p}^{(1)},\ldots ,\mathbf{p}^{(s - 1)}$ (i.e., the solution of the previous iteration) and minimizing the total potential with respect to only a multiplier of $\mathbf{p}^{(s)}$ . This process gives $\alpha_{s}$ and the improved iterative solution $\mathbf{U}^{(s + 1)}$ .

The orthogonality in (8.74) means that the $(j + 1)$ st residual is orthogonal to all direction vectors used. This equation shows that convergence to the solution U will be reached and, in exact arithmetic, convergence is achieved in at most n iterations. Of course, in practice, we want convergence to be reached (to a reasonable convergence tolerance) in much fewer than n iterations.

The rate of convergence of the conjugate gradient algorithm depends on the condition number of the matrix K, defined as $\text{cond}(\mathbf{K}) = \lambda_{n}/\lambda_{1}$ , where $\lambda_{1}$ is the smallest eigenvalue and $\lambda_{n}$ is the largest eigenvalue of K (see Section 8.2.6). The larger the condition number, the slower the convergence, and in practice, when the matrix is ill-conditioned, convergence can be very slow. For this reason, the conjugate gradient algorithm as given in (8.72) is hardly effective.

At this point, we should also note that the properties above enumerated are valid only in exact arithmetic. In practice, because of the finite precision arithmetic, the orthogonality

<!-- source-page: 768 -->

properties in (8.73) and (8.74) are not exactly satisfied, but this loss of orthogonality is not detrimental.

To increase the rate of convergence of the solution algorithm, preconditioning is used. The basic idea is that instead of solving $\mathbf{KU} = \mathbf{R}$ , we solve

$$
\tilde {\mathbf {K}} \tilde {\mathbf {U}} = \tilde {\mathbf {R}} \tag {8.76}
$$

where

$$
\tilde {\mathbf {K}} = \mathbf {C} _ {L} ^ {- 1} \mathbf {K} \mathbf {C} _ {R} ^ {- 1}
$$

$$
\tilde {\mathbf {U}} = \mathbf {C} _ {R} \mathbf {U} \tag {8.77}
$$

$$
\tilde {\mathbf {R}} = \mathbf {C} _ {L} ^ {- 1} \mathbf {R}
$$

The nonsingular matrix $K_{p} = C_{L}C_{R}$ is called the preconditioner. The objective with this transformation is to obtain a matrix $\tilde{K}$ with a much improved condition number. Various preconditioners have been proposed (see G. H. Golub and C. F. van Loan [A], T. A. Manteuffel [A], and J. A. Meijerink and H. A. van der Vorst [A]), but one approach is particularly valuable, namely, using some incomplete Cholesky factors of K.

In this approach a reasonable matrix $K_{p}$ is obtained from inexact factors of K such that any equation solution with $K_{p}$ as coefficient matrix can be calculated very efficiently. In principle, many different incomplete Cholesky factors of K could be calculated. In one scheme, incomplete Cholesky factors of K are obtained by performing the usual factorization, as described in Section 8.2, but considering only and operating only on those locations where K has nonzero entries. Hence, all matrix elements below the skyline that originally are zero remain zero during the factorization and therefore need not be stored.

Instead of considering all initially nonzero elements in K, we may also decide to include in the factorization only those elements that are larger in magnitude than a certain threshold and set all other elements to zero. This approach leads to additional storage savings. Also, it can be effective to scale all diagonal elements in relation to the off-diagonal elements prior to performing the incomplete factorization, and of course we may use an exact factorization of certain submatrices in K to establish the incomplete factors (see Exercises 8.27 and 8.28). Clearly, there are many different possibilities, and various interesting relations between different approaches can be derived (see, for example, G. H. Golub and C. F. van Loan [A]).

Let $\overline{\mathbf{L}}$ and $\overline{\mathbf{L}}^T$ be the chosen incomplete Cholesky factors of $\mathbf{K}$ ; then in the preconditioning with these factors we use the matrix $\mathbf{K}_p = \overline{\mathbf{L}}\overline{\mathbf{L}}^T$ with $\mathbf{C}_L = \overline{\mathbf{L}}$ and $\mathbf{C}_R = \overline{\mathbf{L}}^T$ .

Whichever preconditioner $\mathbf{K}_p$ is employed, using the conjugate gradient algorithm of (8.72) for the problem in (8.76), we thus arrive at the following algorithm.

Choose the starting iteration vector $\mathbf{U}^{(1)}$ .

Calculate the residual $\mathbf{r}^{(1)} = \mathbf{R} - \mathbf{KU}^{(1)}$ . If $\mathbf{r}^{(1)} = \mathbf{0}$ , quit.

Else:

$$
\text { Calculate } \mathbf {z} ^ {(1)} = \mathbf {K} _ {p} ^ {- 1} \mathbf {r} ^ {(1)}.
$$

$$
\text { Set } \mathbf {p} ^ {(1)} = \mathbf {z} ^ {(1)}.
$$

<!-- source-page: 769 -->

Calculate for $s = 1, 2, \ldots$ , until a convergence tolerance is reached,

$$
\alpha_ {s} = \frac {\mathbf {z} ^ {(s) ^ {T}} \mathbf {r} ^ {(s)}}{\mathbf {p} ^ {(s) ^ {T}} \mathbf {K p} ^ {(s)}}
$$

$$
\mathbf {U} ^ {(s + 1)} = \mathbf {U} ^ {(s)} + \alpha_ {s} \mathbf {p} ^ {(s)}
$$

$$
\mathbf {r} ^ {(s + 1)} = \mathbf {r} ^ {(s)} - \alpha_ {s} \mathbf {K p} ^ {(s)}
$$

$$
\mathbf {z} ^ {(s + 1)} = \mathbf {K} _ {p} ^ {- 1} \mathbf {r} ^ {(s + 1)} \tag {8.78}
$$

$$
\beta_ {s} = \frac {\mathbf {z} ^ {(s + 1) ^ {T}} \mathbf {r} ^ {(s + 1)}}{\mathbf {z} ^ {(s) ^ {T}} \mathbf {r} ^ {(s)}}
$$

$$
\mathbf {p} ^ {(s + 1)} = \mathbf {z} ^ {(s + 1)} + \beta_ {s} \mathbf {p} ^ {(s)}
$$

In this iteration we define an intermediate vector $\mathbf{z}^{(s)}$ , which is equal to $\mathbf{r}^{(s)}$ if no preconditioning is used (i.e., when $K_{p} = I$ ). Of course, the matrix $K_{p}$ need not be calculated, but rather $\mathbf{z}^{(s+1)}$ is directly computed using $\mathbf{r}^{(s+1)}$ . Also, all multiplications with K are performed by operating on only the nonzero elements in the matrix. Note that the iteration in (8.78) reduces to the iteration in (8.72) when no preconditioning is employed and that convergence would be immediate if $K_{p}$ were equal to K.

We assume in this iteration that $K_{p}$ is nonsingular. In practice, this condition is usually met, and if not, we may slightly perturb the coefficient matrix (or rather its factors) so as to be able to solve for $\mathbf{z}^{(s)}$ .

While we still cannot predict how fast convergence will be achieved in the iteration, practical experience with the iterative scheme in (8.78) has shown that very significant savings in required storage and computer time are frequently realized (see K. J. Bathe, J. Walczak, and H. Zhang [A]). Of course, the required number of iterations depends on the structure of the matrix K, its condition number, the details of the preconditioning used, and the accuracy to be achieved.

We considered in this section only the case of a symmetric coefficient matrix, and should note that also of much interest is the iterative solution of equations with nonsymmetric coefficient matrices (as we encounter in the analysis of fluid flows). Here the benefits of savings in storage and computing time can be even more significant. For nonsymmetric coefficient matrices, the conjugate gradient method has been generalized and other iterative schemes, notably, the generalized minimal residual (GMRes) method, have been developed and researched (see, for example, R. Fletcher [A], Y. Saad and M. H. Schultz [A], and Y. Saad [A].

Finally, we should also recognize that there is the possibility of combining the direct and iterative solution schemes discussed above. As an example, the substructuring procedures described in Section 8.2.4 might be used to assemble governing equations (after static condensation of the internal degrees of freedom of the substructures) that are solved using conjugate gradient iteration. Such combinations can lead to a variety of procedures that may display considerable advantage in specific applications.

# 8.3.3 Exercises

8.21. Solve the system of equations given using the Gauss-Seidel iterative method. Use the overrelaxation factor $\beta$ , and study the convergence properties as $\beta$ is varied from 1.0 to 2.0.

<!-- source-page: 770 -->

$$
\left[ \begin{array}{r r r} 3 & - 1 & 0 \\ - 1 & 2 & - 1 \\ 0 & - 1 & 1 \end{array} \right] \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \end{array} \right] = \left[ \begin{array}{l} 0 \\ 1 \\ 0 \end{array} \right]
$$

8.22. Show that the orthogonality properties (8.73) and (8.74) hold in the conjugate gradient iteration technique (using exact arithmetic).
8.23. Show that with the conjugate gradient algorithm in (8.72), the minimum of $\Pi$ in the space spanned by the vectors $\mathbf{p}^{(1)}, \ldots, \mathbf{p}^{(s)}$ is obtained by using the minimum of $\Pi$ in the space spanned by $\mathbf{p}^{(1)}, \ldots, \mathbf{p}^{(s-1)}$ and the solution for $\alpha_{s}$ given by the algorithm.
8.24. Derive the preconditioned incomplete Cholesky conjugate gradient algorithm in (8.78) from the standard algorithm in (8.72).
8.25. Solve the system of equations in Exercise 8.21 using the conjugate gradient algorithm (without preconditioning).
8.26. Solve the system of equations in Exercise 8.21 using the conjugate gradient method with preconditioning. Use the following preconditioner:

$$
\mathbf {K} _ {p} = \left[ \begin{array}{c c c} 3 & & \\ & 2 & \\ & & 1 \end{array} \right]
$$

8.27. Consider the following system of equations:

$$
\left[ \begin{array}{c c c c} 5 & - 1 & - 1 & - 1 \\ - 1 & 2 & 0 & 0 \\ - 1 & 0 & 2 & 0 \\ - 1 & 0 & 0 & 2 \end{array} \right] \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \\ U _ {4} \end{array} \right] = \left[ \begin{array}{l} 0 \\ 0 \\ 0 \\ 1 \end{array} \right]
$$

(a) Solve the equations using the preconditioned conjugate gradient algorithm with $\mathbf{K}_p$ corresponding to the incomplete Cholesky factors that are obtained by performing the factorization on $\mathbf{K}$ as usual but ignoring all zero elements.
(b) Solve the equations using the preconditioned conjugate gradient algorithm with the preconditioner

$$
\mathbf {K} _ {p} = \left[ \begin{array}{c c c c} 5 & - 1 & & \\ - 1 & 2 & & \\ & & 2 & \\ & & & 2 \end{array} \right]
$$

8.28. Consider the simply supported beam problem in Fig. 8.1, governed by the equations,

$$
\left[ \begin{array}{r r r r} 5 & - 4 & 1 & 0 \\ - 4 & 6 & - 4 & 1 \\ 1 & - 4 & 6 & - 4 \\ 0 & 1 & - 4 & 5 \end{array} \right] \left[ \begin{array}{l} U _ {1} \\ U _ {2} \\ U _ {3} \\ U _ {4} \end{array} \right] = \left[ \begin{array}{l} 0 \\ 1 \\ 0 \\ 0 \end{array} \right]
$$

Solve this set of equations by the conjugate gradient method using two different preconditioners that you shall propose.