
where p(\lambda) is given in (2.85). Hence the characteristic polynomials of the problems A\mathbf{v} = \lambda\mathbf{v} and \tilde{A}\tilde{v} = \lambda B\tilde{v} are the same within a multiplier. This means that the eigenvalues of the two problems are identical.

So far we have shown that there are n eigenvalues and corresponding eigenvectors, but we have not yet discussed the properties of the eigenvalues and vectors.

A first observation is that the eigenvalues are real. Consider the ith eigenpair (\lambda_{i}, \mathbf{v}_{i}) , for which we have


\mathbf {A} \mathbf {v} _ {i} = \lambda_ {i} \mathbf {v} _ {i} \tag {2.94}

Assume that \mathbf{v}_i and \lambda_i are complex, which includes the case of real eigenvalues, and let the elements of \overline{\mathbf{v}}_i and \overline{\lambda}_i be the complex conjugates of the elements of \mathbf{v}_i and \lambda_i . Then premultiplying (2.94) by \overline{\mathbf{v}}_i^T , we obtain


\overline {{{\mathbf {v}}}} _ {i} ^ {T} \mathbf {A} \mathbf {v} _ {i} = \lambda_ {i} \overline {{{\mathbf {v}}}} _ {i} ^ {T} \mathbf {v} _ {i} \tag {2.95}

On the other hand, taking the transpose of the complex conjugate of (2.94) and using that \mathbf{A} is real and symmetric, we also obtain


\overline {\mathbf {v}} _ {i} ^ {T} \mathbf {A} = \overline {\lambda} _ {i} \overline {\mathbf {v}} _ {i} ^ {T} \tag {2.96}

and postmultiplying by \mathbf{v}_i , we have


\overline {\mathbf {v}} _ {i} ^ {T} \mathbf {A} \mathbf {v} _ {i} = \overline {\lambda} _ {i} \overline {\mathbf {v}} _ {i} ^ {T} \mathbf {v} _ {i} \tag {2.97}

But the left-hand sides of (2.95) and (2.97) are the same, and thus we have


(\lambda_ {i} - \overline {{{\lambda}}} _ {i}) \overline {{{\mathbf {v}}}} _ {i} ^ {T} \mathbf {v} _ {i} = 0 \tag {2.98}

Since v_{i} is nontrivial, it follows that \lambda_{i} = \bar{\lambda}_{i} , and hence the eigenvalue must be real. However, it then also follows from (2.83) that the eigenvectors can be made real because the coefficient matrix A - \lambda I is real.
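The result can be checked numerically. The following is a minimal NumPy sketch, not part of the original text: the test matrix is a hypothetical random symmetric matrix, and the general solver `np.linalg.eig` is used on purpose because it does not assume symmetry, so real eigenvalues appear as an observed outcome.

```python
import numpy as np

# Hypothetical test matrix: any real symmetric matrix would do.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = 0.5 * (M + M.T)  # symmetric part of M

# General eigensolver (no symmetry assumed).
lam, V = np.linalg.eig(A)

# Largest imaginary part among the computed eigenvalues.
max_imag = float(np.max(np.abs(np.imag(lam))))

# Residual of A v_i = lambda_i v_i over all eigenpairs
# (V * lam scales column j of V by lam[j]).
residual = float(np.max(np.abs(A @ V - V * lam)))

print("largest imaginary part:", max_imag)
print("eigenpair residual:", residual)
```

Both printed numbers should be at round-off level for a symmetric input.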

Another important point is that the eigenvectors that correspond to distinct eigenvalues are unique (within scalar multipliers) and orthogonal, whereas the eigenvectors corresponding to multiple eigenvalues are not unique, but we can always choose an orthogonal set.

Assume first that the eigenvalues are distinct. In this case we have for two eigenpairs,


\mathbf {A} \mathbf {v} _ {i} = \lambda_ {i} \mathbf {v} _ {i} \tag {2.99}

and

\mathbf {A} \mathbf {v} _ {j} = \lambda_ {j} \mathbf {v} _ {j} \tag {2.100}

Premultiplying (2.99) by \mathbf{v}_j^T and (2.100) by \mathbf{v}_i^T , we obtain


\mathbf {v} _ {j} ^ {T} \mathbf {A} \mathbf {v} _ {i} = \lambda_ {i} \mathbf {v} _ {j} ^ {T} \mathbf {v} _ {i} \tag {2.101}

\mathbf {v} _ {i} ^ {T} \mathbf {A} \mathbf {v} _ {j} = \lambda_ {j} \mathbf {v} _ {i} ^ {T} \mathbf {v} _ {j} \tag {2.102}

Taking the transpose in (2.102) and using the symmetry of \mathbf{A} (i.e., \mathbf{A}^T = \mathbf{A} ), we have


\mathbf {v} _ {j} ^ {T} \mathbf {A} \mathbf {v} _ {i} = \lambda_ {j} \mathbf {v} _ {j} ^ {T} \mathbf {v} _ {i} \tag {2.103}

and thus from (2.103) and (2.101) we obtain


(\lambda_ {i} - \lambda_ {j}) \mathbf {v} _ {j} ^ {T} \mathbf {v} _ {i} = 0 \tag {2.104}

Since we assumed that \lambda_{i} \neq \lambda_{j} , it follows that \mathbf{v}_{j}^{T} \mathbf{v}_{i} = 0 , i.e., that \mathbf{v}_{j} and \mathbf{v}_{i} are orthogonal.

Furthermore, we can scale the elements of the vector v_{i} to obtain


\mathbf {v} _ {i} ^ {T} \mathbf {v} _ {j} = \delta_ {i j} \tag {2.105}

where \delta_{ij} = the Kronecker delta; i.e., \delta_{ij} = 1 when i = j , and \delta_{ij} = 0 when i \neq j . If (2.105) is satisfied, we say that the eigenvectors are orthonormal.

It should be noted that the solution of (2.83) yields a vector in which only the relative magnitudes of the elements are defined. If all elements are scaled by the same amount, the new vector would still satisfy (2.83) . In effect, the solution of (2.83) yields the direction of the eigenvector, and we use the orthonormality condition in (2.105) to fix the magnitudes of the elements in the vector. Therefore, when we refer to eigenvectors from now on it is implied that the vectors are orthonormal.

EXAMPLE 2.30: Check that the vectors calculated in Example 2.29 are orthogonal and then orthonormalize them.

The orthogonality is checked by forming v_{1}^{T}v_{2} , which gives


\mathbf {v} _ {1} ^ {T} \mathbf {v} _ {2} = (2) (\frac {1}{2}) + (- 1) (1) = 0

Hence the vectors are orthogonal. To orthonormalize the vectors, we need to make the lengths of the vectors equal to 1. Then we have


\mathbf {v} _ {1} = \frac {1}{\sqrt {5}} \left[ \begin{array}{l} 2 \\ - 1 \end{array} \right] \quad \text { or } \quad \mathbf {v} _ {1} = \frac {1}{\sqrt {5}} \left[ \begin{array}{l} - 2 \\ 1 \end{array} \right]; \quad \mathbf {v} _ {2} = \frac {1}{\sqrt {5}} \left[ \begin{array}{l} 1 \\ 2 \end{array} \right] \quad \text { or } \quad \mathbf {v} _ {2} = \frac {1}{\sqrt {5}} \left[ \begin{array}{l} - 1 \\ - 2 \end{array} \right]
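The check in Example 2.30 can be reproduced in a few lines. This is a minimal NumPy sketch, assuming the unscaled vectors v_1 = [2, -1] and v_2 = [1/2, 1] that appear in the orthogonality product above; the variable names are illustrative only.

```python
import numpy as np

# Unscaled eigenvectors as they appear in the orthogonality check.
v1 = np.array([2.0, -1.0])
v2 = np.array([0.5, 1.0])

# Orthogonality: v1^T v2 should vanish.
dot12 = v1 @ v2

# Scale each vector to unit length (either sign would be acceptable).
u1 = v1 / np.linalg.norm(v1)
u2 = v2 / np.linalg.norm(v2)

# Orthonormality condition (2.105): V^T V = I.
V = np.column_stack([u1, u2])
print("v1^T v2 =", dot12)
print(V.T @ V)
```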

We now turn to the case in which multiple eigenvalues are also present. The proof of eigenvector orthonormality given in (2.99) to (2.105) is not possible because for a multiple eigenvalue, \lambda_{i} is equal to \lambda_{j} in (2.104). Assume that \lambda_{i} = \lambda_{i+1} = \cdots = \lambda_{i+m-1} ; i.e., \lambda_{i} is an m-times multiple root. Then we can show that it is still always possible to choose m orthonormal eigenvectors that correspond to \lambda_{i}, \lambda_{i+1}, \ldots, \lambda_{i+m-1} . This follows because for a symmetric matrix of order n, we can always establish a complete set of n orthonormal eigenvectors. Corresponding to each distinct eigenvalue we have an eigenspace with dimension equal to the multiplicity of the eigenvalue. All eigenspaces are unique and are orthogonal to the eigenspaces that correspond to other distinct eigenvalues. The eigenvectors associated with an eigenvalue provide a basis for the eigenspace, and since the basis is not unique if m > 1, the eigenvectors corresponding to a multiple eigenvalue are not unique. The formal proofs of these statements are an application of the principles discussed earlier and are given in the following examples.

EXAMPLE 2.31: Show that for a symmetric matrix A of order n, there are always n orthonormal eigenvectors.

Assume that we have calculated an eigenvalue \lambda_{i} and corresponding eigenvector \mathbf{v}_{i} . Let us construct an orthonormal matrix \mathbf{Q} whose first column is \mathbf{v}_{i} ,


\mathbf {Q} = [ \mathbf {v} _ {i} \quad \hat {\mathbf {Q}} ]; \quad \mathbf {Q} ^ {T} \mathbf {Q} = \mathbf {I}

This matrix can always be constructed because the vectors in \mathbf{Q} provide an orthonormal basis for the n -dimensional space in which \mathbf{A} is defined. However, we can now calculate


\mathbf {Q} ^ {T} \mathbf {A} \mathbf {Q} = \left[ \begin{array}{l l} \lambda_ {i} & \mathbf {0} \\ \mathbf {0} & \mathbf {A} _ {1} \end{array} \right] \tag {a}

where


\mathbf {A} _ {1} = \hat {\mathbf {Q}} ^ {T} \mathbf {A} \hat {\mathbf {Q}}

and \mathbf{A}_1 is a full matrix of order (n - 1) . If n = 2 , we note that \mathbf{Q}^T\mathbf{A}\mathbf{Q} is diagonal. In that case, if we premultiply (a) by \mathbf{Q} and let a \equiv \mathbf{A}_1 we obtain


\mathbf {A Q} = \mathbf {Q} \left[ \begin{array}{l l} \lambda_ {i} & 0 \\ 0 & a \end{array} \right]

and hence the vector in \hat{\mathbf{Q}} is the other eigenvector and a is the other eigenvalue regardless of whether \lambda_{i} is a multiple eigenvalue or not.

The complete proof is now obtained by induction. Assume that the statement is true for a matrix of order (n - 1) ; then we will show that it is also true for a matrix of order n. But since we demonstrated that the statement is true for n = 2, it follows that it is true for any n.

The assumption that there are (n - 1) orthonormal eigenvectors for a matrix of order (n - 1) gives


\mathbf {Q} _ {1} ^ {T} \mathbf {A} _ {1} \mathbf {Q} _ {1} = \boldsymbol {\Lambda} \tag {b}

where Q_{1} is a matrix of the eigenvectors of A_{1} and \Lambda is a diagonal matrix listing the eigenvalues of A_{1} . However, if we now define


\mathbf {S} = \left[ \begin{array}{l l} 1 & \mathbf {0} \\ \mathbf {0} & \mathbf {Q} _ {1} \end{array} \right]

we have

\mathbf {S} ^ {T} \mathbf {Q} ^ {T} \mathbf {A} \mathbf {Q} \mathbf {S} = \left[ \begin{array}{l l} \lambda_ {i} & \mathbf {0} \\ \mathbf {0} & \boldsymbol {\Lambda} \end{array} \right] \tag {c}

Let \mathbf{P} = \mathbf{Q}\mathbf{S};\quad \mathbf{P}^T\mathbf{P} = \mathbf{I}

Then premultiplying (c) by P, we obtain


\mathbf {A P} = \mathbf {P} \left[ \begin{array}{l l} \lambda_ {i} & \mathbf {0} \\ \mathbf {0} & \boldsymbol {\Lambda} \end{array} \right]

Therefore, under the assumption in (b), the statement is also true for a matrix of order n, which completes the proof.
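One step of this construction can be traced numerically. The sketch below is an illustration under stated assumptions, not the book's procedure: the 3 x 3 test matrix is hypothetical, the starting eigenpair is taken from `np.linalg.eigh`, and the orthonormal completion of v_i is obtained from a QR factorization of [v_i | I], which is one convenient choice among many.

```python
import numpy as np

# Hypothetical symmetric test matrix of order n = 3.
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
n = A.shape[0]

# One known eigenpair (lambda_i, v_i).
lam, V = np.linalg.eigh(A)
lam_i, v_i = lam[0], V[:, 0]

# Complete v_i to an orthonormal basis: QR of [v_i | I] returns an
# orthogonal n x n matrix Q whose first column is +/- v_i.
Q, _ = np.linalg.qr(np.column_stack([v_i, np.eye(n)]))

# Relation (a): Q^T A Q is block diagonal with lambda_i in the corner.
B = Q.T @ A @ Q
A1 = B[1:, 1:]                      # deflated matrix of order n - 1

# Relation (b): eigenvectors of A1, embedded in S.
lam1, Q1 = np.linalg.eigh(A1)
S = np.eye(n)
S[1:, 1:] = Q1

# P = Q S holds n orthonormal eigenvectors of A.
P = Q @ S
D = np.diag(np.concatenate(([lam_i], lam1)))
print(np.round(B, 12))
print(np.allclose(A @ P, P @ D))
```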

EXAMPLE 2.32: Show that the eigenvectors corresponding to a multiple eigenvalue of multiplicity m define an m-dimensional space in which each vector is also an eigenvector. This space is called the eigenspace corresponding to the eigenvalue considered.

Let \lambda_{i} be the eigenvalue of multiplicity m ; i.e., we have


\lambda_ {i} = \lambda_ {i + 1} = \dots = \lambda_ {i + m - 1}

We showed in Example 2.31 that there are m orthonormal eigenvectors \mathbf{v}_i, \mathbf{v}_{i+1}, \ldots, \mathbf{v}_{i+m-1} corresponding to \lambda_i . These vectors provide the basis of an m -dimensional space. Consider any vector \mathbf{w} in this space, such as


\mathbf {w} = \alpha_ {i} \mathbf {v} _ {i} + \alpha_ {i + 1} \mathbf {v} _ {i + 1} + \dots + \alpha_ {i + m - 1} \mathbf {v} _ {i + m - 1}

where the \alpha_{i},\alpha_{i + 1},\ldots , are constants. The vector \mathbf{w} is also an eigenvector because we have


\mathbf {A} \mathbf {w} = \alpha_ {i} \mathbf {A} \mathbf {v} _ {i} + \alpha_ {i + 1} \mathbf {A} \mathbf {v} _ {i + 1} + \dots + \alpha_ {i + m - 1} \mathbf {A} \mathbf {v} _ {i + m - 1}

which gives


\mathbf {A} \mathbf {w} = \alpha_ {i} \lambda_ {i} \mathbf {v} _ {i} + \alpha_ {i + 1} \lambda_ {i} \mathbf {v} _ {i + 1} + \dots + \alpha_ {i + m - 1} \lambda_ {i} \mathbf {v} _ {i + m - 1} = \lambda_ {i} \mathbf {w}

Therefore, any vector \mathbf{w} in the space spanned by the m eigenvectors \mathbf{v}_i, \mathbf{v}_{i+1}, \ldots, \mathbf{v}_{i+m-1} is also an eigenvector. It should be noted that the vector \mathbf{w} will be orthogonal to the eigenvectors that correspond to eigenvalues not equal to \lambda_{i} . Hence there is one eigenspace that corresponds to each, distinct or multiple, eigenvalue. The dimension of the eigenspace is equal to the multiplicity of the eigenvalue.
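A small numerical illustration of an eigenspace follows. It is a sketch with a hypothetical matrix chosen to have the double eigenvalue 2 and the simple eigenvalue 5; the combination coefficients 0.3 and -1.7 are arbitrary.

```python
import numpy as np

# Hypothetical symmetric matrix with eigenvalues 2, 2, 5.
A = np.array([[3.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 3.0]])

lam, V = np.linalg.eigh(A)          # ascending order: 2, 2, 5

# Arbitrary vector in the eigenspace of the double eigenvalue.
w = 0.3 * V[:, 0] - 1.7 * V[:, 1]

# w is again an eigenvector with the same eigenvalue ...
is_eigvec = np.allclose(A @ w, lam[0] * w)
# ... and it is orthogonal to the eigenvector of the other eigenvalue.
ortho = abs(w @ V[:, 2])

print(lam, is_eigvec, ortho)
```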

Now that the main properties of the eigenvalues and eigenvectors of A have been presented, we can write the n solutions to A v = \lambda v in various forms. First, we have


\mathbf {A} \mathbf {V} = \mathbf {V} \boldsymbol {\Lambda} \tag {2.106}

where V is a matrix storing the eigenvectors, V = [v_{1}, \ldots, v_{n}] , and \Lambda is a diagonal matrix with the corresponding eigenvalues on its diagonal, \Lambda = \text{diag}(\lambda_{i}) . Using the orthonormality property of the eigenvectors (i.e., V^{T}V = I ), we obtain from (2.106),


\mathbf {V} ^ {T} \mathbf {A} \mathbf {V} = \boldsymbol {\Lambda} \tag {2.107}

Furthermore, we obtain the spectral decomposition of A,


\mathbf {A} = \mathbf {V} \mathbf {\Lambda} \mathbf {V} ^ {T} \tag {2.108}

where it may be convenient to write the spectral decomposition of A as


\mathbf {A} = \sum_ {i = 1} ^ {n} \lambda_ {i} \mathbf {v} _ {i} \mathbf {v} _ {i} ^ {T} \tag {2.109}

It should be noted that each of these equations represents the solution to the eigenproblem A v = \lambda v . Consider the following example.

EXAMPLE 2.33: Establish the relations given in (2.106) to (2.109) for the matrix A used in Example 2.29.

The eigenvalues and eigenvectors of A have been calculated in Examples 2.29 and 2.30. Using the information given in these examples, we have for (2.106),


\left[ \begin{array}{l l} - 1 & 2 \\ 2 & 2 \end{array} \right] \left[ \begin{array}{c c} - \frac {2}{\sqrt {5}} & \frac {1}{\sqrt {5}} \\ \frac {1}{\sqrt {5}} & \frac {2}{\sqrt {5}} \end{array} \right] = \left[ \begin{array}{c c} - \frac {2}{\sqrt {5}} & \frac {1}{\sqrt {5}} \\ \frac {1}{\sqrt {5}} & \frac {2}{\sqrt {5}} \end{array} \right] \left[ \begin{array}{l l} - 2 & 0 \\ 0 & 3 \end{array} \right]

for (2.107),

\left[ \begin{array}{ll} -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}}\\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{array} \right]\left[ \begin{array}{ll} -1 & 2\\ 2 & 2 \end{array} \right]\left[ \begin{array}{ll} -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}}\\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{array} \right] = \left[ \begin{array}{ll} -2 & 0\\ 0 & 3 \end{array} \right]

for (2.108),

\left[ \begin{array}{ll} -1 & 2\\ 2 & 2 \end{array} \right] = \left[ \begin{array}{ll} -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}}\\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{array} \right]\left[ \begin{array}{ll} -2 & 0\\ 0 & 3 \end{array} \right]\left[ \begin{array}{ll} -\frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}}\\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{array} \right]

and for (2.109),


\mathbf {A} = (- 2) \left[ \begin{array}{c} - \frac {2}{\sqrt {5}} \\ \frac {1}{\sqrt {5}} \end{array} \right] \left[ \begin{array}{l l} - \frac {2}{\sqrt {5}} & \frac {1}{\sqrt {5}} \end{array} \right] + (3) \left[ \begin{array}{c} \frac {1}{\sqrt {5}} \\ \frac {2}{\sqrt {5}} \end{array} \right] \left[ \begin{array}{l l} \frac {1}{\sqrt {5}} & \frac {2}{\sqrt {5}} \end{array} \right]
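The four relations can be verified directly. This is a minimal NumPy sketch using the matrices written out in Example 2.33; the flag names `ok_106` to `ok_109` are illustrative.

```python
import numpy as np

A = np.array([[-1.0, 2.0],
              [ 2.0, 2.0]])

# Orthonormal eigenvectors (columns) and eigenvalues from Example 2.33.
V = np.array([[-2.0, 1.0],
              [ 1.0, 2.0]]) / np.sqrt(5.0)
Lam = np.diag([-2.0, 3.0])

ok_106 = np.allclose(A @ V, V @ Lam)          # A V = V Lambda
ok_107 = np.allclose(V.T @ A @ V, Lam)        # V^T A V = Lambda
ok_108 = np.allclose(A, V @ Lam @ V.T)        # A = V Lambda V^T

# (2.109): sum of rank-one terms lambda_i v_i v_i^T.
A_sum = sum(Lam[i, i] * np.outer(V[:, i], V[:, i]) for i in range(2))
ok_109 = np.allclose(A, A_sum)

print(ok_106, ok_107, ok_108, ok_109)
```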

The relations in (2.107) and (2.108) can be employed effectively in various important applications. The objective in the following examples is to present some solution procedures in which they are used.

EXAMPLE 2.34: Calculate the kth power of a given matrix A; i.e., evaluate A^{k} . Demonstrate the result using A in Example 2.29.

One way of evaluating A^{k} is to simply calculate A^{2} = AA , A^{4} = A^{2}A^{2} , etc. However, if k is large, it may be more effective to employ the spectral decomposition of A. Assume that we have calculated the eigenvalues and eigenvectors of A; i.e., we have


\mathbf {A} = \mathbf {V} \boldsymbol {\Lambda} \mathbf {V} ^ {T}

To calculate \mathbf{A}^2 , we use


\mathbf {A} ^ {2} = \mathbf {V} \boldsymbol {\Lambda} \mathbf {V} ^ {T} \mathbf {V} \boldsymbol {\Lambda} \mathbf {V} ^ {T}

but because \mathbf{V}^T\mathbf{V} = \mathbf{I} , we have


\mathbf {A} ^ {2} = \mathbf {V} \boldsymbol {\Lambda} ^ {2} \mathbf {V} ^ {T}

Proceeding in the same manner, we thus obtain


\mathbf {A} ^ {k} = \mathbf {V} \boldsymbol {\Lambda} ^ {k} \mathbf {V} ^ {T}

As an example, let \mathbf{A} be the matrix considered in Example 2.29. Then we have


\mathbf {A} ^ {k} = \frac {1}{\sqrt {5}} \left[ \begin{array}{c c} - 2 & 1 \\ 1 & 2 \end{array} \right] \left[ \begin{array}{c c} (- 2) ^ {k} & 0 \\ 0 & (3) ^ {k} \end{array} \right] \frac {1}{\sqrt {5}} \left[ \begin{array}{c c} - 2 & 1 \\ 1 & 2 \end{array} \right]

or

\mathbf {A} ^ {k} = \frac {1}{5} \left[ \begin{array}{c c} (- 2) ^ {k + 2} + (3) ^ {k} & (- 2) ^ {k + 1} + (2) (3) ^ {k} \\ (- 2) ^ {k + 1} + (2) (3) ^ {k} & (- 2) ^ {k} + (4) (3) ^ {k} \end{array} \right]

It is interesting to note that if the largest absolute value of all the eigenvalues of A is smaller than 1, we have A^{k} \rightarrow 0 as k \rightarrow \infty . Thus, defining the spectral radius of A,


\rho (\mathbf {A}) = \max _ {\text { all } i} | \lambda_ {i} |

we have \lim_{k\to \infty}\mathbf{A}^k = \mathbf{0} , provided that \rho (\mathbf{A}) < 1 .
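The power formula and the spectral radius statement can be checked numerically. The sketch below assumes the matrix of Example 2.29; the helper names `power_spectral` and `power_closed_form` and the scaled matrix A/4 (spectral radius 0.75) are illustrative choices, not part of the original text.

```python
import numpy as np

A = np.array([[-1.0, 2.0],
              [ 2.0, 2.0]])
lam, V = np.linalg.eigh(A)          # eigenvalues -2 and 3


def power_spectral(k):
    """A^k = V Lambda^k V^T."""
    return V @ np.diag(lam ** k) @ V.T


def power_closed_form(k):
    """Entry-wise expression for A^k obtained from the spectral form."""
    a, b = (-2.0) ** k, 3.0 ** k
    return np.array([[4 * a + b, -2 * a + 2 * b],
                     [-2 * a + 2 * b, a + 4 * b]]) / 5.0


k = 5
reference = np.linalg.matrix_power(A, k)   # repeated multiplication
print(np.allclose(power_spectral(k), reference))
print(np.allclose(power_closed_form(k), reference))

# Spectral radius below 1: the powers decay to the zero matrix.
B = A / 4.0
rho_B = np.max(np.abs(np.linalg.eigvalsh(B)))
decay_norm = np.linalg.norm(np.linalg.matrix_power(B, 200))
print(rho_B, decay_norm)
```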

EXAMPLE 2.35: Consider the system of differential equations


\dot {\mathbf {x}} + \mathbf {A} \mathbf {x} = \mathbf {f} (t) \tag {a}

and obtain the solution using the spectral decomposition of A. Demonstrate the result using the matrix A in Example 2.29 and


\mathbf {f} (t) = \left[ \begin{array}{c} e ^ {- t} \\ 0 \end{array} \right]; \quad {} ^ {0} \mathbf {x} = \left[ \begin{array}{c} 1 \\ 1 \end{array} \right]

where {}^{0}\mathbf{x} are the initial conditions.

Substituting \mathbf{A} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^T and premultiplying by \mathbf{V}^T , we obtain


\mathbf {V} ^ {T} \dot {\mathbf {x}} + \mathbf {\Lambda} (\mathbf {V} ^ {T} \mathbf {x}) = \mathbf {V} ^ {T} \mathbf {f} (t)

Thus if we define \mathbf{y} = \mathbf{V}^T\mathbf{x} , we need to solve the equations


\dot {\mathbf {y}} + \mathbf {\Lambda} \mathbf {y} = \mathbf {V} ^ {T} \mathbf {f} (t)

But this is a set of n decoupled differential equations. Consider the rth equation, which is typical:


\dot {y} _ {r} + \lambda_ {r} y _ {r} = \mathbf {v} _ {r} ^ {T} \mathbf {f} (t)

The solution is

y _ {r} (t) = {} ^ {0} y _ {r} e ^ {- \lambda_ {r} t} + e ^ {- \lambda_ {r} t} \int_ {0} ^ {t} e ^ {\lambda_ {r} \tau} \mathbf {v} _ {r} ^ {T} \mathbf {f} (\tau) \, d \tau

where {}^{0}y_r is the value of y_r at time t = 0 . The complete solution to the system of equations in (a) is


\mathbf {x} = \sum_ {r = 1} ^ {n} \mathbf {v} _ {r} y _ {r} \tag {b}

As an example, we consider the system of differential equations


\left[ \begin{array}{l} \dot {x} _ {1} \\ \dot {x} _ {2} \end{array} \right] + \left[ \begin{array}{c c} - 1 & 2 \\ 2 & 2 \end{array} \right] \left[ \begin{array}{l} x _ {1} \\ x _ {2} \end{array} \right] = \left[ \begin{array}{l} e ^ {- t} \\ 0 \end{array} \right]

In this case we have to solve the two decoupled differential equations


\dot {y} _ {1} + (- 2) y _ {1} = \frac {2}{\sqrt {5}} e ^ {- t}

\dot {y} _ {2} + 3 y _ {2} = \frac {1}{\sqrt {5}} e ^ {- t}

where the right-hand sides are the components of \mathbf{V}^T\mathbf{f}(t) , with initial conditions


{} ^ {0} \mathbf {y} = \mathbf {V} ^ {T} \, {} ^ {0} \mathbf {x} = \frac {1}{\sqrt {5}} \left[ \begin{array}{c c} 2 & - 1 \\ 1 & 2 \end{array} \right] \left[ \begin{array}{c} 1 \\ 1 \end{array} \right] = \frac {1}{\sqrt {5}} \left[ \begin{array}{c} 1 \\ 3 \end{array} \right]

We obtain

y _ {1} = \frac {\sqrt {5}}{3} e ^ {2 t} - \frac {2}{3 \sqrt {5}} e ^ {- t}

y _ {2} = \frac {\sqrt {5}}{2} e ^ {- 3 t} + \frac {1}{2 \sqrt {5}} e ^ {- t}

Thus, using (b), we have


\left[ \begin{array}{l} x _ {1} \\ x _ {2} \end{array} \right] = \frac {1}{\sqrt {5}} \left(\left[ \begin{array}{l} 2 \\ - 1 \end{array} \right] y _ {1} + \left[ \begin{array}{l} 1 \\ 2 \end{array} \right] y _ {2}\right)

= \left[ \begin{array}{c} - \frac {1}{6} e ^ {- t} + \frac {1}{2} e ^ {- 3 t} + \frac {2}{3} e ^ {2 t} \\ \frac {1}{3} e ^ {- t} + e ^ {- 3 t} - \frac {1}{3} e ^ {2 t} \end{array} \right]
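The modal procedure can be sketched in code and checked against the differential equation itself. This is an illustration under stated assumptions: the eigenpairs come from `np.linalg.eigh`, the convolution integral is evaluated in closed form for the forcing e^{-t} (valid because no eigenvalue equals 1), and the helper names `x_modal` and `residual` are hypothetical. The residual uses a central finite difference, so it is only accurate to roughly the square of the step h.

```python
import numpy as np

A = np.array([[-1.0, 2.0],
              [ 2.0, 2.0]])
x0 = np.array([1.0, 1.0])           # initial conditions
f_amp = np.array([1.0, 0.0])        # f(t) = f_amp * exp(-t)

lam, V = np.linalg.eigh(A)
y0 = V.T @ x0                       # modal initial conditions
g = V.T @ f_amp                     # modal forcing amplitudes


def x_modal(t):
    """Solve each decoupled equation y' + lam*y = g*exp(-t), then x = V y."""
    a = g / (lam - 1.0)             # particular solution a*exp(-t)
    y = (y0 - a) * np.exp(-lam * t) + a * np.exp(-t)
    return V @ y


def residual(t, h=1e-5):
    """Finite-difference check of x' + A x - f(t)."""
    xdot = (x_modal(t + h) - x_modal(t - h)) / (2.0 * h)
    return xdot + A @ x_modal(t) - f_amp * np.exp(-t)


print(x_modal(0.0))                 # should reproduce the initial conditions
print(residual(0.5))                # should be close to zero
```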

To conclude the presentation, we may note that by introducing auxiliary variables, higher-order differential equations can be reduced to a system of first-order differential equations. However, the coefficient matrix A is in that case nonsymmetric.

EXAMPLE 2.36: Using the spectral decomposition of an n \times n symmetric matrix \mathbf{A} , evaluate the inverse of the matrix. Demonstrate the result using the matrix \mathbf{A} in Example 2.29.

Assume that we have evaluated the eigenvalues \lambda_{i} and corresponding eigenvectors \mathbf{v}_{i} , i = 1, \ldots, n , of the matrix \mathbf{A} ; i.e., we have solved the eigenproblem


\mathbf {A} \mathbf {v} = \lambda \mathbf {v} \tag {a}

Premultiplying both sides of (a) by \lambda^{-1}\mathbf{A}^{-1} , we obtain the eigenproblem


\mathbf {A} ^ {- 1} \mathbf {v} = \lambda^ {- 1} \mathbf {v}

But this relation shows that the eigenvalues of \mathbf{A}^{-1} are 1 / \lambda_{i} and the eigenvectors are \mathbf{v}_i , i = 1, \ldots, n . Thus using (2.109) for \mathbf{A}^{-1} , we have


\mathbf {A} ^ {- 1} = \mathbf {V} \mathbf {\Lambda} ^ {- 1} \mathbf {V} ^ {T}

or


\mathbf {A} ^ {- 1} = \sum_ {i = 1} ^ {n} \left(\frac {1}{\lambda_ {i}}\right) \mathbf {v} _ {i} \mathbf {v} _ {i} ^ {T}

These equations show that we cannot find the inverse of A if the matrix has a zero eigenvalue.

As an example, we evaluate the inverse of the matrix \mathbf{A} considered in Example 2.29. In this case we have


\mathbf {A} ^ {- 1} = \frac {1}{5} \left[ \begin{array}{l l} 2 & 1 \\ - 1 & 2 \end{array} \right] \left[ \begin{array}{l l} - \frac {1}{2} & 0 \\ 0 & \frac {1}{3} \end{array} \right] \left[ \begin{array}{l l} 2 & - 1 \\ 1 & 2 \end{array} \right] = \frac {1}{6} \left[ \begin{array}{l l} - 2 & 2 \\ 2 & 1 \end{array} \right]
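The same computation in code, as a minimal NumPy sketch. The eigenpairs are taken from `np.linalg.eigh` rather than typed in by hand, and the explicit guard against a zero eigenvalue with tolerance 1e-12 is an assumption of this sketch.

```python
import numpy as np

A = np.array([[-1.0, 2.0],
              [ 2.0, 2.0]])
lam, V = np.linalg.eigh(A)

# The inverse exists only if no eigenvalue is zero.
if np.any(np.abs(lam) < 1e-12):
    raise ValueError("matrix has a zero eigenvalue and cannot be inverted")

# A^{-1} = V Lambda^{-1} V^T
A_inv = V @ np.diag(1.0 / lam) @ V.T

# Equivalent rank-one sum: sum_i (1/lambda_i) v_i v_i^T
A_inv_sum = sum(np.outer(V[:, i], V[:, i]) / lam[i] for i in range(len(lam)))

print(A_inv * 6.0)                  # compare with (1/6) [[-2, 2], [2, 1]]
```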

The key point of the transformation (2.107) is that in (2.107) we perform a change of basis [see (2.86) and (2.88)]. Since the vectors in V correspond to a new basis, they span the n-dimensional space in which A and \Lambda are defined, and any vector w can be expressed as a linear combination of the eigenvectors v_{i} ; i.e., we have


\mathbf {w} = \sum_ {i = 1} ^ {n} \alpha_ {i} \mathbf {v} _ {i} \tag {2.110}

An important observation is that \Lambda shows directly whether the matrices A and \Lambda are singular. Using the definition given in Section 2.2, we find that \Lambda and hence A are singular if and only if an eigenvalue is equal to zero, because in that case \Lambda^{-1} cannot be calculated. In this context it is useful to define some additional terminology. If all eigenvalues are positive, we say that the matrix is positive definite. If all eigenvalues are greater than or equal to zero, the matrix is positive semidefinite; with negative, zero, or positive eigenvalues, the matrix is indefinite.
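This terminology translates into a short classification routine. The sketch below is illustrative: the function name `classify`, the tolerance `tol` used to decide when a computed eigenvalue counts as zero, and the test matrices are assumptions. Following the three terms defined above, any symmetric matrix with at least one negative eigenvalue is reported as indefinite.

```python
import numpy as np


def classify(A, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    return "indefinite"


def is_singular(A, tol=1e-12):
    """Singular if and only if some eigenvalue is (numerically) zero."""
    return bool(np.any(np.abs(np.linalg.eigvalsh(A)) <= tol))


K_pd = np.array([[2.0, -1.0], [-1.0, 2.0]])     # eigenvalues 1, 3
K_psd = np.array([[1.0, -1.0], [-1.0, 1.0]])    # eigenvalues 0, 2
A_ind = np.array([[-1.0, 2.0], [2.0, 2.0]])     # eigenvalues -2, 3

for name, M in (("K_pd", K_pd), ("K_psd", K_psd), ("A_ind", A_ind)):
    print(name, classify(M), "singular:", is_singular(M))
```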

2.6 THE RAYLEIGH QUOTIENT AND THE MINIMAX CHARACTERIZATION OF EIGENVALUES

In the previous section we defined the eigenproblem A v = \lambda v and discussed the basic properties that pertain to the solutions of the problem. The objective in this section is to complement the information given with some very powerful principles.

A number of important principles are derived using the Rayleigh quotient \rho(\mathbf{v}) , which is defined as


\rho (\mathbf {v}) = \frac {\mathbf {v} ^ {T} \mathbf {A} \mathbf {v}}{\mathbf {v} ^ {T} \mathbf {v}} \tag {2.111}

The first observation is that


\lambda_ {1} \leq \rho (\mathbf {v}) \leq \lambda_ {n} \tag {2.112}

Using the definitions given in Section 2.5, it follows that for any nonzero vector \mathbf{v} we have \rho(\mathbf{v}) > 0 if \mathbf{A} is positive definite and \rho(\mathbf{v}) \geq 0 if \mathbf{A} is positive semidefinite, whereas for \mathbf{A} indefinite \rho(\mathbf{v}) can be negative, zero, or positive. For the proof of (2.112) we use


\mathbf {v} = \sum_ {i = 1} ^ {n} \alpha_ {i} \mathbf {v} _ {i} \tag {2.113}

where v_{i} are the eigenvectors of A. Substituting for v into (2.111) and using that Av_{i} = \lambda_{i}v_{i} , v_{i}^{T}v_{j} = \delta_{ij} , we obtain


\rho (\mathbf {v}) = \frac {\lambda_ {1} \alpha_ {1} ^ {2} + \lambda_ {2} \alpha_ {2} ^ {2} + \cdots + \lambda_ {n} \alpha_ {n} ^ {2}}{\alpha_ {1} ^ {2} + \cdots + \alpha_ {n} ^ {2}} \tag {2.114}

Hence, if \lambda_1 \neq 0 ,


\rho (\mathbf {v}) = \lambda_ {1} \frac {\alpha_ {1} ^ {2} + (\lambda_ {2} / \lambda_ {1}) \alpha_ {2} ^ {2} + \cdots + (\lambda_ {n} / \lambda_ {1}) \alpha_ {n} ^ {2}}{\alpha_ {1} ^ {2} + \cdots + \alpha_ {n} ^ {2}} \tag {2.115}

and if \lambda_{n} \neq 0 ,

\rho (\mathbf {v}) = \lambda_ {n} \frac {(\lambda_ {1} / \lambda_ {n}) \alpha_ {1} ^ {2} + (\lambda_ {2} / \lambda_ {n}) \alpha_ {2} ^ {2} + \cdots + \alpha_ {n} ^ {2}}{\alpha_ {1} ^ {2} + \cdots + \alpha_ {n} ^ {2}} \tag {2.116}

But since \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n , the relations in (2.114) to (2.116) show that (2.112) holds. Furthermore, it is seen that if \mathbf{v} = \mathbf{v}_i , we have \rho(\mathbf{v}) = \lambda_i .
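The bounds in (2.112) are easy to probe numerically. This is a minimal NumPy sketch, assuming the 2 x 2 matrix of Example 2.29 (eigenvalues -2 and 3) and randomly drawn trial vectors; the helper name `rayleigh` and the random seed are illustrative.

```python
import numpy as np


def rayleigh(A, v):
    """Rayleigh quotient rho(v) = v^T A v / v^T v."""
    return (v @ A @ v) / (v @ v)


A = np.array([[-1.0, 2.0],
              [ 2.0, 2.0]])
lam, V = np.linalg.eigh(A)          # lam[0] = -2, lam[-1] = 3

# Sample many trial vectors and record the quotient for each.
rng = np.random.default_rng(1)
samples = np.array([rayleigh(A, rng.standard_normal(2)) for _ in range(1000)])

print("min, max of samples:", samples.min(), samples.max())
print("rho(v_1), rho(v_n):", rayleigh(A, V[:, 0]), rayleigh(A, V[:, -1]))
```

Every sampled value should lie between the smallest and largest eigenvalues, and the bounds are attained at the corresponding eigenvectors.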

Considering the practical use of the Rayleigh quotient, the following property is of particular value. Assume that v is an approximation to the eigenvector v_{i} ; i.e., say with \epsilon small, we have


\mathbf {v} = \mathbf {v} _ {i} + \epsilon \mathbf {x} \tag {2.117}

Then the Rayleigh quotient of v will give an approximation to \lambda_{i} of order \epsilon^{2} ; i.e.,


\rho (\mathbf {v}) = \lambda_ {i} + O \left(\epsilon^ {2}\right) \tag {2.118}

The notation O(\epsilon^2) means "of order \epsilon^2 " and indicates that if \delta = O(\epsilon^2) , then |\delta| \leq b\epsilon^2 , where b is a constant.

To prove this property of the Rayleigh quotient, we substitute for v from (2.117) into the Rayleigh quotient expression to obtain


\rho (\mathbf {v} _ {i} + \epsilon \mathbf {x}) = \frac {(\mathbf {v} _ {i} ^ {T} + \epsilon \mathbf {x} ^ {T}) \mathbf {A} (\mathbf {v} _ {i} + \epsilon \mathbf {x})}{(\mathbf {v} _ {i} ^ {T} + \epsilon \mathbf {x} ^ {T}) (\mathbf {v} _ {i} + \epsilon \mathbf {x})} \tag {2.119}

or

\rho (\mathbf {v} _ {i} + \epsilon \mathbf {x}) = \frac {\mathbf {v} _ {i} ^ {T} \mathbf {A} \mathbf {v} _ {i} + 2 \epsilon \mathbf {v} _ {i} ^ {T} \mathbf {A} \mathbf {x} + \epsilon^ {2} \mathbf {x} ^ {T} \mathbf {A} \mathbf {x}}{\mathbf {v} _ {i} ^ {T} \mathbf {v} _ {i} + 2 \epsilon \mathbf {x} ^ {T} \mathbf {v} _ {i} + \epsilon^ {2} \mathbf {x} ^ {T} \mathbf {x}} \tag {2.120}

However, since x is an error in v_{i} , we can write


\mathbf {x} = \sum_ {\substack {j = 1 \\ j \neq i}} ^ {n} \alpha_ {j} \mathbf {v} _ {j} \tag{2.121}

But then using \mathbf{v}_i^T\mathbf{v}_j = \delta_{ij} and \mathbf{A}\mathbf{v}_j = \lambda_j\mathbf{v}_j , we have \mathbf{v}_i^T\mathbf{A}\mathbf{x} = 0 and \mathbf{x}^T\mathbf{v}_i = 0 , and hence


\rho (\mathbf {v} _ {i} + \epsilon \mathbf {x}) = \frac {\lambda_ {i} + \epsilon^ {2} \sum_ {\substack {j = 1 \\ j \neq i}} ^ {n} \alpha_ {j} ^ {2} \lambda_ {j}}{1 + \epsilon^ {2} \sum_ {\substack {j = 1 \\ j \neq i}} ^ {n} \alpha_ {j} ^ {2}} \tag{2.122}

However, using the binomial theorem to expand the denominator in (2.122), we have


\rho (\mathbf {v} _ {i} + \epsilon \mathbf {x}) = \left(\lambda_ {i} + \epsilon^ {2} \sum_ {\substack {j = 1 \\ j \neq i}} ^ {n} \alpha_ {j} ^ {2} \lambda_ {j}\right) \left[ 1 - \epsilon^ {2} \left(\sum_ {\substack {j = 1 \\ j \neq i}} ^ {n} \alpha_ {j} ^ {2}\right) + \epsilon^ {4} \left(\sum_ {\substack {j = 1 \\ j \neq i}} ^ {n} \alpha_ {j} ^ {2}\right) ^ {2} + \dots \right] \tag{2.123}

or

\rho (\mathbf {v} _ {i} + \epsilon \mathbf {x}) = \lambda_ {i} + \epsilon^ {2} \left(\sum_ {\substack {j = 1 \\ j \neq i}} ^ {n} \alpha_ {j} ^ {2} \lambda_ {j} - \lambda_ {i} \sum_ {\substack {j = 1 \\ j \neq i}} ^ {n} \alpha_ {j} ^ {2}\right) + \text {higher-order terms} \tag {2.124}

The relation in (2.118) thus follows. We demonstrate the preceding results in a brief example.

EXAMPLE 2.37: Evaluate the Rayleigh quotients \rho(\mathbf{v}) for the matrix \mathbf{A} used in Example 2.29.

Using v_{1} and v_{2} in Example 2.29, consider the following cases:

  1. \mathbf{v} = \mathbf{v}_1 + 2\mathbf{v}_2;

  2. \mathbf{v} = \mathbf{v}_1 ;

  3. \mathbf{v} = \mathbf{v}_1 + 0.02\mathbf{v}_2.

In case 1, we have


\mathbf {v} = \left[ \begin{array}{c} 2 \\ - 1 \end{array} \right] + \left[ \begin{array}{c} 1 \\ 2 \end{array} \right] = \left[ \begin{array}{c} 3 \\ 1 \end{array} \right]

and thus


\rho (\mathbf {v}) = \frac {[ 3 \quad 1 ] \left[ \begin{array}{c c} - 1 & 2 \\ 2 & 2 \end{array} \right] \left[ \begin{array}{l} 3 \\ 1 \end{array} \right]}{[ 3 \quad 1 ] \left[ \begin{array}{l} 3 \\ 1 \end{array} \right]} = \frac {1}{2}

Recalling that \lambda_{1} = -2 and \lambda_{2} = 3 , we have, as expected,


\lambda_ {1} \leq \rho (\mathbf {v}) \leq \lambda_ {2}

In case 2, we have


\mathbf {v} = \left[ \begin{array}{c} 2 \\ - 1 \end{array} \right]

and hence


\rho (\mathbf {v}) = \frac {[ 2 \quad - 1 ] \left[ \begin{array}{l l} - 1 & 2 \\ 2 & 2 \end{array} \right] \left[ \begin{array}{l} 2 \\ - 1 \end{array} \right]}{[ 2 \quad - 1 ] \left[ \begin{array}{l} 2 \\ - 1 \end{array} \right]} = - 2

and so, as expected, \rho(\mathbf{v}) = \lambda_1 .

Finally, in case 3, we use


\mathbf {v} = \left[ \begin{array}{c} 2 \\ - 1 \end{array} \right] + \left[ \begin{array}{c} 0.01 \\ 0.02 \end{array} \right] = \left[ \begin{array}{c} 2.01 \\ - 0.98 \end{array} \right]

and hence


\rho (\mathbf {v}) = \frac {[ 2.01 \quad - 0.98 ] \left[ \begin{array}{l l} - 1 & 2 \\ 2 & 2 \end{array} \right] \left[ \begin{array}{l} 2.01 \\ - 0.98 \end{array} \right]}{[ 2.01 \quad - 0.98 ] \left[ \begin{array}{l} 2.01 \\ - 0.98 \end{array} \right]} = - 1.99950005

Here we note that \rho(\mathbf{v}) > \lambda_1 and that \rho(\mathbf{v}) approximates \lambda_1 more closely than \mathbf{v} approximates \mathbf{v}_1 .
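The quadratic error behavior can be observed by shrinking the perturbation. This is a minimal NumPy sketch, assuming the matrix of Example 2.29 and orthonormal eigenvectors from `np.linalg.eigh`; the perturbation sizes and the helper name `rayleigh` are illustrative. For v = v_1 + \epsilon v_2 with orthonormal vectors, (2.122) gives an error of \epsilon^2 (\lambda_2 - \lambda_1) / (1 + \epsilon^2), so the ratio error / \epsilon^2 should approach \lambda_2 - \lambda_1 = 5.

```python
import numpy as np


def rayleigh(A, v):
    """Rayleigh quotient rho(v) = v^T A v / v^T v."""
    return (v @ A @ v) / (v @ v)


A = np.array([[-1.0, 2.0],
              [ 2.0, 2.0]])
lam, V = np.linalg.eigh(A)
lam1, v1, v2 = lam[0], V[:, 0], V[:, 1]

eps_values = (1e-1, 1e-2, 1e-3)
ratios = []
for eps in eps_values:
    err = rayleigh(A, v1 + eps * v2) - lam1   # error in the eigenvalue
    ratios.append(err / eps ** 2)             # should tend to lam2 - lam1
    print(eps, err, ratios[-1])

# Case 3 of Example 2.37, with the unnormalized vectors used there.
rho_case3 = rayleigh(A, np.array([2.01, -0.98]))
print(rho_case3)
```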

Having introduced the Rayleigh quotient, we can now proceed to a very important principle, the minimax characterization of eigenvalues. We know from Rayleigh's principle that


\rho (\mathbf {v}) \geq \lambda_ {1} \tag {2.125}

where v is any vector. In other words, if we consider the problem of varying v, we will always have \rho(\mathbf{v}) \geq \lambda_{1} , and the minimum will be reached when v = v_{1} , in which case \rho(\mathbf{v}_{1}) = \lambda_{1} . Suppose that we now impose a restriction on v, namely that v be orthogonal to a specific vector w, and that we consider the problem of minimizing \rho(\mathbf{v}) subject to this restriction. After calculating the minimum of \rho(\mathbf{v}) with the condition v^{T}w = 0 , we could start varying w and for each new w evaluate a new minimum of \rho(\mathbf{v}) . We would then find that the maximum value of all the minimum values evaluated is \lambda_{2} . This result can be generalized to the following principle, called the minimax characterization of eigenvalues,


\lambda_ {r} = \max \left\{\min \frac {\mathbf {v} ^ {T} \mathbf {A} \mathbf {v}}{\mathbf {v} ^ {T} \mathbf {v}} \right\} \quad r = 1, \dots , n \tag {2.126}

with \mathbf{v} satisfying \mathbf{v}^T\mathbf{w}_i = 0 for i = 1, \ldots, r - 1, r \geq 2 . In (2.126) we choose vectors \mathbf{w}_i , i = 1, \ldots, r - 1 , and then evaluate the minimum of \rho(\mathbf{v}) with \mathbf{v} subject to the condition \mathbf{v}^T\mathbf{w}_i = 0 , i = 1, \ldots, r - 1 . After calculating this minimum we vary the vectors \mathbf{w}_i and always evaluate a new minimum. The maximum value that the minima reach is \lambda_r .

The proof of (2.126) is as follows. Let


\mathbf {v} = \sum_ {i = 1} ^ {n} \alpha_ {i} \mathbf {v} _ {i} \tag {2.127}

and evaluate the right-hand side of (2.126), which we call R,


R = \max \left\{\min \frac {\alpha_ {1} ^ {2} \lambda_ {1} + \cdots + \alpha_ {r} ^ {2} \lambda_ {r} + \alpha_ {r + 1} ^ {2} \lambda_ {r + 1} + \cdots + \alpha_ {n} ^ {2} \lambda_ {n}}{\alpha_ {1} ^ {2} + \cdots + \alpha_ {r} ^ {2} + \alpha_ {r + 1} ^ {2} + \cdots + \alpha_ {n} ^ {2}} \right\} \tag {2.128}

The coefficients \alpha_{i} must satisfy the conditions


\mathbf {w} _ {j} ^ {T} \sum_ {i = 1} ^ {n} \alpha_ {i} \mathbf {v} _ {i} = 0 \quad j = 1, \dots , r - 1 \tag {2.129}

Rewriting (2.128), we obtain


R = \max \left\{\min \left[ \lambda_ {r} - \frac {\alpha_ {1} ^ {2} \left(\lambda_ {r} - \lambda_ {1}\right) + \cdots + \alpha_ {r - 1} ^ {2} \left(\lambda_ {r} - \lambda_ {r - 1}\right) + \alpha_ {r + 1} ^ {2} \left(\lambda_ {r} - \lambda_ {r + 1}\right) + \cdots + \alpha_ {n} ^ {2} \left(\lambda_ {r} - \lambda_ {n}\right)}{\alpha_ {1} ^ {2} + \cdots + \alpha_ {r} ^ {2} + \alpha_ {r + 1} ^ {2} + \cdots + \alpha_ {n} ^ {2}} \right] \right\} \tag {2.130}

But we can now see that for the condition \alpha_{r+1} = \alpha_{r+2} = \cdots = \alpha_n = 0 , we have


R \leq \lambda_ {r} \tag {2.131}

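and the condition in (2.129) can still be satisfied by a judicious choice of the remaining coefficients \alpha_1, \ldots, \alpha_r , because (2.129) imposes only r - 1 homogeneous conditions on these r unknowns, so that a nontrivial solution always exists. On the other hand, suppose that we now choose \mathbf{w}_j = \mathbf{v}_j for j = 1, \ldots, r - 1 . This would require that \alpha_j = 0 for j = 1, \ldots, r - 1 , and consequently we would have R = \lambda_r , which completes the proof.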
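The characterization can be illustrated numerically for r = 2. The sketch below rests on stated assumptions: the 3 x 3 test matrix and the random constraint vectors are hypothetical, and the constrained minimum is computed as the smallest eigenvalue of N^T A N, where the columns of N form an orthonormal basis of the vectors orthogonal to w (this follows from applying (2.112) to the projected matrix). The helper name `constrained_min` is illustrative.

```python
import numpy as np


def constrained_min(A, w):
    """Minimum of the Rayleigh quotient over all v with v^T w = 0."""
    n = len(w)
    # QR of [w | I]: first column of Q is parallel to w, the remaining
    # columns span the orthogonal complement of w.
    Q, _ = np.linalg.qr(np.column_stack([w, np.eye(n)]))
    N = Q[:, 1:]
    return np.linalg.eigvalsh(N.T @ A @ N)[0]


# Hypothetical symmetric test matrix.
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
lam, V = np.linalg.eigh(A)

# Vary w at random and record the constrained minimum each time.
rng = np.random.default_rng(2)
minima = np.array([constrained_min(A, rng.standard_normal(3))
                   for _ in range(200)])

print("largest constrained minimum found:", minima.max())
print("lambda_2:", lam[1])
print("minimum with w = v_1:", constrained_min(A, V[:, 0]))
```

No sampled minimum should exceed \lambda_2 , and the choice w = v_1 attains it.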

A most important property that can be established using the minimax characterization of eigenvalues is the eigenvalue separation property. Suppose that in addition to the