Big Red Bits


Cayley-Hamilton Theorem and Jordan Canonical Form

October 28th, 2009, renatoppl

Last week I was discussing the Cayley-Hamilton Theorem with my officemates Hu Fu and Ashwin. The theorem is the following: given an ${n \times n}$ matrix ${A}$, we define its characteristic polynomial by ${p_A(\lambda) = \det(A - \lambda I)}$. The Cayley-Hamilton Theorem says that ${p_A(A) = 0}$. The characteristic polynomial has degree ${n}$, so it looks like:

$\displaystyle p_A(x) = a_n x^n + a_{n-1} x^{n-1} + \hdots + a_1 x + a_0$

so we can see it as a formal polynomial and substitute the matrix for the variable, with the constant term becoming ${a_0 I}$:

$\displaystyle p_A(A) = a_n A^n + a_{n-1} A^{n-1} + \hdots + a_1 A + a_0 I$

which is an ${n \times n}$ matrix. The theorem says it is the zero matrix. We thought about it for a while and looked on Wikipedia, which had a few proofs, but not the one-line proof I was looking for. Later, I came up with this proof, which I sent to Hu Fu:
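To make the formal substitution concrete, here is a small Python sketch (my illustration, not part of the original discussion). It computes the coefficients of the characteristic polynomial with the Faddeev-LeVerrier recurrence, a technique swapped in here only for brevity, and then evaluates the polynomial at the matrix itself with Horner's rule, in exact rational arithmetic. It works with ${\det(xI - A)}$, which differs from ${p_A(x) = \det(A - xI)}$ only by the sign ${(-1)^n}$, so one vanishes at ${A}$ exactly when the other does. The helper names mat_mul, char_poly and poly_at_matrix are hypothetical.

```python
# Illustrative sketch: helper names are hypothetical; exact arithmetic over Q.
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c_n, ..., c_0] of det(xI - A) by Faddeev-LeVerrier.

    The recurrence divides by k = 1..n, so it assumes characteristic 0.
    """
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    coeffs = [Fraction(1)]                      # c_n = 1
    M = [[Fraction(0)] * n for _ in range(n)]   # M_0 = 0
    for k in range(1, n + 1):
        M = mat_mul(A, M)                       # M_k = A M_{k-1} + c_{n-k+1} I
        for i in range(n):
            M[i][i] += coeffs[-1]
        AM = mat_mul(A, M)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)  # c_{n-k} = -tr(A M_k) / k
    return coeffs


def poly_at_matrix(coeffs, A):
    """Evaluate a polynomial (highest degree first) at A by Horner's rule:
    R <- R A + c I, so the constant term c_0 becomes c_0 I."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs:
        R = mat_mul(R, A)
        for i in range(n):
            R[i][i] += c
    return R


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
p = char_poly(A)
print([str(c) for c in p])                                        # ['1', '-9', '24', '-18']
print(all(v == 0 for row in poly_at_matrix(p, A) for v in row))   # True
```

Because the recurrence divides by ${k}$, this sketch is only meant for characteristic 0, not for the general fields discussed further down.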

Write the matrix ${A}$ in the basis of its eigenvectors: we can write ${A = \Gamma^{-1} D \Gamma}$, where the columns of ${\Gamma^{-1}$ are the eigenvectors and ${D}$ is the diagonal matrix with the eigenvalues on the main diagonal. In every power the inner factors ${\Gamma \Gamma^{-1} = I}$ cancel:

$\displaystyle A^k = (\Gamma^{-1} D \Gamma) \hdots (\Gamma^{-1} D \Gamma) = \Gamma^{-1} D^k \Gamma$

and since ${D = \text{diag}(\lambda_1, \hdots, \lambda_n)}$ we have ${D^k = \text{diag}(\lambda_1^k, \hdots, \lambda_n^k)}$. Since ${p_A(A)}$ is a linear combination of the powers ${A^k}$ (with ${A^0 = I = \Gamma^{-1} \Gamma}$), it follows that:

$\displaystyle p_A(A) = \Gamma^{-1} p_A(D) \Gamma$

and ${p_A(D)}$ is diagonal with entries ${p_A(\lambda_i)}$, all of which vanish because every eigenvalue is a root of the characteristic polynomial:

$\displaystyle p_A(D) = \begin{bmatrix} p_A(\lambda_1) & & & \\ & p_A(\lambda_2) & & \\ & & \ddots & \\ & & & p_A(\lambda_n) \end{bmatrix} = \begin{bmatrix} 0 & & & \\ & 0 & & \\ & & \ddots & \\ & & & 0 \end{bmatrix} = 0$
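A quick exact check of this argument in Python, as a sketch: the change of basis ${\Gamma}$, the eigenvalues 2 and 5, and the helper names below are hypothetical choices. It builds ${A = \Gamma^{-1} D \Gamma}$, confirms ${A^3 = \Gamma^{-1} D^3 \Gamma}$, and evaluates ${p_A(x) = (2 - x)(5 - x) = x^2 - 7x + 10}$ at both ${D}$ and ${A}$.

```python
# Illustrative sketch: Gamma, D and all helper names are hypothetical choices.
from fractions import Fraction


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse_2x2(M):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def poly_at_matrix(coeffs, M):
    """Horner's rule with coefficients listed from the highest degree down."""
    n = len(M)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in coeffs:
        R = mat_mul(R, M)
        for i in range(n):
            R[i][i] += c
    return R


Gamma = [[1, 2], [1, 3]]              # change of basis (det = 1)
Gamma_inv = inverse_2x2(Gamma)
D = [[2, 0], [0, 5]]                  # eigenvalues 2 and 5
A = mat_mul(mat_mul(Gamma_inv, D), Gamma)
print([[int(v) for v in row] for row in A])          # [[-4, -18], [3, 11]]

# A^3 = Gamma^{-1} D^3 Gamma: the inner Gamma Gamma^{-1} factors cancel.
A3 = mat_mul(mat_mul(A, A), A)
D3 = mat_mul(mat_mul(D, D), D)
print(A3 == mat_mul(mat_mul(Gamma_inv, D3), Gamma))  # True

p = [1, -7, 10]                       # p_A(x) = (2 - x)(5 - x) = x^2 - 7x + 10
print(poly_at_matrix(p, D) == [[0, 0], [0, 0]])      # True: diag(p_A(2), p_A(5))
print(poly_at_matrix(p, A) == [[0, 0], [0, 0]])      # True
```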

And that was the one-line proof. An even simpler proof: let ${v_i}$ be the eigenvectors, with ${A v_i = \lambda_i v_i}$. Then ${A^k v_i = \lambda_i^k v_i}$, so ${p_A(A)v_i = p_A(\lambda_i) v_i = 0}$, and ${p_A(A)}$ must be ${0}$ since it sends every element of a basis to zero. Well, I sent that to Hu Fu and he told me the proof had a bug. Not really a bug, but I was proving it only for symmetric matrices; more generally, only for diagonalizable matrices, since both arguments assume that the eigenvectors form a basis. He showed me, for example, the matrix:
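The key identity of the eigenvector argument, ${q(A) v = q(\lambda) v}$ for any polynomial ${q}$ whenever ${A v = \lambda v}$, can be checked on a small example. In this sketch the matrix (with eigenvalues 2 and 5), its eigenvectors, the polynomial ${q}$ and the helper names are all hypothetical; ${q(A)v}$ is computed with Horner's rule on vectors, so no power of ${A}$ is formed explicitly.

```python
# Illustrative sketch: the matrix, eigenpairs and helper names are hypothetical.
from fractions import Fraction


def mat_vec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def poly_matrix_times_vector(coeffs, M, v):
    """q(M) v via Horner's rule on vectors, w <- M w + c v (highest degree first)."""
    w = [Fraction(0)] * len(v)
    for c in coeffs:
        w = [x + c * y for x, y in zip(mat_vec(M, w), v)]
    return w


def poly_at_scalar(coeffs, x):
    r = Fraction(0)
    for c in coeffs:
        r = r * x + c
    return r


A = [[-4, -18], [3, 11]]
eigenpairs = [(2, [3, -1]), (5, [-2, 1])]   # A v = lambda v for each pair
q = [1, 0, -3, 4]                           # an arbitrary q(x) = x^3 - 3x + 4
for lam, v in eigenpairs:
    print(lam, poly_matrix_times_vector(q, A, v) == [poly_at_scalar(q, lam) * x for x in v])

p = [1, -7, 10]                             # characteristic polynomial of A
print(all(x == 0 for _, v in eigenpairs for x in poly_matrix_times_vector(p, A, v)))
```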

$\displaystyle \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$

which has only one eigenvalue, ${0}$, and whose eigenvectors are all of the form ${(x,0)}$ for ${x \in {\mathbb R}}$. So the space spanned by the eigenvectors has dimension ${1}$, less than the dimension ${2}$ of the space the matrix acts on, and there is no basis of eigenvectors. This never happens for symmetric matrices, which the spectral theorem diagonalizes in an orthonormal basis, and I guess that after some time as a computer scientist I got used to working only with symmetric matrices for almost everything I use: metrics, quadratic forms, correlation matrices, … but there is more out there than symmetric matrices. The good news is that the proof is not hard to fix for the general case.
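The defect of this example can be measured directly: the eigenvectors for the eigenvalue ${0}$ form ${\ker(A - 0 I)}$, whose dimension is ${n}$ minus the rank (rank-nullity). The sketch below, with a hypothetical rank helper based on Gaussian elimination over the rationals, shows that this dimension is 1, while ${p_A(x) = \det(A - xI) = x^2}$ still annihilates the matrix.

```python
# Illustrative sketch: the rank helper is hypothetical (plain Gaussian elimination).
from fractions import Fraction


def rank(M):
    """Rank of a matrix by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


N = [[0, 1], [0, 0]]
n = len(N)
# The only eigenvalue is 0, so the eigenvectors are the nonzero vectors of ker(N - 0 I) = ker(N).
print("eigenspace dimension:", n - rank(N))   # 1, smaller than n = 2
# p_N(x) = det(N - xI) = x^2, and still p_N(N) = N^2 is the zero matrix.
N2 = [[sum(N[i][k] * N[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
print("N^2 =", N2)                            # [[0, 0], [0, 0]]
```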

First, it is easy to prove that for each root ${\lambda}$ of the characteristic polynomial there is at least one eigenvector associated with it: since ${\det(A - \lambda I) = 0}$, there must be some ${v \in \ker(A - \lambda I) \setminus \{ 0 \}}$. Eigenvectors associated with distinct eigenvalues are linearly independent, so if all ${n}$ roots are distinct, there is a basis of eigenvectors, and therefore the matrix is diagonalizable (we may need complex eigenvalues and eigenvectors, but that is fine: the argument above works verbatim over ${{\mathbb C}}$). The good thing is that a matrix having two identical eigenvalues is a “coincidence”. Identify the ${n \times n}$ matrices with ${{\mathbb R}^{n^2}}$ via their entries ${x_{ij}}$. The matrices with a repeated eigenvalue are exactly the zeros of a single polynomial in the ${x_{ij}}$, namely the resultant ${R_{p,p'}}$ of ${p_A}$ and its derivative ${p_A'}$. This polynomial is not identically zero (it does not vanish on a diagonal matrix with distinct diagonal entries), so its zero set has measure zero in ${{\mathbb R}^{n^2}}$. Therefore we have proved the Cayley-Hamilton Theorem on the complement of a measure-zero set, and this complement is dense. Since each entry of ${p_A(A)}$ is a polynomial in the ${x_{ij}}$, the map ${A \mapsto p_A(A)}$ is continuous, and a continuous function that vanishes on a dense set vanishes everywhere, so ${p_A(A) = 0}$ for all matrices ${A \in {\mathbb R}^{n^2}}$.
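For ${n = 2}$ the resultant can be written out explicitly. With ${A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}}$ we have ${p_A(x) = x^2 - (a+d)x + (ad - bc)}$, and the determinant of the Sylvester matrix of ${p_A}$ and ${p_A'}$ works out to ${-\left((a-d)^2 + 4bc\right)}$, a polynomial in the entries that is not identically zero. The sketch below is restricted to the ${2 \times 2}$ case, and its helper names are hypothetical; it computes both forms. The rotation matrix gets a nonzero value, since its eigenvalues ${\pm i}$ are complex but distinct. The identity gets zero even though it is diagonalizable, which is harmless: the argument only needs the bad matrices to lie inside a measure-zero set.

```python
# Illustrative sketch for 2 x 2 matrices only; helper names are hypothetical.


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def resultant_p_dp(A):
    """Res(p, p') for p(x) = x^2 + s x + t = det(A - xI), via the Sylvester matrix."""
    (a, b), (c, d) = A
    s, t = -(a + d), a * d - b * c
    sylvester = [[1, s, t],   # one shifted copy of p (because deg p' = 1)
                 [2, s, 0],   # two shifted copies of p' = 2x + s (because deg p = 2)
                 [0, 2, s]]
    return det3(sylvester)


def closed_form(A):
    """The same polynomial in the entries, expanded: -((a - d)^2 + 4bc)."""
    (a, b), (c, d) = A
    return -((a - d) ** 2 + 4 * b * c)


examples = {
    "nilpotent": [[0, 1], [0, 0]],           # repeated eigenvalue 0
    "identity": [[1, 0], [0, 1]],            # repeated eigenvalue 1
    "eigenvalues 2, 5": [[-4, -18], [3, 11]],
    "rotation": [[0, -1], [1, 0]],           # eigenvalues i and -i
}
for name, A in examples.items():
    print(name, resultant_p_dp(A), closed_form(A))
```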

We can also interpret this probabilistically: take a matrix ${U}$ whose entries ${U_{ij}}$ are drawn independently and uniformly at random from ${[0,1]}$. Then, for any fixed ${\epsilon > 0}$, the matrix ${A + \epsilon U}$ has all eigenvalues distinct with probability ${1}$, so ${p_{A+\epsilon U} (A+\epsilon U) = 0}$ with probability ${1}$. Now just let ${\epsilon \rightarrow 0}$.
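Here is the probabilistic version as a small exact experiment, a sketch for the ${2 \times 2}$ case only. It perturbs the nilpotent example by ${\epsilon U}$ with a seeded random ${U}$ (Python's random.random draws from ${[0, 1)}$, and each float is converted exactly to a Fraction), then checks, for ${\epsilon = 10^{-1}, \hdots, 10^{-5}}$, that the discriminant ${(a-d)^2 + 4bc}$ of the perturbed matrix is nonzero and that its characteristic polynomial vanishes at it. The helper names are hypothetical.

```python
# Illustrative sketch for the 2 x 2 case; helper names are hypothetical.
import random
from fractions import Fraction


def discriminant_2x2(M):
    """(a - d)^2 + 4bc, which is zero exactly when M has a repeated eigenvalue."""
    (a, b), (c, d) = M
    return (a - d) ** 2 + 4 * b * c


def cayley_hamilton_residual_2x2(M):
    """M^2 - tr(M) M + det(M) I, i.e. p_M(M) for a 2 x 2 matrix, computed exactly."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    M2 = [[a * a + b * c, a * b + b * d], [c * a + d * c, c * b + d * d]]
    return [[M2[i][j] - tr * M[i][j] + (det if i == j else 0) for j in range(2)]
            for i in range(2)]


random.seed(0)                             # fixed seed so the run is reproducible
A = [[0, 1], [0, 0]]                       # repeated eigenvalue 0
U = [[Fraction(random.random()) for _ in range(2)] for _ in range(2)]  # exact values in [0, 1)
for k in range(1, 6):
    eps = Fraction(1, 10 ** k)
    P = [[A[i][j] + eps * U[i][j] for j in range(2)] for i in range(2)]
    print(f"eps=1e-{k}", "distinct eigenvalues:", discriminant_2x2(P) != 0,
          "p(P) == 0:", cayley_hamilton_residual_2x2(P) == [[0, 0], [0, 0]])
print("A itself:", discriminant_2x2(A), cayley_hamilton_residual_2x2(A))
```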

OK, this proves the theorem for real and complex matrices, but what about a matrix over a general field, where we can’t use those continuity arguments? One way around this is the Jordan Canonical Form, a generalization of the eigenvector decomposition. Not all matrices have an eigenvector decomposition, but every matrix over an algebraically closed field can be written in Jordan Canonical Form. A matrix over a field ${K}$ can always be regarded as a matrix over its algebraic closure ${\overline{K}}$, which changes neither ${p_A}$ nor ${p_A(A)}$, so it is enough to prove the theorem there. Given any ${A \in \overline{K}^{n^2}}$ there is an invertible matrix ${\Gamma \in \overline{K}^{n^2}}$ such that:

$\displaystyle A = \Gamma^{-1} \begin{bmatrix} B_1 & & & \\ & B_2 & & \\ & & \ddots & \\ & & & B_p \end{bmatrix} \Gamma$

where ${B_i}$ are blocks of the form:

$\displaystyle B_i = \begin{bmatrix} \lambda & 1 & & \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{bmatrix}$
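Over a finite field no continuity argument is available, but tiny cases can at least be checked by brute force. The sketch below is such an exhaustive check, not the Jordan form argument: it enumerates every ${2 \times 2}$ matrix over ${\mathbb{F}_2}$, ${\mathbb{F}_3}$ and ${\mathbb{F}_5}$ and every ${3 \times 3}$ matrix over ${\mathbb{F}_2}$ (assuming ${p}$ prime), computes ${\det(xI - A)}$ from the explicit trace, principal-minor and determinant formulas, and evaluates it at ${A}$ modulo ${p}$. The enumeration includes matrices such as ${\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}}$ over ${\mathbb{F}_2}$, which is not diagonalizable. The helper names are hypothetical.

```python
# Illustrative brute-force sketch over small prime fields; helper names are hypothetical.
from itertools import product


def mat_mul_mod(X, Y, p):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]


def char_coeffs(A):
    """Integer coefficients [1, c_{n-1}, ..., c_0] of det(xI - A) for n = 2 or 3."""
    n = len(A)
    tr = sum(A[i][i] for i in range(n))
    if n == 2:
        return [1, -tr, A[0][0] * A[1][1] - A[0][1] * A[1][0]]
    minors = sum(A[i][i] * A[j][j] - A[i][j] * A[j][i]
                 for i in range(3) for j in range(i + 1, 3))   # principal 2 x 2 minors
    det = (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
           - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
           + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))
    return [1, -tr, minors, -det]


def residual_is_zero_mod(A, p):
    """Evaluate det(xI - A) at A by Horner's rule, reducing every entry mod p."""
    n = len(A)
    R = [[0] * n for _ in range(n)]
    for c in char_coeffs(A):
        R = mat_mul_mod(R, A, p)
        for i in range(n):
            R[i][i] = (R[i][i] + c) % p
    return all(v == 0 for row in R for v in row)


def all_matrices(n, p):
    """Every n x n matrix with entries in {0, ..., p - 1}."""
    for entries in product(range(p), repeat=n * n):
        yield [list(entries[i * n:(i + 1) * n]) for i in range(n)]


for n, p in [(2, 2), (2, 3), (2, 5), (3, 2)]:
    ok = all(residual_is_zero_mod(A, p) for A in all_matrices(n, p))
    print(f"n={n}, F_{p}: {ok}")
```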

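By the same argument as above, it is enough to prove Cayley-Hamilton for each block separately: writing ${J}$ for the block-diagonal matrix, we have ${A^k = \Gamma^{-1} J^k \Gamma}$, and ${J^k}$ is block diagonal with blocks ${B_i^k}$, so ${p_A(A) = \Gamma^{-1} \, \text{diag}(p_A(B_1), \hdots, p_A(B_p)) \, \Gamma}$. So we need to prove that ${p_A(B_i) = 0}$ for every ${i}$. If the block has size ${1}$, this is exactly the proof above. If the block is bigger, we need to look at what ${B_i^k}$ looks like. For ${k = 2}$: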
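The block-by-block reduction rests on the fact that a polynomial applied to a block-diagonal matrix acts on each block separately. Here is a sketch checking this on one hypothetical example (two Jordan blocks with eigenvalues 3 and ${-1}$, and an arbitrary polynomial ${q}$); the helper names are hypothetical too.

```python
# Illustrative sketch: the blocks, the polynomials and the helper names are hypothetical.


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_at_matrix(coeffs, M):
    """Horner's rule, coefficients from the highest degree down."""
    n = len(M)
    R = [[0] * n for _ in range(n)]
    for c in coeffs:
        R = mat_mul(R, M)
        for i in range(n):
            R[i][i] += c
    return R


def block_diag(*blocks):
    """Place square blocks along the diagonal of a larger zero matrix."""
    n = sum(len(B) for B in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for B in blocks:
        for i, row in enumerate(B):
            for j, v in enumerate(row):
                M[offset + i][offset + j] = v
        offset += len(B)
    return M


B1 = [[3, 1], [0, 3]]        # Jordan block with eigenvalue 3, size 2
B2 = [[-1]]                  # Jordan block with eigenvalue -1, size 1
J = block_diag(B1, B2)
q = [2, -1, 0, 5]            # an arbitrary q(x) = 2x^3 - x^2 + 5
print(poly_at_matrix(q, J) == block_diag(poly_at_matrix(q, B1), poly_at_matrix(q, B2)))  # True

p = [1, -5, 3, 9]            # (x - 3)^2 (x + 1) = -det(J - xI)
print(poly_at_matrix(p, J))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```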

$\displaystyle B_i^2 = \begin{bmatrix} \lambda^2 & 2 \lambda & 1 & & \\ & \lambda^2 & 2 \lambda & \ddots & \\ & & \ddots & \ddots & 1 \\ & & & \lambda^2 & 2 \lambda \\ & & & & \lambda^2 \end{bmatrix}$

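In general, write ${B_i = \lambda I + N}$, where ${N}$ has ones on the superdiagonal and zeros elsewhere, so that ${N^j}$ has ones exactly ${j}$ places to the right of the diagonal. Since ${\lambda I}$ and ${N}$ commute, the binomial theorem gives ${B_i^k = \sum_j \binom{k}{j} \lambda^{k-j} N^j}$: in each row of ${B_i^k}$, starting at the diagonal, we find the sequence ${\lambda^k, k \lambda^{k-1}, \frac{k(k-1)}{2} \lambda^{k-2}, \hdots}$, i.e., ${\frac{1}{0!}\frac{d^0}{d\lambda^0} \lambda^k, \frac{1}{1!}\frac{d^1}{d\lambda^1} \lambda^k, \frac{1}{2!}\frac{d^2}{d\lambda^2} \lambda^k, \hdots}$. Since ${p}$ is a linear combination of powers, we get

$\displaystyle p(B_i) = \begin{bmatrix} p(\lambda) & p'(\lambda) & \frac{p''(\lambda)}{2!} & \frac{p'''(\lambda)}{3!} & \hdots \\ & p(\lambda) & p'(\lambda) & \frac{p''(\lambda)}{2!} & \hdots \\ & & p(\lambda) & p'(\lambda) & \hdots \\ & & & p(\lambda) & \hdots \\ & & & & \ddots \end{bmatrix}$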
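Both patterns can be checked with integer arithmetic. This sketch (the eigenvalue, block size, exponent, polynomial and helper names are hypothetical choices) compares the entry ${j}$ places to the right of the diagonal of ${B_i^k}$ with ${\binom{k}{j} \lambda^{k-j}}$, and the corresponding entry of ${q(B_i)}$, multiplied by ${j!}$, with ${q^{(j)}(\lambda)}$.

```python
# Illustrative sketch: lam, size, k, q and the helper names are hypothetical choices.
from math import comb, factorial


def jordan_block(lam, size):
    return [[lam if j == i else (1 if j == i + 1 else 0) for j in range(size)]
            for i in range(size)]


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(M, k):
    R = [[int(i == j) for j in range(len(M))] for i in range(len(M))]
    for _ in range(k):
        R = mat_mul(R, M)
    return R


def poly_at_matrix(coeffs, M):
    """Horner's rule, coefficients from the highest degree down."""
    n = len(M)
    R = [[0] * n for _ in range(n)]
    for c in coeffs:
        R = mat_mul(R, M)
        for i in range(n):
            R[i][i] += c
    return R


def poly_at_scalar(coeffs, x):
    r = 0
    for c in coeffs:
        r = r * x + c
    return r


def derivative(coeffs):
    """Derivative of a polynomial given with coefficients from the highest degree down."""
    deg = len(coeffs) - 1
    return [c * (deg - i) for i, c in enumerate(coeffs[:-1])]


lam, size, k = 3, 5, 6
B = jordan_block(lam, size)
Bk = mat_pow(B, k)
# Entry (i, i + j) of B^k is C(k, j) lam^(k - j) = (1/j!) d^j/dlam^j lam^k (zero if j > k).
print(all(Bk[i][i + j] == (comb(k, j) * lam ** (k - j) if j <= k else 0)
          for i in range(size) for j in range(size - i)))          # True

q = [1, -2, 0, 4, -1]                      # an arbitrary q(x) = x^4 - 2x^3 + 4x - 1
derivs = [q]
for _ in range(size - 1):
    derivs.append(derivative(derivs[-1]))
qB = poly_at_matrix(q, B)
# Entry (i, i + j) of q(B) is q^(j)(lam) / j!, so j! times it equals q^(j)(lam).
print(all(factorial(j) * qB[i][i + j] == poly_at_scalar(derivs[j], lam)
          for i in range(size) for j in range(size - i)))          # True
```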

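If block ${B_i}$ has size ${k}$ and eigenvalue ${\lambda = \lambda_i}$, then ${\lambda_i}$ has multiplicity at least ${k}$ as a root of ${p = p_A}$ (its multiplicity is the total size of all blocks with eigenvalue ${\lambda_i}$), and therefore ${p(\lambda_i) = p'(\lambda_i) = \hdots = p^{(k-1)}(\lambda_i) = 0}$. Every entry of ${p(B_i)}$ is, up to a constant factor, one of these ${k}$ values, so ${p(B_i) = 0}$, as we wanted to prove. (Over a field of positive characteristic, where ${j!}$ can vanish, the entry ${j}$ places to the right of the diagonal should be read as the coefficient of ${(x - \lambda_i)^j}$ when ${p}$ is expanded in powers of ${x - \lambda_i}$; these coefficients vanish for ${j < k}$ because ${(x - \lambda_i)^k}$ divides ${p}$.)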
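The role of the multiplicity shows up on a small Jordan matrix with blocks ${J_3(2)}$, ${J_1(2)}$ and ${J_2(-1)}$, where ${J_k(\lambda)}$ is hypothetical shorthand for a Jordan block of size ${k}$. In this sketch (helper names hypothetical) the full characteristic polynomial kills the matrix, while ${(x-2)^2}$, whose root 2 has multiplicity smaller than the block size 3, does not kill ${J_3(2)}$.

```python
# Illustrative sketch: the Jordan matrix and the helper names are hypothetical.


def jordan_block(lam, size):
    return [[lam if j == i else (1 if j == i + 1 else 0) for j in range(size)]
            for i in range(size)]


def block_diag(*blocks):
    """Place square blocks along the diagonal of a larger zero matrix."""
    n = sum(len(B) for B in blocks)
    M = [[0] * n for _ in range(n)]
    offset = 0
    for B in blocks:
        for i, row in enumerate(B):
            for j, v in enumerate(row):
                M[offset + i][offset + j] = v
        offset += len(B)
    return M


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_at_matrix(coeffs, M):
    """Horner's rule, coefficients from the highest degree down."""
    n = len(M)
    R = [[0] * n for _ in range(n)]
    for c in coeffs:
        R = mat_mul(R, M)
        for i in range(n):
            R[i][i] += c
    return R


def poly_from_roots(roots):
    """Monic prod (x - r) over the roots, coefficients from the highest degree down."""
    P = [1]
    for r in roots:
        P = [a - r * b for a, b in zip(P + [0], [0] + P)]   # x P(x) - r P(x)
    return P


def is_zero(M):
    return all(v == 0 for row in M for v in row)


J = block_diag(jordan_block(2, 3), jordan_block(2, 1), jordan_block(-1, 2))
# det(xI - J) = (x - 2)^4 (x + 1)^2, since J is upper triangular.
p = poly_from_roots([2, 2, 2, 2, -1, -1])
print(is_zero(poly_at_matrix(p, J)))                                 # True

B = jordan_block(2, 3)
print(is_zero(poly_at_matrix(poly_from_roots([2, 2]), B)))           # False: multiplicity 2 < size 3
print(is_zero(poly_at_matrix(poly_from_roots([2, 2, 2]), B)))        # True
```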

It turned out not to be a very short proof, but it is still short: it uses mostly elementary facts, and it is really intuitive in some sense. I took a few lessons from this: (i) it reinforces my belief that, if I need to say something about a matrix, the first thing to do is to look at its eigenvector decomposition. Many linear algebra problems become very simple when we look at them in the right basis, and normally the right basis is the eigenvector basis. (ii) Not all matrices are diagonalizable, but in those cases the Jordan Canonical Form comes to our rescue, and we can do almost the same as we did with the eigenvalue decomposition.

Categories: theory Tags: finite fields, linear algebra, theory
