Home > theory > Cayley-Hamilton Theorem and Jordan Canonical Form

Cayley-Hamilton Theorem and Jordan Canonical Form

I was discussing last week with my officemates Hu Fu and Ashwin about the Cayley-Hamilton Theorem. The theorem is the following, given an {n \times n}&fg=000000 matrix {A}&fg=000000 we can define its characteristic polynomial by {p_A(\lambda) = \det(A - I\lambda)}&fg=000000. The Cayley-Hamilton Theorem says that {p_A(A) = 0}&fg=000000. The polynomiale is something like:

\displaystyle p_A(x) = a_k x^k + a_{k-1} x^{k-1} + \hdots + a_1 x^1 + a_0&fg=000000

so we can just see it as a formal polynomial and think of:

\displaystyle p_A(A) = a_k A^k + a_{k-1} A^{k-1} + \hdots + a_1 A + a_0 I&fg=000000

which is an {n \times n}&fg=000000 matrix. The theorem says it is the zero matrix. We thought for a while, looked in the Wikipedia, and there there were a few proofs, but not the one-line proof I was looking for. Later, I got this proof that I sent to Hu Fu:

Write the matrix {A}&fg=000000 in the basis of its eigenvectors, then we can write {A = \Gamma^t D \Gamma}&fg=000000 where {D}&fg=000000 is the diagonal matrix with the eigenvalues in the main diagonal.

\displaystyle A^k = (\Gamma^t D \Gamma) \hdots (\Gamma^t D \Gamma) = \Gamma^t D^k \Gamma&fg=000000

and since {D = \text{diag}(\lambda_1, \hdots, \lambda_n)}&fg=000000 we have {D^k = \text{diag}(\lambda_1^k, \hdots, \lambda_n^k)}&fg=000000. Now, it is simple to see that:

\displaystyle p_A(A) = \Gamma^t p(D) \Gamma&fg=000000

and therefore:

\displaystyle p(D) = \begin{bmatrix}& p_A(\lambda_1) & & & \\ & & p_A(\lambda_2) & & & \\ & & & \ddots & \\ & & & & p_A(\lambda_n) \end{bmatrix} = \begin{bmatrix} & 0 & & & \\ & & 0 & & & \\ & & & \ddots & \\ & & & & 0 \end{bmatrix} = 0&fg=000000

And that was the one-line proof. One even simpler proof is: let {v_i}&fg=000000 be the eigenvectors, then {p_A(A)v_i = p_A(\lambda_i)A = 0}&fg=000000, so {p_A(A)}&fg=000000 must be {0}&fg=000000 since it returns zero for all elements of a basis. Well, I sent that to Hu Fu and he told me the proof had a bug. Not really a bug, but I was proving only for symmetric matrices. More generally, I was proving for diagonalizable matrices. He showed me, for example, the matrix:

\displaystyle  \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}&fg=000000

which has only one eigenvalue {0}&fg=000000 and the the eigenvectors are all of the form {(x,0)}&fg=000000 for {x \in {\mathbb R}}&fg=000000. So, the dimension of the space spanned by the eigenvectors is {1}&fg=000000, less than the dimension of the matrix. This never happens for symmetric matrices, and I guess after some time as a computer scientist, I got used to work only with symmetric matrices for almost everything I use: metrics, quadratic forms, correlation matrices, … but there is more out there then only symmetric matrices. The good news is that this proof is not hard to fix for the general case.

First, it is easy to prove that for each root of the characteristic polynomial there is one eigenvector associated to it (just see that {\det(A - \lambda I) = 0}&fg=000000 and therefore there must be {v \in \ker(A - \lambda I) \setminus \{ 0 \}}&fg=000000, so if all the roots are distinct, then there is a basis of eigenvalues, and therefore the matrix is diagonalizable (notice that maybe we will need to use complex eigenvalues, but it is ok). The good thing is that a matrix having two identical eigenvalues is a “coincidence”. We can identify matrices with {{\mathbb R}^{n^2}}&fg=000000. The matrices with identical eigenvalues form a zero measure subset of {{\mathbb R}^{n^2}}&fg=000000, they are in fact the roots of a polynomial in {x_{ij}}&fg=000000. This polynomial is the resultant polynomial {R_{p,p'} = 0}&fg=000000. Therefore, we proved Cayley-Hamilton theorem in the complement of a zero-measure set in {{\mathbb R}^{n^2}}&fg=000000. Since {A \mapsto p_A(A)}&fg=000000 is a continuous function, it extends naturally to all matrices {A \in {\mathbb R}^{n^2}}&fg=000000.

We can also interpret that probabilistically: get a matrix {U}&fg=000000 where {U_{ij}}&fg=000000 is taken uniformly at random from {[0,1]}&fg=000000. Then {A + \epsilon U}&fg=000000 has with probability {1}&fg=000000 all different eigenvalues. So, {p_{A+\epsilon U} (A+\epsilon U) = 0}&fg=000000 with probability {1}&fg=000000. Now, just make {\epsilon \rightarrow 0}&fg=000000.

Ok, this proves the Theorem for real and complex matrices, but what about a matrix defined over a general field where we can’t use those continuity arguments. A way to get around it is by using Jordan Canonical Form, which is a generalization of eigenvector decomposition. Not all matrices have eigenvector decomposition, but all matrices over an algebraic closed field can be written in Jordan Canonical Form. Given any {A \in \overline{K}^{n^2}}&fg=000000 there is a matrix {\Gamma \in \overline{K}^{n^2}}&fg=000000 so that:

\displaystyle A = \Gamma^{-1} \begin{bmatrix}& B_1 & & & \\ & & B_2 & & & \\ & & & \ddots & \\ & & & & B_p \end{bmatrix}\Gamma&fg=000000

where {B_i}&fg=000000 are blocks of the form:

\displaystyle B_i = \begin{bmatrix}& \lambda & 1 & & \\ & & \lambda & 1 & & \\ & & & \ddots & \\ & & & & \lambda \end{bmatrix}&fg=000000

By the same argument as above, we just need to prove Cayley Hamilton for each block in separate. So we need to prove that {p_A(B_i) = 0}&fg=000000. If the block has size {1}&fg=000000, then it is exacly the proof above. If the block is bigger, then we need to look at how does {B_i^k}&fg=000000 looks like. By inspection:

\displaystyle B_i^2 = \begin{bmatrix}& \lambda^2 & 2 \lambda & 1 & & & \\ & & \lambda^2 & 2 \lambda & 1 & & & \\ & & & & & \ddots & \\ & & & & & \lambda^2 \end{bmatrix}&fg=000000

Tipically, for {B_i^k}&fg=000000 we have in each row, starting in column {k}&fg=000000 the sequence {\lambda^k, k \lambda^{k-1}, k(k-1) \lambda^{k-1}, \hdots}&fg=000000, i.e., {\frac{d^0}{d\lambda^0} \lambda^k, \frac{d^1}{d\lambda^1} \lambda^k , \frac{d^2}{d\lambda^2} \lambda^k \hdots}&fg=000000. So, we have

\displaystyle p(B_i) = \begin{bmatrix} p(\lambda) & p'(\lambda) & p''(\lambda) & p'''(\lambda) & \hdots \\ & p(\lambda) & p'(\lambda) & p''(\lambda) & \hdots \\ 				& & p(\lambda) & p'(\lambda) & \hdots \\ 				& & & p(\lambda) & \hdots \\ 				& & & & \ddots \\ \end{bmatrix}&fg=000000

If block {B_i}&fg=000000 has size {k}&fg=000000, then {\lambda_i}&fg=000000 has multiplicity {k}&fg=000000 in {p(.)}&fg=000000 and therefore {p(\lambda_i) = p'(\lambda_i) = \hdots = p^{(k-1)}(\lambda_i)}&fg=000000 and therefore, {p(B_i) = 0}&fg=000000 as we wanted to prove.

It turned out not to be a very very short proof, but it is still short, since it uses mostly elementary stuff and the proof is really intuitive in some sense. I took some lessons from that: (i) first it reinforces my idea that, if I need to say something about a matrix, the first thing I do is to look at its eigenvectors decomposition. A lot of Linear Algebra problems are very simple when we consider things in the right basis. Normally the right basis is the eigenvector basis. (ii) not all matrices are diagonalizable. But in those cases, Jordan Canonical Form comes in our help and we can do almost the same as we did with eigenvalue decomposition.

  1. Jorge Miranda
    February 19th, 2010 at 13:56 | #1

    I think it’s even easier than that using the Jordan Canonical Form.

    The idea is that $p_A(A)=(A-\lambda_1I)…(A-\lambda_nI)$. If we compute $p_A(A)v$ for any $v$ that is a generalized eigenvector $p_A(A)v=0$ because we have enough factors in $p_A(A)$ to use that $(A-\lambda_iI)^kv=0$.

    Since the generalized eigenvectors are a basis, we conclude that $p_A(A)v=0$ for any $v$, and that means that $p_A(A)=0$ as we wanted to prove.

    Still, neat proof :D

  2. February 19th, 2010 at 14:06 | #2

    Thanks for the comment. I think it is actually the same proof, but you are hiding the Jordan Canonical Form in the concept of generalized eigenvectors.

  1. No trackbacks yet.