Home > theory > Cayley-Hamilton Theorem and Jordan Canonical Form

## Cayley-Hamilton Theorem and Jordan Canonical Form

I was discussing last week with my officemates Hu Fu and Ashwin about the Cayley-Hamilton Theorem. The theorem is the following, given an ${n \times n}$ matrix ${A}$ we can define its characteristic polynomial by ${p_A(\lambda) = \det(A - I\lambda)}$. The Cayley-Hamilton Theorem says that ${p_A(A) = 0}$. The polynomiale is something like:

$\displaystyle p_A(x) = a_k x^k + a_{k-1} x^{k-1} + \hdots + a_1 x^1 + a_0$

so we can just see it as a formal polynomial and think of:

$\displaystyle p_A(A) = a_k A^k + a_{k-1} A^{k-1} + \hdots + a_1 A + a_0 I$

which is an ${n \times n}$ matrix. The theorem says it is the zero matrix. We thought for a while, looked in the Wikipedia, and there there were a few proofs, but not the one-line proof I was looking for. Later, I got this proof that I sent to Hu Fu:

Write the matrix ${A}$ in the basis of its eigenvectors, then we can write ${A = \Gamma^t D \Gamma}$ where ${D}$ is the diagonal matrix with the eigenvalues in the main diagonal.

$\displaystyle A^k = (\Gamma^t D \Gamma) \hdots (\Gamma^t D \Gamma) = \Gamma^t D^k \Gamma$

and since ${D = \text{diag}(\lambda_1, \hdots, \lambda_n)}$ we have ${D^k = \text{diag}(\lambda_1^k, \hdots, \lambda_n^k)}$. Now, it is simple to see that:

$\displaystyle p_A(A) = \Gamma^t p(D) \Gamma$

and therefore:

$\displaystyle p(D) = \begin{bmatrix}& p_A(\lambda_1) & & & \\ & & p_A(\lambda_2) & & & \\ & & & \ddots & \\ & & & & p_A(\lambda_n) \end{bmatrix} = \begin{bmatrix} & 0 & & & \\ & & 0 & & & \\ & & & \ddots & \\ & & & & 0 \end{bmatrix} = 0$

And that was the one-line proof. One even simpler proof is: let ${v_i}$ be the eigenvectors, then ${p_A(A)v_i = p_A(\lambda_i)A = 0}$, so ${p_A(A)}$ must be ${0}$ since it returns zero for all elements of a basis. Well, I sent that to Hu Fu and he told me the proof had a bug. Not really a bug, but I was proving only for symmetric matrices. More generally, I was proving for diagonalizable matrices. He showed me, for example, the matrix:

$\displaystyle \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$

which has only one eigenvalue ${0}$ and the the eigenvectors are all of the form ${(x,0)}$ for ${x \in {\mathbb R}}$. So, the dimension of the space spanned by the eigenvectors is ${1}$, less than the dimension of the matrix. This never happens for symmetric matrices, and I guess after some time as a computer scientist, I got used to work only with symmetric matrices for almost everything I use: metrics, quadratic forms, correlation matrices, … but there is more out there then only symmetric matrices. The good news is that this proof is not hard to fix for the general case.

First, it is easy to prove that for each root of the characteristic polynomial there is one eigenvector associated to it (just see that ${\det(A - \lambda I) = 0}$ and therefore there must be ${v \in \ker(A - \lambda I) \setminus \{ 0 \}}$, so if all the roots are distinct, then there is a basis of eigenvalues, and therefore the matrix is diagonalizable (notice that maybe we will need to use complex eigenvalues, but it is ok). The good thing is that a matrix having two identical eigenvalues is a “coincidence”. We can identify matrices with ${{\mathbb R}^{n^2}}$. The matrices with identical eigenvalues form a zero measure subset of ${{\mathbb R}^{n^2}}$, they are in fact the roots of a polynomial in ${x_{ij}}$. This polynomial is the resultant polynomial ${R_{p,p'} = 0}$. Therefore, we proved Cayley-Hamilton theorem in the complement of a zero-measure set in ${{\mathbb R}^{n^2}}$. Since ${A \mapsto p_A(A)}$ is a continuous function, it extends naturally to all matrices ${A \in {\mathbb R}^{n^2}}$.

We can also interpret that probabilistically: get a matrix ${U}$ where ${U_{ij}}$ is taken uniformly at random from ${[0,1]}$. Then ${A + \epsilon U}$ has with probability ${1}$ all different eigenvalues. So, ${p_{A+\epsilon U} (A+\epsilon U) = 0}$ with probability ${1}$. Now, just make ${\epsilon \rightarrow 0}$.

Ok, this proves the Theorem for real and complex matrices, but what about a matrix defined over a general field where we can’t use those continuity arguments. A way to get around it is by using Jordan Canonical Form, which is a generalization of eigenvector decomposition. Not all matrices have eigenvector decomposition, but all matrices over an algebraic closed field can be written in Jordan Canonical Form. Given any ${A \in \overline{K}^{n^2}}$ there is a matrix ${\Gamma \in \overline{K}^{n^2}}$ so that:

$\displaystyle A = \Gamma^{-1} \begin{bmatrix}& B_1 & & & \\ & & B_2 & & & \\ & & & \ddots & \\ & & & & B_p \end{bmatrix}\Gamma$

where ${B_i}$ are blocks of the form:

$\displaystyle B_i = \begin{bmatrix}& \lambda & 1 & & \\ & & \lambda & 1 & & \\ & & & \ddots & \\ & & & & \lambda \end{bmatrix}$

By the same argument as above, we just need to prove Cayley Hamilton for each block in separate. So we need to prove that ${p_A(B_i) = 0}$. If the block has size ${1}$, then it is exacly the proof above. If the block is bigger, then we need to look at how does ${B_i^k}$ looks like. By inspection:

$\displaystyle B_i^2 = \begin{bmatrix}& \lambda^2 & 2 \lambda & 1 & & & \\ & & \lambda^2 & 2 \lambda & 1 & & & \\ & & & & & \ddots & \\ & & & & & \lambda^2 \end{bmatrix}$

Tipically, for ${B_i^k}$ we have in each row, starting in column ${k}$ the sequence ${\lambda^k, k \lambda^{k-1}, k(k-1) \lambda^{k-1}, \hdots}$, i.e., ${\frac{d^0}{d\lambda^0} \lambda^k, \frac{d^1}{d\lambda^1} \lambda^k , \frac{d^2}{d\lambda^2} \lambda^k \hdots}$. So, we have

$\displaystyle p(B_i) = \begin{bmatrix} p(\lambda) & p'(\lambda) & p''(\lambda) & p'''(\lambda) & \hdots \\ & p(\lambda) & p'(\lambda) & p''(\lambda) & \hdots \\ & & p(\lambda) & p'(\lambda) & \hdots \\ & & & p(\lambda) & \hdots \\ & & & & \ddots \\ \end{bmatrix}$

If block ${B_i}$ has size ${k}$, then ${\lambda_i}$ has multiplicity ${k}$ in ${p(.)}$ and therefore ${p(\lambda_i) = p'(\lambda_i) = \hdots = p^{(k-1)}(\lambda_i)}$ and therefore, ${p(B_i) = 0}$ as we wanted to prove.

It turned out not to be a very very short proof, but it is still short, since it uses mostly elementary stuff and the proof is really intuitive in some sense. I took some lessons from that: (i) first it reinforces my idea that, if I need to say something about a matrix, the first thing I do is to look at its eigenvectors decomposition. A lot of Linear Algebra problems are very simple when we consider things in the right basis. Normally the right basis is the eigenvector basis. (ii) not all matrices are diagonalizable. But in those cases, Jordan Canonical Form comes in our help and we can do almost the same as we did with eigenvalue decomposition.

Categories: theory Tags:
1. No comments yet.
1. No trackbacks yet.