
Fun with reflections and determinants

in collaboration with guest contributor Gabriel Coutinho

My frequent collaborator Gabriel Coutinho and I have very different problem solving styles.  In his own words, “we could write a book together; your part will have fancy group theory and nice drawings and mine will have sloppy proofs of eigenvalue bounds.” We challenged each other to this problem, which is B3 from the 2019 Putnam Math Competition. There was struggle and despair, but the next morning, we came up with two very different solutions.

The Problem:

Let \(Q\) be an \(n\times n\) real orthogonal matrix and let \(u\) be an \(n\)-dimensional real (column) unit vector. Let \(P = I - 2u u^T\), where \(I\) is the \(n\times n\) identity matrix. Show that if \(1\) is not an eigenvalue of \(Q\), then \(1\) is an eigenvalue of \(PQ\).

Note that \(P\) is a reflection; it is an involution and is also called a Householder transformation.

Krystal’s solution: what exactly does \(P\) do? 

We can easily see that \(P\) sends \(u\) to \(-u\) and fixes everything orthogonal to \(u\). But we can actually prescribe the behaviour of a reflection.

Lemma. If \(x\) and \(y\) are distinct real unit vectors and $$P = I - 2\frac{(x-y)(x-y)^T}{\langle x-y, x-y \rangle},$$ then \(Px = y\).

Proof. We observe the following \[ \begin{split} Px &= x - 2\frac{(x-y)(x-y)^Tx}{\langle x-y, x-y \rangle} \\ &= x - 2\frac{(x-y)(x^Tx-y^Tx)}{x^Tx- x^Ty -y^Tx+ y^Ty} \\ &= x - 2\frac{(x-y)(1-y^Tx)}{2(1 - y^Tx)} \\ &= x - (x-y) \\ &= y, \end{split} \] as claimed. ∎
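
For the numerically inclined, here is a quick sanity check of the Lemma. This is only a sketch in Python with numpy; the dimension and the particular unit vectors \(x\) and \(y\) are arbitrary choices for illustration.

import numpy as np

# Two arbitrary (distinct) real unit vectors x and y.
rng = np.random.default_rng(0)
x = rng.standard_normal(5); x /= np.linalg.norm(x)
y = rng.standard_normal(5); y /= np.linalg.norm(y)

# P = I - 2 (x-y)(x-y)^T / <x-y, x-y>, as in the Lemma.
d = x - y
P = np.eye(5) - 2 * np.outer(d, d) / d.dot(d)

print(np.allclose(P @ x, y))          # True: P sends x to y
print(np.allclose(P @ P, np.eye(5)))  # True: P is an involution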

Corollary. Suppose \(u\) is a unit vector such that \( u = x - y \) where \( |x| = |y| =\alpha \). If \(P = I - 2u u^T\), then \(P \left( \frac{x}{\alpha} \right) = \frac{y}{\alpha} \).

Proof. We observe that \(\frac{x}{\alpha}\) and \(\frac{y}{\alpha}\) are unit vectors. Moreover, since \(u = x-y\) is a unit vector, we have \(\langle x-y, x-y \rangle = 1\), and so \[P = I - 2u u^T = I - 2\frac{(x-y)(x-y)^T}{\langle x-y, x-y \rangle} = I - 2\frac{\left(\frac{x}{\alpha}-\frac{y}{\alpha}\right)\left(\frac{x}{\alpha}-\frac{y}{\alpha}\right)^T}{\left\langle \frac{x}{\alpha}-\frac{y}{\alpha}, \frac{x}{\alpha}-\frac{y}{\alpha} \right\rangle},\] where the last equality holds because the factors of \(\frac{1}{\alpha}\) cancel. The corollary now follows from the Lemma applied to \(\frac{x}{\alpha}\) and \(\frac{y}{\alpha}\). ∎

To solve the problem, we observe that \(Q\) and \(Q^T\) have the same eigenvalues, so if \(1\) is not an eigenvalue of \(Q\), then \(I-Q^T\) is invertible and we may set \(M = (I-Q^T)^{-1}\). Let \(x = Mu\); note that \(x \neq 0\) since \(u \neq 0\) and \(M\) is invertible. We see that \[u = (I-Q^T)x = x - Q^Tx.\] Since \(Q^T\) is an orthogonal matrix and thus preserves lengths, we have that \(|x| = |Q^Tx|\). Let \(x' = \frac{x}{|x|}\). We may now apply the corollary and conclude that \(P\) takes \(x'\) to \(Q^Tx'\).

Now we see that \[PQ (Q^Tx') = Px' = Q^Tx',\] using \(QQ^T = I\), and so \(Q^Tx'\) is an eigenvector for \(PQ\) with eigenvalue \(1\).
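
The whole construction is easy to test numerically. The sketch below uses an orthogonal \(Q\) with no eigenvalue \(1\), built from two planar rotations, and an arbitrary unit vector \(u\); both are illustrative choices, not part of the argument.

import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

n = 4
# An orthogonal Q with no eigenvalue 1: two planar rotations by nonzero angles.
Q = np.zeros((n, n))
Q[:2, :2] = rot(0.7)
Q[2:, 2:] = rot(2.1)

u = np.array([1.0, 2.0, -1.0, 0.5]); u /= np.linalg.norm(u)
P = np.eye(n) - 2 * np.outer(u, u)

# Krystal's construction: x = (I - Q^T)^{-1} u, so u = x - Q^T x with |x| = |Q^T x|.
x = np.linalg.solve(np.eye(n) - Q.T, u)
xp = x / np.linalg.norm(x)

print(np.allclose(P @ xp, Q.T @ xp))              # P sends x' to Q^T x'
print(np.allclose(P @ Q @ (Q.T @ xp), Q.T @ xp))  # Q^T x' is fixed by PQ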

Gabriel’s solution: let’s go with some determinants  

Let \(R\) be any real orthogonal matrix. Its determinant is \(\pm 1\) and all of its eigenvalues have norm \(1\). The matrix \(P\) is a reflection about an \((n-1)\)-dimensional subspace; it has one eigenvalue equal to \(-1\) and all other eigenvalues equal to \(1\), and so \(\det P = -1\). The idea is to show that if \(R\) is an \(n\times n\) real orthogonal matrix with \(\det(R) = (-1)^{n-1}\), then \(R\) has \(1\) as an eigenvalue; applying this to \(Q\) (in the contrapositive) and then to \(PQ\) gives the result.

\(n\) is odd. Suppose \(R\) is an \(n\times n\) orthogonal matrix and \(\det(R) = 1\). Then \[ \det(I-R) = \det(R^T) \det(I-R) = \det(R^T - R^TR) = \det(R^T - I) = (-1)^n \det(I-R). \] Since \(n\) is odd, we get that \(\det(I-R) = -\det(I-R)\), so \(\det(I-R) = 0\) and \(1\) is an eigenvalue of \(R\). Thus if \(Q\) has odd order, it must be that \(\det Q = -1\), since \(1\) is not an eigenvalue of \(Q\). As \(\det P = -1\), we have \(\det (PQ) = 1\) and the result follows.

\(n\) is even. Suppose \(R\) is an \(n\times n\) orthogonal matrix and \(\det(R) = -1\). Each non-real eigenvalue comes paired with its complex conjugate, so there is an even number of non-real eigenvalues and the product of each conjugate pair is \(1\). The real eigenvalues are \(\pm 1\), so there must be an odd number of eigenvalues equal to \(-1\). Since there is an even number of eigenvalues in total and an even number of non-real ones, the number of real eigenvalues is even, and so there is also an odd number of eigenvalues equal to \(1\); in particular, \(1\) is an eigenvalue of \(R\). Thus if \(n\) is even, it must be that \(\det Q = 1\), since \(1\) is not an eigenvalue of \(Q\). As \(\det P = -1\), we have \(\det (PQ) = -1\) and the result follows.
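
A quick numerical check of this determinant bookkeeping, for what it is worth; the random orthogonal matrices below come from QR factorizations of Gaussian matrices, which is just one convenient way to generate examples.

import numpy as np

rng = np.random.default_rng(1)
for n in (3, 4, 5, 6):
    R, _ = np.linalg.qr(rng.standard_normal((n, n)))  # a random n x n orthogonal matrix
    u = rng.standard_normal(n); u /= np.linalg.norm(u)
    P = np.eye(n) - 2 * np.outer(u, u)

    # det(P) = -1, so det(PR) = -det(R).
    print(n, np.isclose(np.linalg.det(P), -1),
          np.isclose(np.linalg.det(P @ R), -np.linalg.det(R)))

    # Whenever det(R) = (-1)^(n-1), the value 1 shows up among the eigenvalues of R.
    if np.isclose(np.linalg.det(R), (-1) ** (n - 1)):
        print('   has eigenvalue 1:', np.isclose(np.linalg.eigvals(R), 1).any())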

Addendum

Gabriel later realized that both parts of his proof can be extended to the other parity; he leaves this as an exercise for the avid reader.

Why are Hermitian matrices diagonalizable?

“Since we are working with a Hermitian matrix, we may take an eigenbasis of the space …”

“Wait, sorry, why are Hermitian matrices diagonalizable, again?”

“Umm … it’s not quick to explain.”

This exchange happens often when I give talks about spectra of graphs and digraphs at Bojan’s graph theory meeting. I know several ways to prove that Hermitian matrices are diagonalizable, but I couldn’t think of a simple, succinct statement that conveys the right intuition. So, in this post, I’d like to arrive at such a one-liner.

Definitions

We consider $n\times n$ matrices with entries in $\mathbb{C}$. A matrix $H$ is Hermitian if $H^* = H$, where $H^* = \bar{H}^T$ is the conjugate transpose of $H$. A matrix $H$ is diagonalizable if $H$ is similar to a diagonal matrix: i.e. there exists an invertible matrix $P$ such that $P^{-1} H P = D$ where $D$ is a diagonal matrix.

An eigenvalue of $H$ is a number $\lambda$ such that there exists a nonzero $v \in \mathbb{C}^n$ with $$Hv = \lambda v;$$ such a $v$ is said to be an eigenvector of $H$. The characteristic polynomial of $H$ is $$\phi(H,t) = \det(tI - H).$$ If $\lambda$ is a number such that $\phi(H,\lambda) = 0$, then $\lambda I - H$ has a non-trivial kernel and so there exists a nonzero vector $v$ such that $(\lambda I - H)v = 0$. Thus, every root of $\phi(H,t)$ is an eigenvalue of $H$. However, counting the roots of the characteristic polynomial is not the same as counting eigenvalues, because there are two different notions of multiplicity.
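
For concreteness, here is a small Hermitian example; the matrix $H$ below is just an illustrative choice, worked out in Python with numpy.

import numpy as np

H = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])
print(np.allclose(H, H.conj().T))  # True: H* = H, so H is Hermitian

# phi(H, t) = det(tI - H) = t^2 - 5t + 4 = (t - 1)(t - 4)
print(np.poly(H))                  # coefficients, approximately [1, -5, 4]
print(np.linalg.eigvalsh(H))       # eigenvalues [1, 4], the roots of phi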

Geometric vs algebraic multiplicities

The geometric multiplicity, $m_g(\lambda)$, of an eigenvalue $\lambda$ of $H$ is the dimension of the subspace of $\mathbb{C}^n$ generated by all eigenvectors of $H$ with eigenvalue $\lambda$ (this space is called the eigenspace of $\lambda$). The algebraic multiplicity, $m_a(\lambda)$, is the multiplicity of $\lambda$ as a root of $\phi(H,t)$.

Proposition. A Hermitian matrix $H$ is diagonalizable if and only if $m_a(\lambda) = m_g(\lambda)$ for each eigenvalue $\lambda$ of $H$.

Proof. Suppose $H$ is an $n\times n$ Hermitian matrix.

“$\Leftarrow$” It is easy to see that the characteristic polynomial has degree $n$ and hence $n$ roots. Since the algebraic and geometric multiplicities agree, we see that $H$ has $n$ orthonormal eigenvectors (details left as an exercise) which can be used to form the columns of a matrix $P$. Then $P^*P = I $ and $P^*HP$ is a diagonal matrix with the eigenvalues of $H$ on the diagonal.

“$\Rightarrow$” Let $P$ be a matrix such that $P^{-1}HP = D$, where $D$ is a diagonal matrix. Then $$HP =PD$$ and we may consider the $j$th column of both sides. On the left side, we get $Hv$ where $v$ is the $j$th column of $P$. On the right side, we get $D_{jj}v$, and so $v$ is an eigenvector of $H$ with eigenvalue $D_{jj}$. Since $H$ and $D$ are similar, $$\phi(H,t) = \phi(D,t) = \prod_{j = 1}^n (t-D_{jj}). $$ Thus each eigenvalue $\lambda$ appears $m_a(\lambda)$ times in the multiset $\{D_{jj}\}_{j=1}^n$, and the corresponding columns of $P$ are $m_a(\lambda)$ linearly independent eigenvectors for $\lambda$, so $m_g(\lambda) \geq m_a(\lambda)$. Since the geometric multiplicity never exceeds the algebraic multiplicity, the two agree. $\Box$
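
In numpy, the “$\Leftarrow$” direction is essentially what numpy.linalg.eigh hands you for a Hermitian matrix: orthonormal eigenvectors that diagonalize $H$. A sketch, with a randomly built Hermitian $H$ as an illustrative input:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = A + A.conj().T                    # Hermitian by construction

evals, P = np.linalg.eigh(H)          # columns of P are orthonormal eigenvectors
print(np.allclose(P.conj().T @ P, np.eye(4)))           # P*P = I
print(np.allclose(P.conj().T @ H @ P, np.diag(evals)))  # P*HP is diagonal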

Diagonalizability

We will now show that Hermitian matrices are diagonalizable by showing that every eigenvalue has the same algebraic and geometric multiplicities.

Theorem. If $H$ is a Hermitian matrix with eigenvalue $\lambda$, then $m_g(\lambda) = m_a(\lambda)$.

Proof. We take $H$ to be an $n\times n$ Hermitian matrix and $\lambda$ an eigenvalue of $H$. We proceed by induction on $n$.

If $m_a(\lambda) = 1$, we are done, since $\lambda$ has at least one eigenvector and the geometric multiplicity never exceeds the algebraic multiplicity. We may assume $a = m_a(\lambda) >1$. Let $x$ be an eigenvector of $H$ such that $Hx = \lambda x$. We may assume that $x$ is normalized, i.e. $x^*x = 1$. We may extend $\{x\}$ to an orthonormal basis of $\mathbb{C}^n$, say $\{x, v_2, \ldots, v_n\}$.

Let $V= \langle v_2, \ldots, v_n \rangle = \langle x \rangle ^{\perp}$. We may consider $\mathbb{C}^n$ as the direct sum $\langle x \rangle \oplus V$. Let $v \in V$. Observe that $$x^*(Hv) = x^*H^*v=  (Hx)^*v = (\lambda x)^*v = \bar{\lambda} x^*v = 0.$$ Thus $Hv \in V$ and $V$ is an $H$-invariant subspace of $\mathbb{C}^n$.

Let $P$ be the unitary matrix with $\{x, v_2, \ldots, v_n\}$ as its columns. The above gives that $$ P^* HP = \left( \begin{array}{cccc} \lambda & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B &  \\0 &&& \end{array} \right)$$ and $B$ is Hermitian since the left side is. We see that $$\phi(H,t) = \phi(P^* HP, t) = (t-\lambda)\phi(B,t).$$ Thus, $\lambda$ is an eigenvalue of $B$ with algebraic and geometric multiplicity $a-1$ (by induction), and $B$ has pair-wise orthogonal eigenvectors $x'_2, \ldots, x'_a$ for $\lambda$. For $j = 2, \ldots, a$, let $x_j = P\left( \begin{array}{c} 0 \\ x'_j \end{array} \right)$. It is easy to see that $x, x_2, \ldots, x_a$ are pair-wise orthogonal eigenvectors of $H$ with eigenvalue $\lambda$. Thus $m_g(\lambda) \geq a = m_a(\lambda)$, and since the geometric multiplicity never exceeds the algebraic multiplicity, this proves the theorem. $\Box$
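
The inductive step is a “deflation”: pull out one eigenvector, conjugate by a unitary matrix having it as its first column, and recurse on the smaller Hermitian block. Here is a numerical sketch of a single such step, again with a randomly built Hermitian $H$ chosen only for illustration:

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = A + A.conj().T                  # a Hermitian matrix

evals, V = np.linalg.eigh(H)
lam, x = evals[0], V[:, 0]          # an eigenvalue and a unit eigenvector

# Extend {x} to an orthonormal basis: QR of a matrix whose first column is x.
M = np.column_stack([x, rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))])
P, _ = np.linalg.qr(M)

T = P.conj().T @ H @ P
print(np.isclose(T[0, 0], lam))                            # the top-left entry is lambda
print(np.allclose(T[0, 1:], 0), np.allclose(T[1:, 0], 0))  # first row and column vanish off (0, 0)
B = T[1:, 1:]
print(np.allclose(B, B.conj().T))                          # the remaining block is Hermitian again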

The Take-away

There are many (mostly equivalent) ways to show this; we could have used induction to prove $H$ is diagonalizable without considering geometric vs algebraic multiplicities, we could have proved the decomposition into Jordan blocks, or we could have proven the spectral decomposition theorem.

The crux of the proof is that, when $H$ is Hermitian, the vector space $W^{\perp}$ is $H$-invariant when $W$ is. In our proof, this allowed us to, colloquially speaking, keep pulling eigenvectors out of $\mathbb{C}^n$. In general, given an $H$-invariant subspace $W$ of $\mathbb{C}^n$, we can consider the action of $H$ (by multiplication on the left) on $W$ and on $W^{\perp}$ separately. If $\psi_1$ and $\psi_2$ are the characteristic polynomials of the restrictions of $H$ to $W$ and to $W^{\perp}$, respectively, then $\phi(H,t) = \psi_1 \psi_2$.


Intuitively, a Hermitian matrix $H$ is diagonalizable because we can break $\mathbb{C}^n$ into $H$-invariant, pairwise orthogonal, subspaces and diagonalize $H$ over each subspace.


To see that this is a property that is not true of all matrices, consider the following matrix: $$N = \left(\begin{array}{cccc} 0 & 1 & 1 & 1 \\ 0& 0& 1 & 1 \\ 0& 0 & 0 & 1 \\ 0&0&0&0 \end{array} \right).$$ For graph theorists, this is the adjacency matrix of a transitive tournament on $4$ vertices. It is also a nilpotent matrix; that is, $N^4 = 0$. The characteristic polynomial of $N$ is $t^4$ and $N$ has $0$ as an eigenvalue with algebraic multiplicity $4$. However, $N$ has only one linearly independent eigenvector, $e_1$ (the elementary basis vector $(1\,\, 0\,\, 0\,\, 0)^T$).

Here, $\langle e_1 \rangle$ is an eigenspace of $N$ and hence $N$-invariant. Observe that $e_2 \in \langle e_1 \rangle^{\perp}$ but $Ne_2 = e_1$ and so $ \langle e_1 \rangle^{\perp}$ is certainly not $N$-invariant.
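
Numerically, the failure is easy to see (a sketch):

import numpy as np

# The transitive tournament / nilpotent example from above.
N = np.triu(np.ones((4, 4)), k=1)
print(np.allclose(np.linalg.matrix_power(N, 4), 0))  # N^4 = 0

# The algebraic multiplicity of 0 is 4 (characteristic polynomial t^4) ...
print(np.poly(N))                    # coefficients, approximately [1, 0, 0, 0, 0]
# ... but the geometric multiplicity is dim ker(N) = 4 - rank(N) = 1.
print(4 - np.linalg.matrix_rank(N))  # 1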
