The many uses of singular vectors
Linear transformations
This post introduces some simple intuition about linear transformations, elaborates on their general properties, and then discusses the geometric properties of linear transformations along with some of the applications that a geometric understanding of them can help us solve.
The determinant
Imagine you have a little box, a unit cube, sitting at the origin of your coordinate system. It has a volume of exactly one. Now, apply your linear transformation, your matrix, to every point in space. What happens to our box? It gets distorted! It might be stretched, squashed, or sheared into a parallelepiped. But here is the key question: how much “stuff” fits inside this new shape?
The determinant is simply the answer to that question. It is the factor by which the volume of our little box is scaled. If the determinant is 2, your box has doubled in volume. If it is 0, the box has been squashed flat—all the volume is gone, as is a dimension.
This connects beautifully with eigenvalues. If you align yourself with the “natural” axes of the transformation (the eigenvectors), the matrix acts purely by stretching or shrinking along those lines. The eigenvalues are just the stretching factors. So, if you stretch by $\lambda_1$ in one direction and $\lambda_2$ in another, the total volume change is just their product $\lambda_1 \lambda_2 \dots \lambda_n$. That is why the determinant is the product of the eigenvalues—it’s just measuring the total expansion of space.
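To make this concrete, here is a minimal numerical sketch (using NumPy; the matrix $A$ is an arbitrary example, nothing special) checking that the volume of the transformed unit cube, the determinant, and the product of the eigenvalues all agree.

```python
import numpy as np

# An example 3x3 matrix, chosen upper triangular so its eigenvalues are easy to read off.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 1.0]])

# The transformed unit cube is the parallelepiped spanned by the columns of A
# (the images of the standard basis vectors e1, e2, e3).
a1, a2, a3 = A[:, 0], A[:, 1], A[:, 2]
volume = abs(np.dot(a1, np.cross(a2, a3)))  # scalar triple product

print(volume)                          # 6.0 -- volume of the transformed unit cube
print(np.linalg.det(A))                # 6.0 -- the determinant, i.e. the same scaling factor
print(np.prod(np.linalg.eigvals(A)))   # 6.0 -- product of the eigenvalues (2, 3 and 1)
```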
Now, the question remains: how do we know what the “natural” axes of the transformation are? We look for the directions that resist the urge to rotate. In the general shuffle of a linear transformation, almost every vector gets knocked off its original path and pointed somewhere new. But there are a few special, stubborn vectors that stay on their own line. They might get longer, they might get shorter, but they don’t turn. These are our eigenvectors, and they are the skeleton key that unlocks the geometry of the entire transformation. We have described eigenvectors as vectors whose direction does not change; in other words, they are the vectors left unchanged by the linear transformation up to a scalar multiple. We can write down an equation formalising this intuition: $Av = \lambda v$ for some scalar $\lambda$, which we will call the eigenvalue corresponding to the eigenvector $v$.
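As a quick sanity check of this equation (again with NumPy and a made-up matrix), we can ask `np.linalg.eig` for the eigenpairs and confirm that each eigenvector is only rescaled, never turned.

```python
import numpy as np

# An example matrix chosen so that its eigenvalues are real (3 and 2).
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)  # eigenvectors are the columns

for lam, v in zip(eigenvalues, eigenvectors.T):
    # A v should equal lam * v: same direction, only the length changes.
    print(np.allclose(A @ v, lam * v))  # True, True
```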
Eigenvectors, we have established, help us understand the geometry of a linear transformation by giving us an invariant direction corresponding to each dimension of the transformation. Eigenvalues tell us the amount by which the transformation stretches or shrinks along those directions. What if your transformation rotates everything? Then nothing stays on its line. Or worse, what if your matrix isn’t even square? What if it takes you from a 3D world to a 2D sheet? You can’t have an ‘eigenvector’ that goes in as a 3D arrow and comes out as a 2D arrow pointing in the ‘same’ direction. The dimensions do not match!
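To see the rotation problem concretely, here is a small sketch (the matrices are chosen purely for illustration): `np.linalg.eig` still returns eigenvalues for a 90° rotation, but they are complex, which is the algebra’s way of saying that no real direction stays on its own line; and for a non-square matrix the question cannot even be posed.

```python
import numpy as np

# Rotation by 90 degrees: every real vector gets turned.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

vals, _ = np.linalg.eig(R)
print(vals)  # [0.+1.j  0.-1.j] -- purely imaginary: no real invariant direction

# A 2x3 matrix maps 3D vectors to 2D vectors; input and output dimensions differ,
# so np.linalg.eig(B) would raise a LinAlgError (it requires a square matrix).
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])
```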
Enter the singular values. If I have a vector $x$ of length 1—a unit vector pointing in some direction—how long is the new vector $Ax$? We want to find the direction $x$ that maximizes this length. We want to find the direction of maximum stretch.
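Before solving this properly, we can get a feel for it by brute force: sweep unit vectors around the circle and record how long each image is. This is only a sketch with an example 2×2 matrix (the same one is reused in the later snippets).

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# Sample unit vectors x(theta) = (cos theta, sin theta) and measure |A x|.
thetas = np.linspace(0.0, 2 * np.pi, 10_000)
xs = np.stack([np.cos(thetas), np.sin(thetas)])   # shape (2, 10000), one unit vector per column
lengths = np.linalg.norm(A @ xs, axis=0)

print(lengths.max())  # ~3.2566 -- the largest stretch the transformation achieves
print(lengths.min())  # ~1.8424 -- the smallest
```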
Mathematically, we want to maximize the squared length $\|Ax\|^2$. Why squared? Because it’s easier to work with products than square roots. We can write this length as the dot product of the vector with itself: \(\|Ax\|^2 = (Ax) \cdot (Ax) = (Ax)^T (Ax)\) Using the rules of matrix transposes, this becomes: \(x^T A^T A x\) Look at that new beast in the middle, $S = A^T A$. This matrix $S$ is special. It’s symmetric. It’s square. And essentially, it measures how much $A$ stretches things, regardless of where $A$ actually sends them. We could think of it as the metric tensor of the transformation: it encodes all of the transformation’s geometry. For any two vectors $u$ and $v$, the dot product after transformation is $(Au) \cdot (Av) = u^T A^T A v = u^T S v$. The matrix $S$ captures this.
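Sticking with the same example matrix, here is a quick check that $S = A^T A$ is symmetric and really does record post-transformation dot products.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
S = A.T @ A

print(np.allclose(S, S.T))  # True: S is symmetric

# Two arbitrary test vectors.
u = np.array([1.0, 2.0])
v = np.array([-1.0, 0.5])

print(np.dot(A @ u, A @ v))  # dot product after the transformation: -8.5
print(u @ S @ v)             # the same number, computed from S alone: -8.5
```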
Now, we are back on familiar ground. The problem of maximizing $x^T S x$ over unit vectors $x$ is a famous one. We can solve it using Lagrange multipliers: maximise $x^T S x$ subject to $x^T x = 1$. Taking derivatives gives us $\nabla_x (x^T S x) = 2 S x$ and $\nabla_x (x^T x) = 2 x$. Setting the first equal to $\lambda$ times the second gives $2 S x = 2 \lambda x$, that is, $S x = \lambda x$ for some scalar $\lambda$. This shows that $x$ must be an eigenvector of the symmetric matrix $S$. And the maximum value? That’s the largest eigenvalue of $S$, let’s call it $\sigma_1^2$.
So, the maximum stretch is $\sigma_1 = \sqrt{\lambda_{\text{max}}(A^T A)}$. This $\sigma_1$ is our first singular value. It tells us the length of the longest axis of the ellipsoid that our unit sphere turns into. We can then find the next direction, perpendicular to the first, that maximizes the stretch among what’s left. These lengths—$\sigma_1, \sigma_2, \dots$—are the singular values. They describe the geometry of the transformation completely, without worrying about whether the input and output spaces are the same.
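Closing the loop on the earlier brute-force sweep: the square root of the largest eigenvalue of $A^T A$ should reproduce the maximum stretch we measured, and it should agree with the singular values that `np.linalg.svd` reports (same example matrix as before).

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
S = A.T @ A

eigenvalues = np.linalg.eigvalsh(S)        # eigvalsh is the eigenvalue routine for symmetric matrices
sigma_1 = np.sqrt(eigenvalues.max())

print(sigma_1)                             # ~3.2566 -- the maximum stretch found by the sweep
print(np.linalg.svd(A, compute_uv=False))  # [~3.2566, ~1.8424] -- the singular values
```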
The beautiful part is that we can keep going. What about the third longest axis? We can find it by maximising $x^T S x$ subject to $x^T x = 1$ and $x \perp \{v_1, v_2\}$. Because $A^{T}A$ is symmetric, each of its eigenvectors is orthogonal to all the others when their eigenvalues are distinct (and an orthogonal set can always be chosen even when they are not).
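That orthogonality is easy to verify numerically: the eigenvectors of $S$ returned by `np.linalg.eigh`, stacked as columns of $V$, satisfy $V^T V = I$ (same example matrix).

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
S = A.T @ A

# eigh returns the eigenvectors of a symmetric matrix as orthonormal columns of V.
_, V = np.linalg.eigh(S)

print(np.allclose(V.T @ V, np.eye(2)))  # True: the input directions are perpendicular
```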
So we now know how to find these special input directions $v_i$, and the eigenvalues of $S$ give us their squared stretch factors $\sigma_i^2$: the image $Av_i$ has length $\sigma_i$. But what happens to these directions after the transformation? Where do they end up? The output vector $Av_i$ points along the $i$-th axis of the output ellipsoid. Let’s call this unit direction $u_i = \frac{Av_i}{\sigma_i}$. These vectors define the principal axes of the output ellipsoid and, furthermore, they satisfy their own eigenvalue equation:
\[AA^{T}(Av_{i}) = A(A^{T}Av_{i}) = A(\sigma^{2}_{i}v_{i}) = \sigma^{2}_{i}(Av_{i})\]
So $AA^{T}u_{i} = \sigma^{2}_{i} u_{i}$. The matrix $AA^{T}$ defines our output geometry.
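A quick numerical check of this, with the same example matrix: push each $v_i$ through $A$, normalise by $\sigma_i$, and confirm the result satisfies the eigenvalue equation for $AA^T$.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
S = A.T @ A

eigenvalues, V = np.linalg.eigh(S)   # columns of V are the input directions v_i
sigmas = np.sqrt(eigenvalues)

for sigma, v in zip(sigmas, V.T):
    u = (A @ v) / sigma              # the corresponding output direction u_i
    # u_i should be an eigenvector of A A^T with the same sigma_i^2.
    print(np.allclose(A @ A.T @ u, sigma**2 * u))  # True, True
```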
The transformation takes an orthogonal set of input directions and stretches each by a factor of $\sigma_i$ while rotating it to point along the output direction $u_i$. The singular values answer the question of how much a transformation stretches space along its natural directions, the directions that remain perpendicular in the output space, and we can guarantee these directions exist for any matrix (!) because $A^{T}A$ is always symmetric.
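Assembled into matrices, these pieces are the singular value decomposition $A = U \Sigma V^T$, which `np.linalg.svd` computes directly; a final sketch (same example matrix) confirms the pieces reassemble the original transformation.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

U, sigmas, Vt = np.linalg.svd(A)

# Rotate onto the input directions (V^T), stretch along the axes (diag of sigmas),
# then rotate onto the output directions (U).
print(np.allclose(U @ np.diag(sigmas) @ Vt, A))  # True

# Both sets of directions stay perpendicular: U and V are orthogonal matrices.
print(np.allclose(U.T @ U, np.eye(2)), np.allclose(Vt @ Vt.T, np.eye(2)))  # True True
```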