The many uses of singular values
Important notes
- This is the text on the left side of the margin.
Linear transformations
The textbook definition of a linear transformation is as follows; if $T:V \mapsto W$ is a function from a vector space $V$ to a vector space $W$, then $T$ is called a linear transformation if satisfies two properties for all vectors in $V$ and all scalars.
- $T(ku) = kT(u)$
- $T(u + v) = T(u) + T(v)$
If $V=W$ then we call $T$ a linear operator on $V$.
The determinant
Now imagine we are looking at a unit cube, sitting at the origin of our coordinate system. As with most unit cubes, it has a volume of exactly one. Now, let’s apply a linear transformation, a matrix in this instance, to every point in space. What happens to our box? It might be stretched, squashed, or sheared into a parallelepiped. The volume it contains, the amount of ‘stuff’ that fits inside it, will change, but by how much?
The determinant is the answer to that question. It is the factor by which the volume of our little box is scaled. If the determinant is 2, the box has doubled in volume. If it is 0, the box has been squashed flat. All the volume is gone, as is a dimension.
Eigenvalues
This connects with eigenvalues. If we align ourselves with the “natural” axes of the transformation (the eigenvectors), the matrix acts purely by stretching or shrinking along those lines. The eigenvalues are just the stretching factors. So, if we stretch by $\lambda_1$ in one direction and $\lambda_2$ in another, the total volume change is just their product $\lambda_1 \lambda_2 \dots \lambda_n$. That is why the determinant is the product of the eigenvalues as it measures the total expansion of space.
Now, how do we know what the “natural” axes of the transformation are? We look for the directions that resist the urge to rotate. In the general shuffle of a linear transformation, almost every vector gets knocked off its original path and pointed somewhere new. But there are a few stubborn vectors that stay on their own line. They might get longer or shorter, but they don’t turn. These are our eigenvectors, they are the vectors whose direction remains unchanged. In other words, they are the vectors that are unchanged by the linear transformation up to a scalar multiple. We can write down an equation formalising this intuition; $Av = \lambda v$ for some scalar $\lambda$ which we will call the eigenvalue corresponding to the eigenvector $v$.
Eigenvectors, we have established, help us understand the geometry of a linear transformation by giving us an invariant direction corresponding to each dimension of the transformation. Eigenvalues tell us the amount by which the transformation stretches or shrinks along those directions. What if our transformation rotates everything? Then nothing stays on its line. Or worse, what if our matrix isn’t even square? What if it takes us from a 3D world to a 2D sheet? We can’t have an ‘eigenvector’ that goes in as a 3D arrow and comes out as a 2D arrow pointing in the ‘same’ direction as the dimensions do not match.
Singular values
Enter the singular values. If I have a vector $x$ of length , a unit vector pointing in some direction, how long is the new vector $Ax$? We want to find the direction $x$ that maximizes this length, the direction of maximum stretch.
Mathematically, we want to maximize the squared length $|Ax|^2$. Why squared? Because it’s easier to work with products than square roots. We can write this length as the dot product of the vector with itself: \(\|Ax\|^2 = (Ax) \cdot (Ax) = (Ax)^T (Ax)\) Using the rules of matrix transposes, this becomes: \(x^T A^T A x\) Lets define a shorthand for that big matrix in the middle, $S = A^T A$. This matrix $S$ is special. It’s symmetric and it’s square. And essentially, it measures how much $A$ stretches things, regardless of where $A$ actually sends them. We could think of it as the metric tensor of the transformation as it encodes all the geometry of the transformation. For any two vectors $u$ and $v$, the dot product after transformation is $(Au) \cdot (Av) = u^T A^T A v = u^T S v$. The matrix $S$ captures this.
The problem of maximizing $x^T S x$ for a unit vector $x$ is a famous one. We can solve it using Lagrange multipliers; maximise $x^T S x$ subject to $x^T x = 1$. Taking derivatives gives us $\nabla_x (x^T S x) = 2 S x$ and $\nabla_x (x^T x) = 2 x$. Setting these equal and solving for $x$ gives us $S x = \lambda x$ for some scalar $\lambda$. This shows that $x$ must be an eigenvector of this symmetric matrix $S$ and the maximum value is the largest eigenvalue of $S$, let’s call it $\sigma_1^2$.
So, the maximum stretch is $\sigma_1 = \sqrt{\lambda_{\text{max}}(A^T A)}$. This $\sigma_1$ is our first singular value. Importantly, as we have just shown, we do not need our matrix to be square in order to calculate it. It tells us the length of the longest axis of the ellipsoid that our unit sphere turns into. We can then find the next direction, perpendicular to the first, that maximizes the stretch among what’s left. These lengths—$\sigma_1, \sigma_2, \dots$—are the singular values. They describe the geometry of the transformation completely, without worrying about whether the input and output spaces are the same.
We can keep going. What about the third longest axis? We can find it by maximising $x^T S x$ subject to $x^T x = 1$ and $x \perp {v_2, v_2}$ (the little bookshelf symbol means perpendicular or orthogonal). Becuase $A^{T}A$ is symmetric each eigenvector is orthogonal to all the others when their eigenvalues are distinct.
So we now know we can find these special input directions, $v_i$ and the scalar factor which gives us their length (the squared singular values, $\sigma_i^2$). But what happens to them after the transformation? Where do they end up? The output vector $Av_i$ points in the direction of the i-th axis of the output ellipsoid. Let’s call this $u_i = \frac{Av_i}{\sigma_i}$. So these vectors define the principal axes of the output ellipsoid and, furthermore, they satisfy their own eigenvalue equation:
\[AA^{T}(Av_{i}) = A(A^{T}Av_{i}) = A(\sigma^{2}_{i}v_{i}) = \sigma^{2}_{i}(Av_{i})\]So $AA^{T}u_{i} = \sigma^{2}_{i} u_{i}$. The matrix $AA^{T}$ defines our output geometry.
The transformation takes an orthogonal set of input directions and stretches each by a factor of $\sigma_i$ while rotating to point along output directions $u_i$. The singular values are the answer to the question of how much a transformation stretches space along its natural directions, those that remain perpendicular in the output space, and we can guarantee these directions exist for any(!) matrix because $A^{T}A$ is always symmetric.
Uses of singular values
Condition number
Inverting probability distributions
###