# Gram matrix intuition

Recently I came across the Gram matrix and was trying to understand what it meant intuitively. Like most of my attempts to understand new mathematical concepts intuitively, the process quickly became recursive as the definitions relied on other terms I was unfamiliar with.

The Wikipedia and Wolfram MathWorld definitions were not super helpful, but an example on the Wikipedia page gave me a starting point:

> Given a set of $n$ vectors $\bm{\phi}_{1}, \ldots, \bm{\phi}_{n}$ in $\mathbb{R}^d$ with the Euclidean dot product, the Gram matrix is $\bm{G} = \bm{\Phi}^\top \bm{\Phi}$ where $\bm{\Phi}$ is a matrix whose columns are the vectors $\bm{\phi}_k$.
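This definition is easy to check numerically. Here is a small sketch with NumPy, using three made-up vectors in $\mathbb{R}^2$ as the columns of $\bm{\Phi}$:

```python
import numpy as np

# Three example vectors in R^2, stacked as the columns of Phi.
phi_1 = np.array([1.0, 0.0])
phi_2 = np.array([0.0, 2.0])
phi_3 = np.array([3.0, 4.0])
Phi = np.column_stack([phi_1, phi_2, phi_3])  # shape (d, n) = (2, 3)

# The Gram matrix G = Phi^T Phi has shape (n, n) = (3, 3).
G = Phi.T @ Phi

# Entry G[i, j] is the dot product of the i-th and j-th vectors.
assert G[0, 2] == phi_1 @ phi_3  # 1*3 + 0*4 = 3
```

Note that $\bm{G}$ is always symmetric, since the dot product of $\bm{\phi}_i$ with $\bm{\phi}_j$ equals the dot product of $\bm{\phi}_j$ with $\bm{\phi}_i$.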

So each entry in this Gram matrix is the dot product of two vectors. But what is an intuitive interpretation of the dot product of two vectors? When will it be large? Small? Negative? The definition of the dot product gives some insight:

$\bm{a} \cdot \bm{b} = \left\Vert \bm{a} \right\Vert \left\Vert \bm{b} \right\Vert \cos \theta$

In the case where $\bm{a}$ and $\bm{b}$ are unit vectors with magnitude 1, the dot product reduces to the cosine of the angle $\theta$ between them. The value of $\cos \theta$ ranges from 1 when the angle between $\bm{a}$ and $\bm{b}$ is 0 (meaning they point in exactly the same direction) down to $-1$ when the angle is $\pi$ radians (meaning they point in exactly opposite directions). So the dot product works as a similarity function for unit vectors.

But what if $\bm{a}$ and $\bm{b}$ are not unit vectors? Then the cosine of the angle is scaled by the magnitude of each vector, which lets the comparison account for vectors of different lengths. If two vectors point in the same direction but have different magnitudes, their dot product falls between the two self dot products: it is smaller than the larger vector's dot product with itself, and larger than the smaller vector's dot product with itself.
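The scaling behavior above can be checked directly. A small sketch, using two made-up vectors that point in the same direction but differ in magnitude:

```python
import numpy as np

# Two vectors in the same direction with different magnitudes.
a = np.array([3.0, 4.0])   # magnitude 5
b = np.array([0.6, 0.8])   # magnitude 1 (unit vector, same direction)

# The angle between them is 0, so cos(theta) = 1 and the dot product
# is just the product of the magnitudes: ||a|| * ||b||.
assert np.isclose(a @ b, np.linalg.norm(a) * np.linalg.norm(b))

# The mixed dot product sits between the two self dot products:
# b.b = 1  <  a.b = 5  <  a.a = 25.
assert b @ b < a @ b < a @ a
```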

So one way to think of the Gram matrix is as a matrix containing the dot product “similarity scores” for all pairs of vectors in the set. When the vectors are unit vectors, the entries are exactly the cosine similarities between each pair.
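To see the unit-vector case concretely, we can normalize each column of $\bm{\Phi}$ before forming the Gram matrix; the entries then become pairwise cosine similarities. A sketch, reusing the same made-up example vectors as before:

```python
import numpy as np

# Same example columns as before: (1, 0), (0, 2), (3, 4).
Phi = np.column_stack([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])

# Normalize each column to unit length; the Gram matrix of the
# normalized columns contains the pairwise cosine similarities.
Phi_unit = Phi / np.linalg.norm(Phi, axis=0)
G = Phi_unit.T @ Phi_unit

# The cosine of the angle between (1, 0) and (3, 4) is 3/5.
assert np.isclose(G[0, 2], 0.6)

# Each diagonal entry is a vector's similarity with itself: 1.
assert np.allclose(np.diag(G), 1.0)
```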