We show below that vectors drawn uniformly at random from the unit sphere are more likely to be orthogonal in higher dimensions.
In information retrieval and other areas besides, it is common to use the dot product of normalised vectors as a measure of their similarity. It can be problematic that the similarity measure depends upon the rank of the representation, as it does here — it means, for example, that similarity thresholds (for relevance in a particular situation) need to be re-calibrated if the underlying vectorisation model is retrained e.g. in a higher dimension.
Does anyone have a solution to this? Is there a (of course, dimension dependent) transformation that can be applied to the dot product values to arrive at some dimension independent measure of similarity?
I asked this question on “Cross Validated”. Unfortunately, it has attracted so little interest that it has earned me the “tumbleweed” badge!
For those interested in the distribution of the dot products, and not just the expectation and the variance derived below, see this answer by whuber on Cross Validated.