## Parameterising the Mahalanobis distances for metric learning

Below are the notes I made to prepare for a short talk given at our seminar on learning distance metrics, and the Mahalanobis distances in particular. We show that the Mahalanobis distances can be parameterised by the positive semidefinite (PSD) matrices or alternatively (in a highly redundant way) by all matrices. The set of PSD matrices is convex, but in order to perform gradient descent to optimise the objective function, we need to perform a costly projection after each update involving the singular value decomposition.

We note along the way that a Mahalanobis distance is nothing more than the Euclidean distance after applying a linear transform to the data.

The example of the 2×2 PSD matrices is worked out in detail here.

What’s here documents my first steps. What I really discovered is that metric learning is a research domain in its own right, and that a great deal of work has been done. There is an excellent survey by Bellet et al. (2013) that covers everything I have said in the first two of its sixty pages.

## Visualising the set of 2×2 positive semidefinite matrices

Recall that a symmetric matrix is called positive semidefinite (“PSD”) if, for any , we have . Positive semidefinite matrices occur, for instance, in the study of bilinear forms and as the Gram (or covariance) matrices in probability theory. In the case where , the space of symmetric matrices is 3-dimensional, and we can actually draw the subset of all positive semidefinite matrices – it looks like the bow of a ship.

It is clear that in the case illustrated below, the PSD matrices form a convex subset. It is easy to show this in general, by observing that the set of all PSD matrices is closed under addition and multiplication by non-negative scalars. The convexity of this set is crucial for the fitting of Mahalanobis distances in metric learning, which is how I got interested PSD matrices in the first place.