Hu, Koren and Volinsky (AT&T, Yahoo!), 2008.
A well-written paper.
The authors give a good description of the distinctions between explicit and implicit feedback datasets, pointing out in particular that:
- implicit feedback data is inherently noisy, since a user might decide that they do not like an item after viewing it — interaction does not necessarily indicate interest.
- the numerical value in explicit feedback indicates preference, whereas in the implicit case it indicates confidence.
The authors describe their model as being based on SVD, but this is not quite accurate: they weight each squared-difference summand in the cost function by a confidence value, which increases linearly with the number of interactions for that user-item pair.
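For reference, the weighted cost function can be written out as follows. This is a reconstruction in the paper's notation as I read it: $r_{ui}$ is the raw interaction count, $p_{ui}$ the binarised preference, $c_{ui}$ the confidence, $x_u$ and $y_i$ the user and item factor vectors, and $\alpha$, $\lambda$ tunable constants.

```latex
\min_{x_*,\,y_*} \; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  \;+\; \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr),
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha\, r_{ui}.
```

Setting every $c_{ui} = 1$ and $\lambda = 0$ gives the plain least-squares low-rank fit, which is where the SVD connection comes from; the per-entry weights are what break it.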
The input matrix is the user-item matrix.
Optimisation is via alternating least squares.
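To make the alternating least squares step concrete, here is a minimal dense sketch of confidence-weighted ALS on a tiny count matrix. It is not the authors' implementation: the function names `implicit_als` and `weighted_loss`, the default hyperparameters, and the dense per-row solve are all assumptions chosen for readability (the paper relies on a faster sparse formulation).

```python
import numpy as np


def weighted_loss(R, X, Y, alpha=40.0, reg=0.1):
    """Confidence-weighted squared error plus L2 penalty (hypothetical helper)."""
    P = (R > 0).astype(float)          # binarised preference
    C = 1.0 + alpha * R                # confidence grows linearly with the count
    err = P - X @ Y.T
    return float((C * err ** 2).sum() + reg * ((X ** 2).sum() + (Y ** 2).sum()))


def implicit_als(R, factors=2, alpha=40.0, reg=0.1, iters=10, seed=0):
    """Dense sketch of weighted ALS; R is a (users x items) count matrix."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)
    C = 1.0 + alpha * R
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    ridge = reg * np.eye(factors)

    for _ in range(iters):
        # Fix item factors, solve a weighted ridge regression per user.
        for u in range(n_users):
            YtC = Y.T * C[u]                       # Y^T C^u, shape (factors, items)
            X[u] = np.linalg.solve(YtC @ Y + ridge, YtC @ P[u])
        # Fix user factors, solve a weighted ridge regression per item.
        for i in range(n_items):
            XtC = X.T * C[:, i]                    # X^T C^i, shape (factors, users)
            Y[i] = np.linalg.solve(XtC @ X + ridge, XtC @ P[:, i])
    return X, Y


if __name__ == "__main__":
    # Toy interaction counts: 3 users, 4 items (made-up data).
    R = np.array([[5, 0, 1, 0],
                  [0, 3, 0, 0],
                  [2, 0, 0, 4]], dtype=float)
    X, Y = implicit_als(R)
    print("loss:", weighted_loss(R, X, Y))
    print("scores:\n", np.round(X @ Y.T, 2))
```

Each half-step is an exact minimiser with the other factor matrix held fixed, so the loss cannot increase from one sweep to the next; that is a handy sanity check when experimenting.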
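Their evaluation metric is based on percentile rank: the position of each held-out item in the user's ranked recommendation list, averaged with weights given by the test-period interactions (lower is better, and random recommendations would score about 50%).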
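A small sketch of how such an expected percentile rank might be computed. The function name `expected_percentile_rank` and the dense-array inputs are assumptions for illustration; percentile 0.0 means the item sits at the top of the user's list and 1.0 at the bottom.

```python
import numpy as np


def expected_percentile_rank(scores, R_test):
    """Interaction-weighted mean percentile rank (hypothetical helper).

    scores : (users x items) predicted scores, higher = recommended earlier
    R_test : (users x items) held-out interaction counts
    """
    n_users, n_items = scores.shape
    # Items per user ordered by descending score.
    order = np.argsort(-scores, axis=1, kind="stable")
    # Invert the ordering: positions[u, i] = place of item i in user u's list.
    positions = np.empty_like(order)
    rows = np.arange(n_users)[:, None]
    positions[rows, order] = np.arange(n_items)[None, :]
    pct = positions / (n_items - 1)    # 0.0 = top of list, 1.0 = bottom
    return float((R_test * pct).sum() / R_test.sum())


if __name__ == "__main__":
    scores = np.array([[3.0, 2.0, 1.0]])
    # The only held-out interaction is with the top-ranked item.
    print(expected_percentile_rank(scores, np.array([[1.0, 0.0, 0.0]])))
```

Weighting by the held-out counts means that heavily consumed items dominate the average, which matches the spirit of treating counts as confidence.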
Their model, which we’ll call “weighted SVD” (they speak of “confidence levels”), compares favourably with the baseline popularity method and also with an old-school item-based neighbourhood method, in terms of the expected percentile rank (Figure 1). Interestingly, the differences are less marked when the probability that a desired item is in the top (say) 1% is considered (Figure 2).
The unweighted SVD on the user-item matrix is shown to perform terribly, with a significant but insufficient improvement obtained with regularisation.