Immanuel Bayer and Steffen Rendle
This paper provides an interesting example of using factorisation machines for implicit feedback via the Bayesian personalised ranking (BPR) optimisation criterion.
The challenge was to recommend first names (e.g. to soon-to-be parents). Participants were provided with the browsing history of users on the name selection website Nameling. So the items to be recommended are names.
Users are considered together with their browsing history, consisting of the name they looked at last and the list of all the names they looked at up to the time in question. The combination of a user (in this complete sense) at time $t$ and a candidate name $n$ is vectorised as the concatenation

$$ x(u, t, n) = \Big( \underbrace{\delta_u}_{\text{user}},\; \underbrace{\delta_n}_{\text{candidate name}},\; \underbrace{\delta_{\ell(u,t)}}_{\text{last name viewed}},\; \underbrace{\tfrac{1}{|H(u,t)|} \textstyle\sum_{m \in H(u,t)} \delta_m}_{\text{browsing history}} \Big), $$

where $\delta_i$ denotes a one-hot indicator vector, $\ell(u,t)$ is the last name that $u$ viewed before time $t$, and $H(u,t)$ is the set of all names $u$ viewed up to that time.
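As a concrete sketch of this vectorisation (the index mappings and block ordering here are hypothetical; the paper's exact layout may differ):

```python
import numpy as np

def vectorise(user_idx, candidate_idx, last_name_idx, history_idxs,
              num_users, num_names):
    """Concatenate one-hot encodings of the user, the candidate name and
    the last name viewed, plus a normalised multi-hot encoding of the
    browsing history. Block ordering is an assumption for illustration."""
    x = np.zeros(num_users + 3 * num_names)
    x[user_idx] = 1.0                                # user block
    x[num_users + candidate_idx] = 1.0               # candidate name block
    x[num_users + num_names + last_name_idx] = 1.0   # last-viewed block
    hist_base = num_users + 2 * num_names
    for m in history_idxs:                           # history block, normalised
        x[hist_base + m] += 1.0 / len(history_idxs)
    return x
```

The resulting vector is extremely sparse: only the user, two name indicators and the history entries are non-zero, which is exactly the regime factorisation machines are designed for.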
The (order 2) factorisation machine assigns a score to the combination as follows:

$$ \hat{y}(x) = w_0 + \sum_{i=1}^p w_i x_i + \sum_{i=1}^p \sum_{j=i+1}^p \langle v_i, v_j \rangle \, x_i x_j, $$

where $p$ is the length of the vectorisation, $k$ is the rank of the factorisation, and $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^p$ and $V \in \mathbb{R}^{p \times k}$ (with rows $v_i \in \mathbb{R}^k$) are model parameters to be learned. These parameters are learned via stochastic gradient ascent of the following pair-wise learning objective:
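A minimal numpy sketch of the score computation. It uses Rendle's standard reformulation of the pairwise interaction sum, which brings the cost down from $O(kp^2)$ to $O(kp)$:

```python
import numpy as np

def fm_score(x, w0, w, V):
    """Order-2 factorisation machine score.

    The pairwise interaction term sum_{i<j} <v_i, v_j> x_i x_j is computed
    as 0.5 * (||V^T x||^2 - sum_i ||v_i||^2 x_i^2), which is linear in the
    number of features."""
    Vx = V.T @ x                                         # shape (k,)
    interactions = 0.5 * (Vx @ Vx - ((V ** 2).T @ (x ** 2)).sum())
    return w0 + w @ x + interactions
```

For sparse inputs like the vectorisation above, only the non-zero entries of `x` contribute, so the score can be evaluated very cheaply in practice.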
$$ \sum_{(u, t, n) \in \mathcal{D}} \; \sum_{n' \in N} \ln \sigma \big( \hat{y}(x(u, t, n)) - \hat{y}(x(u, t, n')) \big), $$

where $\mathcal{D}$ is the set of (user $u$, time $t$, name $n$) tuples such that $u$ has browsed $n$ before time $t$, $N$ is the set of all possible names and $\sigma$ is the sigmoid function. Only a single negative name $n'$ is sampled from $N$ for each update. These negative samples are drawn according to their estimated rank (this part is quite difficult to do efficiently).
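A single ascent step on the pairwise objective can be sketched as follows. This is a simplification: the positive and negative feature vectors are assumed to be already built, and the negative is assumed to be drawn uniformly rather than by estimated rank as in the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fm_score(x, w0, w, V):
    Vx = V.T @ x
    return w0 + w @ x + 0.5 * (Vx @ Vx - ((V ** 2).T @ (x ** 2)).sum())

def fm_grads(x, V):
    """Gradients of the FM score w.r.t. w and V. The gradient w.r.t. w0
    is 1 and cancels in the pairwise score difference."""
    Vx = V.T @ x
    grad_V = np.outer(x, Vx) - V * (x ** 2)[:, None]   # d y / d V_{if}
    return x, grad_V

def bpr_step(x_pos, x_neg, w0, w, V, lr=0.05):
    """One stochastic gradient ascent step on ln sigma(y_pos - y_neg)."""
    diff = fm_score(x_pos, w0, w, V) - fm_score(x_neg, w0, w, V)
    coeff = sigmoid(-diff)              # derivative of ln sigma at diff
    gw_p, gV_p = fm_grads(x_pos, V)
    gw_n, gV_n = fm_grads(x_neg, V)
    w += lr * coeff * (gw_p - gw_n)
    V += lr * coeff * (gV_p - gV_n)
    return w, V
```

Regularisation terms, which the actual training procedure would include, are omitted here for brevity.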
In the above, we have purposefully omitted the “prefix smoothing” step, since we are mainly interested in a simple factorisation machine example. The details are in the paper.
This recommender did very well in the challenge. However, I don’t find the examples given in the paper very impressive (though I have not seen the examples given by others). My suspicion is that the FM approach is very strong, but that there is no good way to make first name recommendations using the provided dataset. A more informative dataset would be, e.g., first name × product interactions on a large e-commerce site. This would do a much better job of capturing the social meaning of names, but could go out of date very quickly.