Spark MLlib's implementation supports model-based collaborative filtering. In this technique, users and products are described by a small set of latent factors (LFs) that can be used to predict missing ratings. In this section, we will work through two complete examples of how this can be used to recommend movies to new users.
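To make the latent-factor idea concrete: once the factors are learned, the predicted rating of user u for item i is the dot product of their factor vectors, which is the kind of model Spark MLlib's ALS produces. The sketch below is plain Python with made-up rank-2 factor values (hypothetical, not learned from MovieLens). It only shows how a learned model turns factors into predictions and recommendations, not how ALS fits them.

```python
from typing import Dict, List, Sequence, Set


def predict(user_vec: Sequence[float], item_vec: Sequence[float]) -> float:
    """Predicted rating = dot product of the user's and the item's latent factors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def recommend(user_vec: Sequence[float],
              item_vecs: Dict[int, Sequence[float]],
              seen: Set[int],
              n: int) -> List[int]:
    """Score every item the user has not rated yet and return the top-n item ids."""
    scores = {item: predict(user_vec, vec)
              for item, vec in item_vecs.items() if item not in seen}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical rank-2 factors (illustrative values, not learned from data).
user_factors = [1.2, 0.4]
item_factors = {
    1: [3.0, 0.5],
    2: [1.0, 4.0],
    3: [2.0, 2.5],
    4: [0.5, 1.0],
}
```

With these values, the user scores items 1, 2, 3 and 4 at 3.8, 2.8, 3.4 and 1.0, so recommend(user_factors, item_factors, seen={1}, n=2) skips the already-rated item 1 and returns [3, 2]. Each user and item is summarized by only a handful of numbers, which is what makes the model compact.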
First, we read the ratings from a file. For this project, we can use the MovieLens 100k rating dataset from http://www.grouplens.org/node/73. The training set ratings are in a file called ua.base, while the movie item data is in u.item. Finally, ua.test contains the test set used to evaluate our model. Since we will be using this dataset, we should acknowledge the GroupLens Research Project team at the University of Minnesota, who wrote the following text:
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive...
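The file layouts described above can be read with a few lines of plain Python. This is a minimal sketch, not the Spark code used later: it assumes ua.base and ua.test are tab-separated (user id, item id, rating, timestamp) and that u.item is pipe-separated with the movie id and title in its first two fields, as the MovieLens 100k README describes. The helper names (parse_ratings, parse_titles, load_file, rmse) are hypothetical, the latin-1 default encoding is an assumption, and RMSE is only one common choice of metric for comparing predictions against ua.test.

```python
import math
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Rating(NamedTuple):
    user: int
    item: int
    rating: float


def parse_ratings(lines: Iterable[str]) -> List[Rating]:
    """Parse ua.base / ua.test lines: user id, item id, rating, timestamp (tab-separated)."""
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        user, item, rating, _timestamp = line.split("\t")
        ratings.append(Rating(int(user), int(item), float(rating)))
    return ratings


def parse_titles(lines: Iterable[str]) -> Dict[int, str]:
    """Parse u.item lines (pipe-separated) into a movie id -> title map."""
    titles = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) >= 2:  # movie id | title | release date | ...
            titles[int(fields[0])] = fields[1]
    return titles


def load_file(path: str, parser, encoding: str = "latin-1"):
    """Open a MovieLens file and hand its lines to a parser.
    Assumption: latin-1, because u.item is not guaranteed to be valid UTF-8."""
    with open(path, encoding=encoding) as f:
        return parser(f)


def rmse(predictions: Dict[Tuple[int, int], float], test: List[Rating]) -> float:
    """Root-mean-square error over the test ratings that have a prediction."""
    errors = [
        (predictions[(r.user, r.item)] - r.rating) ** 2
        for r in test
        if (r.user, r.item) in predictions
    ]
    if not errors:
        raise ValueError("no test ratings have a prediction")
    return math.sqrt(sum(errors) / len(errors))
```

For example, load_file("ua.base", parse_ratings) returns the training ratings, load_file("u.item", parse_titles) maps movie ids to titles for displaying recommendations, and the same call on "ua.test" yields the held-out ratings to score with rmse once a model has produced predictions for those (user, item) pairs.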