Sometimes the available feedback is not explicit ratings but implicit signals, such as audio tracks played or movies watched. At first glance, this data may not look as good as explicit user ratings, but it is much more exhaustive.
We are going to use the Million Song Dataset challenge data from http://www.kaggle.com/c/msdchallenge/data. You need to download three files:
kaggle_visible_evaluation_triplets
kaggle_users.txt
kaggle_songs.txt
Now perform the following steps:
We still need to do some more preprocessing. ALS in MLlib takes both user and product IDs as integers. The kaggle_songs.txt file has song IDs with a sequence number next to each; the kaggle_users.txt file does not...
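The ID preprocessing described above can be sketched in plain Python, without Spark, to show the idea. This is a minimal illustration under stated assumptions, not the recipe's own code: it assumes each line of kaggle_songs.txt is a song ID followed by whitespace and an integer, that kaggle_users.txt has one user ID per line, and that the triplets file holds a user ID, a song ID, and a play count separated by whitespace. The sample lines and the helper names (`build_song_index`, `build_user_index`, `to_int_triplets`) are hypothetical.

```python
# Hypothetical sample lines; the real files are assumed to share these layouts.
songs_lines = ["SOAAA 1", "SOBBB 2"]            # song ID + existing sequence number
users_lines = ["u_hash_a", "u_hash_b"]          # user ID only, no number
triplet_lines = ["u_hash_a\tSOBBB\t3", "u_hash_b\tSOAAA\t1"]  # user, song, play count


def build_song_index(lines):
    """Map each song ID to the integer already listed next to it."""
    index = {}
    for line in lines:
        song_id, seq = line.split()
        index[song_id] = int(seq)
    return index


def build_user_index(lines):
    """Assign each user ID a sequence number, since the file carries none."""
    user_ids = [line.strip() for line in lines if line.strip()]
    return {user_id: i for i, user_id in enumerate(user_ids)}


def to_int_triplets(lines, user_index, song_index):
    """Replace string IDs with integers: (user_int, song_int, play_count)."""
    triplets = []
    for line in lines:
        user_id, song_id, count = line.split()
        triplets.append((user_index[user_id], song_index[song_id], int(count)))
    return triplets


song_index = build_song_index(songs_lines)
user_index = build_user_index(users_lines)
int_triplets = to_int_triplets(triplet_lines, user_index, song_index)
print(int_triplets)  # [(0, 2, 3), (1, 1, 1)]
```

Numbering users from 0 is an arbitrary choice here; any stable one-to-one mapping from user ID to integer would serve, as long as the same mapping is used when interpreting the recommendations later.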