Apache Spark 2.x Cookbook
Sometimes the available feedback is not a set of ratings but a record of behavior: audio tracks played, movies watched, and so on. At first glance, this implicit data may not look as good as explicit ratings by users, but it is far more plentiful, because it covers everything a user did rather than only the items they chose to rate.
We are going to use the Million Song Dataset Challenge data from http://www.kaggle.com/c/msdchallenge/data. You need to download three files:
kaggle_visible_evaluation_triplets
kaggle_users.txt
kaggle_songs.txt

We still need to do some preprocessing. ALS in MLlib takes both user and product IDs as integers. The kaggle_songs.txt file has each song ID with a sequence number next to it. The kaggle_users.txt file does not have a sequence number. Our goal is to replace the userid and songid in the triplets data with the corresponding integer sequence numbers. To do this, follow these steps:
$ spark-shell
import org.apache.spark...
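The ID replacement described above can be illustrated with a minimal sketch on plain Scala collections. This is not the recipe's own code. The sample lines are made up, and the field separators (a space in the songs file, tabs in the triplets file) are assumptions about the file layout. The names songLines, userLines, tripletLines, songIndex, userIndex, and intTriplets are hypothetical.

```scala
// Plain Scala collections; paste into spark-shell or any Scala REPL.
// The sample lines are made up for illustration, not real dataset rows.

// Songs file style: "<songId> <sequenceNumber>" (space-separated, assumed).
val songLines = Seq("SONG_A 1", "SONG_B 2")

// Users file style: one user ID per line, with no sequence number.
val userLines = Seq("user_x", "user_y")

// Triplets style: "<userId>\t<songId>\t<playCount>" (tab-separated, assumed).
val tripletLines = Seq("user_x\tSONG_B\t3", "user_y\tSONG_A\t1", "user_y\tSONG_B\t7")

// Song ID -> the integer already present in the songs file.
val songIndex: Map[String, Int] = songLines.map { line =>
  val fields = line.split(" ")
  fields(0) -> fields(1).toInt
}.toMap

// User ID -> a generated integer (the line's position), since none is supplied.
val userIndex: Map[String, Int] = userLines.zipWithIndex.toMap

// Replace both string IDs in every triplet with their integer counterparts.
val intTriplets: Seq[(Int, Int, Double)] = tripletLines.map { line =>
  val fields = line.split("\t")
  (userIndex(fields(0)), songIndex(fields(1)), fields(2).toDouble)
}

intTriplets.foreach(println)
```

On the full dataset, the same two lookups would more likely be done with RDD joins or broadcast maps rather than local collections. Each resulting triple has the shape that MLlib's Rating(user, product, rating) class expects, with the play count standing in for the rating.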