Introducing content-based filtering
Systems based on content-based filtering exploit the properties of items to recommend new products with similar features. The central paradigm behind these recommenders can be summed up as "show me items similar to the ones I liked in the past." Which attributes count as an item's properties is an open question, and it is up to the system developer to define a suitable set. Sometimes the choice is evident from the data samples; otherwise, we have to improvise and experiment to elicit the right features. A poorly chosen set can degrade the quality of the recommendations; this is where an experienced data scientist can make a difference.
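To make the "similar to what I liked" idea concrete, here is a minimal sketch. It assumes each item is described by a short list of feature terms (the catalogue and item names below are hypothetical), compares items with cosine similarity over term counts, and ranks each unseen item by its highest similarity to any item the user liked. Cosine similarity is one common choice for this comparison, not necessarily the measure used later in this chapter.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty feature set is similar to nothing
    return dot / (norm_a * norm_b)


def recommend(items: dict, liked: list, k: int = 2) -> list:
    """Rank items the user has not liked by their best similarity to a liked item."""
    vectors = {name: Counter(features) for name, features in items.items()}
    scores = {}
    for name, vec in vectors.items():
        if name in liked:
            continue  # do not recommend items the user already liked
        scores[name] = max(cosine(vec, vectors[fav]) for fav in liked)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Hypothetical catalogue: each item is described by a few feature terms.
catalogue = {
    "album_a": ["rock", "guitar", "live"],
    "album_b": ["rock", "guitar", "studio"],
    "album_c": ["jazz", "piano", "live"],
    "album_d": ["classical", "piano", "orchestra"],
}
print(recommend(catalogue, liked=["album_a"]))  # ['album_b', 'album_c']
```

Here album_b shares two of album_a's three features (similarity 2/3), album_c shares one (1/3), and album_d shares none, so the ranking follows directly from the overlap of item properties.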
This book's focus on text data drives our choice of properties for the recommender system. Thus, for each music item we create a bag of words containing its review text and genres. We call it metadata:
# Group all tags per product id.
product_tags = pd.DataFrame(reviews.groupby('productId')['...
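As a minimal sketch of how such a metadata bag of words could be assembled, assume a `reviews` DataFrame with one row per review and the hypothetical columns `productId`, `reviewText`, and `genres` (the real dataset's column names may differ). Each product's review texts are joined, its distinct genres are appended, and the result is split into term counts:

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
reviews = pd.DataFrame({
    "productId": ["p1", "p1", "p2"],
    "reviewText": ["Great guitar riffs", "Loud and great", "Calm piano pieces"],
    "genres": ["Rock", "Rock", "Classical"],
})

# Join all review texts of a product into one string.
texts = reviews.groupby("productId")["reviewText"].apply(" ".join)
# Keep each genre only once per product.
genres = reviews.groupby("productId")["genres"].apply(lambda g: " ".join(sorted(set(g))))

# Metadata = review text plus genres, lower-cased so "Great" and "great" match.
metadata = (texts + " " + genres).str.lower()

# Bag of words: term counts per product id.
bags = {pid: Counter(text.split()) for pid, text in metadata.items()}
print(bags["p1"])
```

Whitespace splitting is deliberately naive here; a real pipeline would likely strip punctuation and stop words before counting.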