Content-based filtering is outside the scope of the Mahout framework, mainly because it is up to you to decide how to define similar items. To use content-based item similarity, we need to implement our own ItemSimilarity. For instance, for our book dataset, we might define the following rules for book similarity:
- If the genres are the same, add 0.15 to similarity
- If the author is the same, add 0.50 to similarity
We can now implement our own similarity measure, as follows:
class MyItemSimilarity implements ItemSimilarity {
  ...
  public double itemSimilarity(long itemID1, long itemID2) {
    // Look up the book metadata for both item IDs
    MyBook book1 = lookupMyBook(itemID1);
    MyBook book2 = lookupMyBook(itemID2);
    double similarity = 0.0;
    // Same genre: add 0.15
    if (book1.getGenre().equals(book2.getGenre())) {
      similarity += 0.15;
    }
    // Same author: add 0.50
    if (book1.getAuthor().equals(book2.getAuthor(...
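To make the scoring rule concrete, here is a minimal, self-contained sketch that applies the two rules above without depending on Mahout. The names BookSimilarityRule, Book, addBook, and the in-memory catalogue are assumptions made for illustration and stand in for MyBook and lookupMyBook; returning 0.0 for an unknown item ID is also an assumption, not something the text specifies. A real ItemSimilarity implementation would have to provide the interface's remaining methods as well, which are omitted here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class BookSimilarityRule {

    /** Hypothetical stand-in for the MyBook class used in the text. */
    static final class Book {
        final String genre;
        final String author;

        Book(String genre, String author) {
            this.genre = genre;
            this.author = author;
        }
    }

    // Weights taken from the rules stated in the text.
    static final double GENRE_WEIGHT = 0.15;
    static final double AUTHOR_WEIGHT = 0.50;

    // Hypothetical in-memory catalogue keyed by item ID (plays the role of lookupMyBook).
    private final Map<Long, Book> catalogue = new HashMap<>();

    void addBook(long itemID, String genre, String author) {
        catalogue.put(itemID, new Book(genre, author));
    }

    /** Sums the weights of every rule that matches for the two items. */
    double itemSimilarity(long itemID1, long itemID2) {
        Book book1 = catalogue.get(itemID1);
        Book book2 = catalogue.get(itemID2);
        if (book1 == null || book2 == null) {
            return 0.0; // assumption: unknown items are treated as not similar
        }
        double similarity = 0.0;
        if (Objects.equals(book1.genre, book2.genre)) {
            similarity += GENRE_WEIGHT;
        }
        if (Objects.equals(book1.author, book2.author)) {
            similarity += AUTHOR_WEIGHT;
        }
        return similarity;
    }

    public static void main(String[] args) {
        BookSimilarityRule rule = new BookSimilarityRule();
        // Illustrative data only, not taken from the book dataset.
        rule.addBook(1L, "Fantasy", "Author A");
        rule.addBook(2L, "Fantasy", "Author A");
        rule.addBook(3L, "Fantasy", "Author B");
        rule.addBook(4L, "History", "Author C");

        System.out.println(rule.itemSimilarity(1L, 2L)); // genre and author match
        System.out.println(rule.itemSimilarity(1L, 3L)); // only genre matches
        System.out.println(rule.itemSimilarity(1L, 4L)); // nothing matches
    }
}
```

With these weights, two books by the same author in the same genre score 0.15 + 0.50, so the author match dominates the result. Whether that weighting suits a given dataset is a design choice to validate against your own data.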