Let's start building a user-based collaborative filter by finding users who are similar to each other.
When you have data about what people like, you need a way to measure how similar different users are. To do this, compare each user with every other user and compute a similarity score for each pair. This score can be computed using the Pearson correlation, the Euclidean distance, the Manhattan distance, and so on.
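As a rough illustration of comparing each user with every other user, here is a minimal sketch. The `ratings` dictionary, the user and movie names, and the helper functions `euclidean`, `manhattan`, and `pairwise` are all hypothetical and invented for this example; they are not the book's data or code. Note that both measures shown are distances, so a smaller value means two users are more alike.

```python
from itertools import combinations
from math import sqrt

# Hypothetical ratings, invented purely for illustration.
ratings = {
    "user_a": {"movie_1": 5.0, "movie_2": 3.0, "movie_3": 4.0},
    "user_b": {"movie_1": 4.0, "movie_2": 3.0, "movie_3": 2.0},
    "user_c": {"movie_1": 1.0, "movie_2": 5.0},
}


def euclidean(r1, r2):
    """Straight-line distance over the movies both users have rated."""
    shared = set(r1) & set(r2)
    return sqrt(sum((r1[m] - r2[m]) ** 2 for m in shared))


def manhattan(r1, r2):
    """Sum of absolute rating differences over the shared movies."""
    shared = set(r1) & set(r2)
    return sum(abs(r1[m] - r2[m]) for m in shared)


def pairwise(all_ratings, metric):
    """Score every unordered pair of users with the given metric."""
    return {
        (u, v): metric(all_ratings[u], all_ratings[v])
        for u, v in combinations(sorted(all_ratings), 2)
    }


manhattan_scores = pairwise(ratings, manhattan)
euclidean_scores = pairwise(ratings, euclidean)
```

Only the movies that both users have rated enter each score, which is one possible design choice; other treatments of missing ratings are equally plausible.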
The Euclidean distance is the straight-line distance between two points in space, that is, the length of the shortest path between them. Let's try to understand this by plotting the users who have watched Django Unchained and Avengers.
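To make the definition concrete, treat each user as a point whose coordinates are the two ratings. The symbols $p$ and $q$ and the rating values below are assumptions chosen only to give round numbers; they are not taken from the dataset.

```latex
% Euclidean distance between two users p and q in the (django, avenger) plane
d(p, q) = \sqrt{(p_{\text{django}} - q_{\text{django}})^2 + (p_{\text{avenger}} - q_{\text{avenger}})^2}

% Worked example with hypothetical ratings p = (5, 1) and q = (2, 5)
d(p, q) = \sqrt{(5 - 2)^2 + (1 - 5)^2} = \sqrt{9 + 16} = 5
```

Two users with identical ratings for both movies would sit on the same point and have a distance of 0.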
We'll create a DataFrame that contains the user, django, and avenger columns, where django and avenger contain the ratings given by the user:
>>> data = []
>>> for i in movie_user_preferences.keys():
        try:
            data.append( (i ,movie_user_preferences...