Mastering Python for Data Science

By: Samir Madhavan

User-based collaborative filtering


Let's start building a user-based collaborative filter by finding users who are similar to each other.

Finding similar users

When you have data about what people like, you need a way to measure how similar users are to one another. To do this, compare each user with every other user and compute a similarity score for each pair. This score can be based on the Pearson correlation, the Euclidean distance, the Manhattan distance, and so on.
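The pairwise comparison described above can be sketched as follows. This is only an illustration, not the book's implementation: the `ratings` dictionary, the user names, and the `manhattan_distance` helper are hypothetical, and the Manhattan distance is used simply because it is the easiest of the listed measures to check by hand.

```python
from itertools import combinations

# Hypothetical ratings, invented purely for illustration (not the book's data).
ratings = {
    "user_a": {"Django Unchained": 5.0, "Avengers": 3.0},
    "user_b": {"Django Unchained": 4.0, "Avengers": 3.5},
    "user_c": {"Django Unchained": 1.0, "Avengers": 5.0},
}


def manhattan_distance(prefs, u, v):
    """Sum of absolute rating differences over the movies both users rated."""
    shared = set(prefs[u]) & set(prefs[v])
    return sum(abs(prefs[u][m] - prefs[v][m]) for m in shared)


# Compare each user with every other user exactly once.
pair_scores = {
    (u, v): manhattan_distance(ratings, u, v)
    for u, v in combinations(sorted(ratings), 2)
}

# With a distance measure, the smallest value marks the most similar pair.
closest_pair = min(pair_scores, key=pair_scores.get)
print(pair_scores)
print(closest_pair)
```

Any other measure can be swapped in by replacing the helper, as long as you keep track of whether a larger value means more similar (correlation) or less similar (distance).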

The Euclidean distance score

The Euclidean distance is the straight-line distance between two points in space, which is the shortest path between them. Let's try to understand this by plotting the users who have watched Django Unchained and Avengers.
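Before plotting, it may help to see the distance computed for two users treated as points. The sketch below is an assumption-laden illustration: the `points` values are invented, and the conversion of a distance into a score with 1 / (1 + distance) is one common convention assumed here, not something stated in the text above.

```python
import math

# Hypothetical (django, avenger) rating pairs, invented for illustration.
points = {
    "user_a": (4.0, 1.0),
    "user_b": (1.0, 5.0),
}


def euclidean_distance(p, q):
    """Square root of the summed squared differences between coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


distance = euclidean_distance(points["user_a"], points["user_b"])

# Assumed convention: map the distance into (0, 1], where 1 means identical ratings.
score = 1.0 / (1.0 + distance)
print(distance, score)
```

Users whose points lie close together on the plot get a small distance and therefore, under this assumed convention, a score close to 1.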

We'll create a DataFrame that contains the user, django, and avenger columns, where django and avenger contain the ratings given by the user:

>>> data = []
>>> for i in movie_user_preferences.keys():
      try:
          data.append( (i
          ,movie_user_preferences...