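The Euclidean distance score is a useful metric, but it has some shortcomings. For example, it treats two users as dissimilar when one consistently rates higher than the other, even if their preferences follow the same pattern. For this reason, the Pearson correlation score is frequently used in recommendation engines. Let's see how to compute it.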
Create a new Python file, and import the following packages:
import json
import numpy as np
We will define a function to compute the Pearson correlation score between two users in the database. Our first step is to confirm that these users exist in the database:
# Returns the Pearson correlation score between user1 and user2
def pearson_score(dataset, user1, user2):
    if user1 not in dataset:
        raise TypeError('User ' + user1 + ' not present in the dataset')

    if user2 not in dataset:
        raise TypeError('User ' + user2 + ' not present in the dataset')
The next step is to get the movies that both these users rated:
    # Movies rated by both user1 and user2
    rated_by_both = {}

    for item in dataset[user1]:
        if item in dataset[user2...
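To make the idea concrete, here is a minimal, self-contained sketch of the "rated by both" step followed by a correlation computed on the shared ratings. It is not the recipe's own implementation: the names toy_ratings, common_items, and pearson_on_common, along with the users and movies, are hypothetical and chosen only for illustration, and the sketch assumes the dataset is a dictionary mapping each user to a dictionary of item ratings. It relies on NumPy's corrcoef rather than computing the sums by hand.

```python
import numpy as np

# Hypothetical toy ratings for illustration only (not the recipe's dataset).
# Assumed layout: {user: {item: rating}}
toy_ratings = {
    'Alice': {'Movie A': 5.0, 'Movie B': 3.0, 'Movie C': 4.0, 'Movie D': 1.0},
    'Bob': {'Movie A': 4.0, 'Movie B': 2.0, 'Movie C': 3.0, 'Movie E': 5.0},
}


def common_items(dataset, user1, user2):
    """Return a sorted list of the items rated by both users."""
    return sorted(set(dataset[user1]) & set(dataset[user2]))


def pearson_on_common(dataset, user1, user2):
    """Pearson correlation of two users' ratings over their shared items."""
    shared = common_items(dataset, user1, user2)

    # With fewer than two shared items the correlation is undefined;
    # returning 0.0 here is a design choice of this sketch.
    if len(shared) < 2:
        return 0.0

    x = np.array([dataset[user1][item] for item in shared], dtype=float)
    y = np.array([dataset[user2][item] for item in shared], dtype=float)

    # A user who gave identical ratings to every shared item has zero
    # variance, which would cause a division by zero in the correlation.
    if x.std() == 0 or y.std() == 0:
        return 0.0

    # corrcoef returns a 2x2 matrix; the off-diagonal entry is the score
    return float(np.corrcoef(x, y)[0, 1])


score = pearson_on_common(toy_ratings, 'Alice', 'Bob')
print(common_items(toy_ratings, 'Alice', 'Bob'))
print(score)
```

In this toy data, Bob rates every shared movie exactly one point lower than Alice, so the score comes out at (approximately) 1.0 even though their Euclidean distance is nonzero.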