The Euclidean distance score implicitly assumes that the sample points are distributed spherically around the sample mean, which is not always true. For this reason, the Pearson correlation score is often used instead of the Euclidean distance score. The computation of the Pearson correlation score is explained next.
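To make the difference concrete, here is a minimal sketch of the Pearson correlation computed on two plain lists of ratings. The function name `pearson_from_ratings` and the sample ratings are hypothetical and used only for illustration; this is not the recipe's own function, and it uses only the standard library so that it runs on its own.

```python
import math

def pearson_from_ratings(x, y):
    """Pearson correlation of two equal-length sequences of ratings."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError('Both rating lists must be non-empty and of equal length')

    # Sums of squared deviations and of cross-deviations from the means
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    syy = sum(v * v for v in y) - sum(y) ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

    # If either list has no variation, the correlation is undefined;
    # returning 0 here is an assumption made for this sketch
    if sxx * syy == 0:
        return 0.0

    return sxy / math.sqrt(sxx * syy)

# Hypothetical ratings: the second user rates every item twice as high
ratings_a = [1.0, 2.0, 3.0, 4.0]
ratings_b = [2.0, 4.0, 6.0, 8.0]

score = pearson_from_ratings(ratings_a, ratings_b)
euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(ratings_a, ratings_b)))
print(score, euclidean)
```

The two users in this example agree perfectly on the ordering of the items, so the Pearson score is 1 even though the Euclidean distance between their ratings is well above zero.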
- We will create a new Python file and import the following packages:
import json
import numpy as np
# Returns the Pearson correlation score between FirstUser and SecondUser
def pearson_dist_score(dataset, FirstUser, SecondUser):
    # Both users must be present in the dataset
    if FirstUser not in dataset:
        raise TypeError('User ' + FirstUser + ' not present in the dataset')
    if SecondUser not in dataset:
        raise TypeError('User ' + SecondUser + ' not present in the dataset')
- We will now extract the movies that have been rated by both users:
# Movies rated...