Haskell Data Analysis Cookbook

By: Nishant Shukla

Comparing sparse data using cosine similarity


When a data set has many empty fields, comparing records with the Manhattan or Euclidean distance can give skewed results. Cosine similarity instead measures how closely two vectors are oriented with each other. For example, the vectors (82, 86) and (86, 82) point in essentially the same direction. In fact, their cosine similarity is equal to the cosine similarity between (41, 43) and (43, 41), because scaling a vector does not change its direction. A cosine similarity of 1 corresponds to vectors that point in exactly the same direction, and 0 corresponds to vectors that are orthogonal to each other.

As long as the angle between one pair of vectors equals the angle between another pair, the two pairs have the same cosine similarity. A distance metric such as the Manhattan or Euclidean distance, by contrast, depends on scale: applied to the example above, it reports the vectors in the first pair as twice as far apart as those in the second.
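The scale dependence of the two distance metrics can be checked with a minimal sketch. The function names manhattan and euclidean are illustrative assumptions and are not taken from the book; vectors are assumed to be lists of equal length.

```haskell
-- Illustrative helpers (names are assumptions, not from the book).

-- Manhattan distance: the sum of absolute coordinate differences.
manhattan :: [Double] -> [Double] -> Double
manhattan xs ys = sum (map abs (zipWith (-) xs ys))

-- Euclidean distance: the square root of the sum of squared differences.
euclidean :: [Double] -> [Double] -> Double
euclidean xs ys = sqrt (sum (map (\d -> d * d) (zipWith (-) xs ys)))
```

With these definitions, manhattan [82, 86] [86, 82] evaluates to 8 while manhattan [41, 43] [43, 41] evaluates to 4, and the Euclidean distances are roughly 5.66 and 2.83. Both metrics halve when the vectors are halved, even though the direction of each vector is unchanged.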

The cosine similarity between the two vectors is the dot product of the two vectors divided by the product of their magnitudes.
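The definition above can be sketched directly in Haskell. This is a minimal illustration under stated assumptions, not necessarily the code the recipe goes on to present: the names dot, magnitude, and cosineSimilarity are hypothetical, and vectors are assumed to be non-zero lists of equal length.

```haskell
-- Illustrative sketch (names are assumptions, not from the book).

-- Dot product of two vectors represented as lists.
dot :: [Double] -> [Double] -> Double
dot xs ys = sum (zipWith (*) xs ys)

-- Magnitude (Euclidean length) of a vector.
magnitude :: [Double] -> Double
magnitude xs = sqrt (dot xs xs)

-- Cosine similarity: the dot product divided by the product of the magnitudes.
-- Assumes neither vector is all zeros; a zero vector makes the result NaN.
cosineSimilarity :: [Double] -> [Double] -> Double
cosineSimilarity xs ys = dot xs ys / (magnitude xs * magnitude ys)
```

Evaluating cosineSimilarity [82, 86] [86, 82] and cosineSimilarity [41, 43] [43, 41] gives approximately 0.9989 in both cases, while the orthogonal vectors [1, 0] and [0, 1] give 0.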

How...