Clojure for Data Science

By: Henry Garner
k-nearest neighbors


Our Mahout user-based recommender makes recommendations by looking at the neighborhood of the most similar users. This approach is commonly called k-nearest neighbors, or k-NN.

It might appear that a user neighborhood is a lot like the k-means clusters we encountered in the previous chapter, but this is not quite the case, because each user sits at the center of their own neighborhood. With clustering, we aim to establish a small number of groupings; with k-NN, there are as many neighborhoods as there are users, and each user is the centroid of their own neighborhood.
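To make the contrast with clustering concrete, the following is a minimal sketch in plain Clojure rather than Mahout's own implementation. The `ratings` data, the `similarity` measure (an inverse Euclidean distance over co-rated items), and the function names `nearest-n` and `all-neighborhoods` are all assumptions chosen for illustration.

```clojure
;; Hypothetical toy data: user -> {item -> rating}.
(def ratings
  {:alice {:a 5.0 :b 3.0 :c 4.0}
   :bob   {:a 4.0 :b 3.0 :c 5.0}
   :carol {:a 1.0 :b 5.0 :c 2.0}
   :dave  {:a 2.0 :b 4.0 :c 1.0}})

(defn similarity
  "Illustrative similarity (an assumption, not Mahout's measure):
  1 / (1 + Euclidean distance) over the items both users have rated.
  Returns 0.0 when the users share no rated items."
  [u v]
  (let [shared (filter #(contains? v %) (keys u))]
    (if (empty? shared)
      0.0
      (let [squares (map #(let [x (- (u %) (v %))] (* x x)) shared)]
        (/ 1.0 (+ 1.0 (Math/sqrt (reduce + squares))))))))

(defn nearest-n
  "Returns the k users most similar to `user`, excluding `user` itself."
  [ratings k user]
  (->> (keys (dissoc ratings user))
       (sort-by #(similarity (ratings user) (ratings %)) >)
       (take k)
       (vec)))

(defn all-neighborhoods
  "Builds one neighborhood per user: a map of user -> k nearest users."
  [ratings k]
  (into {} (for [u (keys ratings)] [u (nearest-n ratings k u)])))
```

Calling `(all-neighborhoods ratings 2)` returns a map with one entry per user, each centered on that user, whereas a clustering of the same data would yield only a handful of shared groups.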

Note

Mahout also defines ThresholdUserNeighborhood, which we could use to construct a neighborhood containing only the users whose similarity to a given user meets a certain threshold.
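The idea behind a threshold-based neighborhood can be sketched in a few lines of plain Clojure. This is not Mahout's code: the similarity scores below are made up, `threshold-neighborhood` is a hypothetical helper, and the inclusive cut-off is an assumption of this sketch.

```clojure
;; Hypothetical precomputed similarities between one target user
;; and every other user.
(def similarity-to-alice
  {:bob 0.9 :carol 0.2 :dave 0.65 :erin 0.7})

(defn threshold-neighborhood
  "Returns the set of users whose similarity to the target user is at
  least `threshold` (inclusive cut-off assumed for this sketch)."
  [similarities threshold]
  (->> similarities
       (filter (fn [[_ s]] (>= s threshold)))
       (map first)
       (set)))
```

Unlike a fixed-size neighborhood, the size of the result depends on the data: `(threshold-neighborhood similarity-to-alice 0.7)` keeps only `:bob` and `:erin`, and a stricter threshold may leave the neighborhood empty.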

With the k-NN algorithm, we generate recommendations based only on the tastes of the k most similar users. This makes intuitive sense; the users with tastes most similar to your own are most likely to offer meaningful recommendations...
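One common way to turn the k nearest neighbors into recommendations is to score each unseen item by a similarity-weighted average of the neighbors' ratings. The sketch below assumes that scheme, positive similarity values, and entirely hypothetical data; `estimate` and `recommend` are illustrative names, not Mahout functions.

```clojure
;; Hypothetical data: the target user's k nearest neighbors, each with
;; its similarity to the target user and its own ratings.
(def neighbors
  [{:similarity 0.9 :ratings {:x 4.0 :y 2.0}}
   {:similarity 0.6 :ratings {:x 5.0 :z 3.0}}])

;; Items the target user has already rated (also hypothetical).
(def already-rated #{:y})

(defn estimate
  "Similarity-weighted mean of the neighbors' ratings for `item`,
  or nil when no neighbor has rated it."
  [neighbors item]
  (let [raters (filter #(contains? (:ratings %) item) neighbors)
        weight (reduce + (map :similarity raters))]
    (when (pos? weight)
      (/ (reduce + (map #(* (:similarity %) (get-in % [:ratings item]))
                        raters))
         weight))))

(defn recommend
  "Returns up to n [item estimated-rating] pairs for items the target
  user has not rated, ordered by highest estimate first."
  [neighbors already-rated n]
  (->> (mapcat (comp keys :ratings) neighbors)
       (distinct)
       (remove already-rated)
       (map (fn [item] [item (estimate neighbors item)]))
       (sort-by second >)
       (take n)
       (vec)))
```

With this data, `(recommend neighbors already-rated 2)` ranks `:x` ahead of `:z`, because both neighbors rated `:x` highly, and `:y` is skipped because the target user has already rated it.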