Clojure for Data Science

By: Henry Garner

Machine learning on Spark with MLlib


We've now covered enough of the basics of Spark to use our RDDs for machine learning. While Spark handles the infrastructure, the machine learning itself is performed by an Apache Spark subproject called MLlib.

Note

An overview of all the capabilities of the MLlib library is available at https://spark.apache.org/docs/latest/mllib-guide.html.

MLlib provides a wealth of machine learning algorithms for use on Spark, including those for regression, classification, and clustering covered elsewhere in this book. In this chapter, we'll be using the algorithm MLlib provides for performing collaborative filtering: alternating least squares.
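
As a rough sketch of what alternating least squares optimizes, the objective is commonly written in the form below. The notation is assumed here for illustration and is not taken from the text: r_{ui} is an observed rating by user u of item i, x_u and y_i are the latent factor vectors for that user and item, K is the set of observed (user, item) pairs, and \lambda is a regularization weight. MLlib's implementation may differ in details, such as how the regularization term is scaled.

```latex
\min_{X,\,Y} \; \sum_{(u,i) \in K} \left( r_{ui} - x_u^{\top} y_i \right)^2
  \; + \; \lambda \left( \sum_{u} \lVert x_u \rVert^2 + \sum_{i} \lVert y_i \rVert^2 \right)
```

With the item factors held fixed, this is an ordinary regularized least-squares problem in each x_u, and likewise for each y_i when the user factors are held fixed. The algorithm alternates between these two solves, which is where the name comes from.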

Movie recommendations with alternating least squares

In Chapter 5, Big Data, we discovered how to use gradient descent to identify the parameters that minimize a cost function for a large quantity of data. In this chapter, we've seen how SVD can be used to calculate latent factors within a matrix of data through decomposition...