Programming MapReduce with Scalding

By: Antonios Chalkiopoulos

Chapter 9. Matrix Calculations and Machine Learning

In this chapter, we will look at matrix calculations and machine learning. The main difference from general data processing applications is that the focus here is on matrix and set algebra.

Machine learning requires an understanding of basic vector and matrix representations and operations. A vector is a list (or tuple) of elements, and a matrix is a rectangular array of elements. The transpose of a matrix A is the matrix formed by turning the rows of A into columns.
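The definitions above can be illustrated with a minimal sketch. It uses plain Scala collections rather than Scalding's own matrix API, and the names `TransposeSketch`, `v`, `a`, and `aT` are chosen here purely for illustration.

```scala
object TransposeSketch {
  // A vector: a list of elements.
  val v: Vector[Int] = Vector(1, 2, 3)

  // A matrix: a rectangular array, modelled here as rows of equal length (2 x 3).
  val a: Vector[Vector[Int]] = Vector(
    Vector(1, 2, 3),
    Vector(4, 5, 6)
  )

  // The transpose: row i of `a` becomes column i of the result (3 x 2).
  val aT: Vector[Vector[Int]] = a.transpose

  def main(args: Array[String]): Unit =
    aT.foreach(row => println(row.mkString(" ")))
}
```

Transposing twice returns the original matrix, which is a quick sanity check for any transpose implementation.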

We will build on these principles and show how Scalding can be used to implement concrete examples, including the following:

  • Text similarity using term frequency/inverse document frequency

  • Set-based similarity using the Jaccard coefficient

  • Clustering algorithm using K-Means
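As a small preview of the set algebra involved, the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The following is a minimal in-memory sketch in plain Scala, not a Scalding job; the object and function names are hypothetical, and returning 0.0 for two empty sets is a convention assumed here.

```scala
object JaccardSketch {
  // Jaccard coefficient: |x ∩ y| / |x ∪ y|.
  // Assumption: two empty sets are given a similarity of 0.0 to avoid division by zero.
  def jaccard[A](x: Set[A], y: Set[A]): Double =
    if (x.isEmpty && y.isEmpty) 0.0
    else (x intersect y).size.toDouble / (x union y).size

  def main(args: Array[String]): Unit = {
    val left  = Set("a", "b", "c")
    val right = Set("b", "c", "d")
    // Intersection {b, c} has 2 elements; union {a, b, c, d} has 4.
    println(jaccard(left, right)) // prints 0.5
  }
}
```

The result always lies between 0 (disjoint sets) and 1 (identical non-empty sets).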