In this chapter, we will look at matrix calculations and machine learning. Unlike typical data processing applications, this chapter focuses on matrix and set algebra.
Machine learning requires an understanding of basic vector and matrix representations and operations. A vector is a list (or a tuple) of elements, and a matrix is a rectangular array of elements. The transpose of a matrix A is the matrix formed by turning the rows of A into columns (and, equivalently, its columns into rows).
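The definitions above can be illustrated with a minimal sketch in plain Scala. It uses only standard collections, not Scalding's own matrix support, and the names TransposeSketch, Row, and Matrix are hypothetical, chosen for illustration only.

```scala
object TransposeSketch {
  // A vector is modelled as a list of elements, and a matrix as a
  // rectangular array: a list of rows that all have the same length.
  // (Row and Matrix are illustrative aliases, not Scalding types.)
  type Row = Vector[Int]
  type Matrix = Vector[Row]

  // Build the transpose: column j of the input becomes row j of the result.
  def transpose(a: Matrix): Matrix = {
    require(a.map(_.length).distinct.length <= 1, "matrix must be rectangular")
    if (a.isEmpty) a
    else a.head.indices.map(j => a.map(row => row(j))).toVector
  }

  def main(args: Array[String]): Unit = {
    // A 2 x 3 matrix; its transpose is a 3 x 2 matrix.
    val a: Matrix = Vector(Vector(1, 2, 3), Vector(4, 5, 6))
    transpose(a).foreach(row => println(row.mkString(" ")))
    // prints:
    // 1 4
    // 2 5
    // 3 6
  }
}
```

Transposing twice returns the original matrix, which is a quick sanity check for any transpose implementation.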
Building on these principles, we will show how Scalding can be used to implement concrete examples, including the following:
Text similarity using term frequency/inverse document frequency (TF-IDF)
Set-based similarity using the Jaccard coefficient
Clustering using the K-Means algorithm