Mastering Scala Machine Learning
Spark, particularly with memory-based storage systems, claims to substantially improve the speed of data access within and between nodes. ML seems to be a natural fit, as many algorithms require multiple passes over the data, or repartitioning. MLlib is the open source library of choice, although private companies are catching up with their own proprietary implementations.
As I will show in Chapter 5, Regression and Classification, most of the standard machine learning algorithms can be expressed as an optimization problem. For example, classical linear regression minimizes the sum of squares of the y distance between the regression line and the actual value of y:

$$\min_{A,B} \sum_{i} (y_i - \hat{y}_i)^2$$

Here, $\hat{y}_i$ are the predicted values according to the linear expression:

$$\hat{y}_i = A x_i + B$$

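For a single feature x and the linear form ŷ = A·x + B, the least-squares fit has a well-known closed-form solution. The following is a minimal sketch in plain Scala (no Spark or MLlib involved); the object name `SimpleLinearRegression` and the methods `fit` and `sumOfSquares` are hypothetical names chosen for illustration, not part of any library.

```scala
// Closed-form ordinary least squares for one feature: yHat = a * x + b.
// All names here are illustrative assumptions, not an MLlib API.
object SimpleLinearRegression {

  /** Returns (a, b): the slope and intercept that minimize the sum of squared residuals. */
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.length == ys.length && xs.length >= 2, "need at least two (x, y) pairs")
    val n = xs.length.toDouble
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // sxy = sum of (x - meanX) * (y - meanY); sxx = sum of (x - meanX)^2
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    require(sxx > 0.0, "x values must not all be identical")
    val a = sxy / sxx          // slope (A in the linear expression)
    val b = meanY - a * meanX  // intercept (B in the linear expression)
    (a, b)
  }

  /** Sum of squared differences between the actual y and the predicted yHat = a * x + b. */
  def sumOfSquares(xs: Seq[Double], ys: Seq[Double], a: Double, b: Double): Double =
    xs.zip(ys).map { case (x, y) =>
      val residual = y - (a * x + b)
      residual * residual
    }.sum
}
```

For example, calling `SimpleLinearRegression.fit(Seq(0.0, 1.0, 2.0, 3.0), Seq(1.0, 3.0, 5.0, 7.0))` on points that lie exactly on y = 2x + 1 recovers a slope of 2 and an intercept of 1, and `sumOfSquares` evaluated at those values is 0. On large distributed datasets an iterative optimizer is typically used instead of this closed form.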
A is commonly called the slope, and B the intercept. In a more generalized formulation, a linear optimization problem is to minimize an additive function:

$$\min_{w} \; \sum_{i} L(w; x_i, y_i) + R(w)$$

Here, $L(w; x_i, y_i)$ is a loss function and $R(w)$ is a regularization function. The regularization function is an increasing...