Linear regression with Mahout Spark
We will discuss the linear regression example mentioned on the Mahout Wiki. Let's first create the training data in the form of a parallel DRM:
val drmData = drmParallelize(dense( (2, 2, 10.5, 10, 29.509541), // Apple Cinnamon Cheerios (1, 2, 12, 12, 18.042851), // Cap'n'Crunch (1, 1, 12, 13, 22.736446), // Cocoa Puffs (2, 1, 11, 13, 32.207582), // Froot Loops (1, 2, 12, 11, 21.871292), // Honey Graham Ohs (2, 1, 16, 8, 36.187559), // Wheaties Honey Gold (6, 2, 17, 1, 50.764999), // Cheerios (3, 2, 13, 7, 40.400208), // Clusters (3, 3, 13, 4, 45.811716)), // Great Grains Pecan numPartitions = 2);
The first four columns will be our feature vector and the last column will be our target variable. We will separate out the feature matrix and the target vector, drmX
being the feature matrix and y
being the target vector:
val drmX = drmData(::, 0 until 4)
The target variable is collected into the memory using the...