Learning Apache Mahout

Linear regression with Mahout Spark


We will walk through the linear regression example from the Mahout Wiki. First, we create the training data as a parallelized distributed row matrix (DRM):

val drmData = drmParallelize(dense(
  (2, 2, 10.5, 10, 29.509541),  // Apple Cinnamon Cheerios
  (1, 2, 12,   12, 18.042851),  // Cap'n'Crunch
  (1, 1, 12,   13, 22.736446),  // Cocoa Puffs
  (2, 1, 11,   13, 32.207582),  // Froot Loops
  (1, 2, 12,   11, 21.871292),  // Honey Graham Ohs
  (2, 1, 16,   8,  36.187559),  // Wheaties Honey Gold
  (6, 2, 17,   1,  50.764999),  // Cheerios
  (3, 2, 13,   7,  40.400208),  // Clusters
  (3, 3, 13,   4,  45.811716)), // Great Grains Pecan
  numPartitions = 2)

The first four columns of each row form the feature vector, and the last column is the target variable. We separate the two, with drmX holding the feature matrix and y holding the target vector:

val drmX = drmData(::, 0 until 4)
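To make the slice concrete, here is a minimal sketch of the same column split on plain in-memory Scala arrays, with no Mahout dependency. It uses the first three rows of the data above. The names rows, features, and target are hypothetical and chosen only for this illustration. The sketch mimics what the slice selects, not how Mahout evaluates it on a distributed matrix.

```scala
// Three rows of the training data; the last column is the target.
val rows: Array[Array[Double]] = Array(
  Array(2.0, 2.0, 10.5, 10.0, 29.509541), // Apple Cinnamon Cheerios
  Array(1.0, 2.0, 12.0, 12.0, 18.042851), // Cap'n'Crunch
  Array(1.0, 1.0, 12.0, 13.0, 22.736446)  // Cocoa Puffs
)

// In-memory analogue of drmData(::, 0 until 4): all rows, columns 0 to 3.
val features: Array[Array[Double]] = rows.map(_.slice(0, 4))

// Column 4 of every row holds the target values.
val target: Array[Double] = rows.map(_(4))
```

Note that `0 until 4` excludes its upper bound, so it covers column indices 0, 1, 2, and 3, which is why the fifth column (index 4) is left out of the feature matrix.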

The target variable is collected into memory using the...