Spark Cookbook

By: Rishi Yadav

Introduction


The following is Wikipedia's definition of supervised learning:

"Supervised learning is the machine learning task of inferring a function from labeled training data."

Supervised learning has two steps:

  • Train the algorithm on a training dataset; this is like giving it questions together with their answers first.

  • Use a test dataset to ask the trained algorithm another set of questions.
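
The two steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data points are made up (they follow y = 2x + 1 exactly), the helper names `fit_line` and `predict` are hypothetical, and ordinary least squares on a single feature stands in for whatever algorithm is actually being trained.

```python
# Toy, made-up data that follows y = 2x + 1 exactly (not taken from the book).
train_x = [1.0, 2.0, 3.0, 4.0]   # "questions" given during training
train_y = [3.0, 5.0, 7.0, 9.0]   # their known "answers" (labels)
test_x = [5.0, 6.0]              # new questions asked after training
test_y = [11.0, 13.0]            # held-out answers, used only for checking


def fit_line(xs, ys):
    """Step 1: learn a slope and intercept from labeled examples
    using ordinary least squares on one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Step 2: ask the trained model a new question."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(train_x, train_y)
predictions = [predict(model, x) for x in test_x]

# Compare the model's answers with the held-out answers.
errors = [abs(p - y) for p, y in zip(predictions, test_y)]
print(model, predictions, errors)
```

Keeping the test data out of `fit_line` is the point of the split: the model is judged only on questions it has not already seen the answers to.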

There are two types of supervised learning algorithms:

  • Regression: This predicts a continuous-valued output, such as a house price.

  • Classification: This predicts a discrete-valued output (0 or 1), called a label, such as whether an e-mail is spam or not. Classification is not limited to two values; it can have multiple values, such as marking an e-mail important, not important, urgent, and so on (0, 1, 2…).
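
The difference between the two types shows up in the kind of value a model returns. The sketch below is not a trained model; the functions, the price-per-square-foot figure, the thresholds, and the mapping of labels to the numbers 0, 1, 2 are all hypothetical and chosen only to show the shape of the output.

```python
def predict_price(size_sqft, price_per_sqft=500.0):
    """Regression-style output: any real number (a continuous value).
    The rate of 500.0 is an arbitrary placeholder, not real market data."""
    return size_sqft * price_per_sqft


def classify_spam(spam_score, threshold=0.5):
    """Binary classification: the output is a label, either 0 or 1.
    Here 1 means spam and 0 means not spam (an assumed convention)."""
    return 1 if spam_score >= threshold else 0


# Assumed label numbering for the multiclass case.
PRIORITY_LABELS = {0: "important", 1: "not important", 2: "urgent"}


def classify_priority(priority_score):
    """Multiclass classification: the output is one of a fixed set of labels.
    The cut-off values are arbitrary placeholders."""
    if priority_score >= 0.8:
        return 2
    if priority_score >= 0.4:
        return 0
    return 1


price = predict_price(1200.0)          # a float
spam_label = classify_spam(0.9)        # 0 or 1
priority_label = classify_priority(0.9)  # 0, 1, or 2
print(price, spam_label, PRIORITY_LABELS[priority_label])
```

A regression output can be compared by how far off it is, whereas a classification output is simply right or wrong for a given example.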

Note

We are going to cover regression in this chapter and classification in the next.

As an example dataset for regression, we will use the recently sold house data of the City of Saratoga, CA, as a training set to train the algorithm...