Spark Cookbook

By: Rishi Yadav

Doing binary classification using SVM


Classification is a technique for putting data into different classes based on its utility. For example, an e-commerce company can apply one of two labels, "will buy" or "will not buy", to potential visitors.

This classification is done by providing already labeled data, called training data, to machine learning algorithms. The challenge is how to mark the boundary between the two classes. Let's take a simple example, as shown in the following figure:

In the preceding case, we assigned gray and black to the "will not buy" and "will buy" labels, respectively. Here, drawing a line between the two classes is as easy as follows:

Is this the best we can do? Not really; let's try to do a better job. The black classifier is not really equidistant from the "will buy" and "will not buy" carts. Let's make a better attempt, like the following:

Now this is looking good. This is, in fact, what the SVM algorithm does. You can see in the preceding diagram that there are only three carts that...
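
The idea of a boundary that is equidistant from both classes can be checked numerically. The following is a minimal sketch in plain Python, not Spark code and not the book's recipe. The points, the two candidate lines, and the helper names (`signed_distance`, `class_gaps`, `margin`) are all made up for illustration. It measures how far each candidate line sits from the nearest point of each class; an SVM looks for the separating boundary that makes the smaller of these two gaps as large as possible.

```python
import math

# Hypothetical 2-D points (x1, x2, label): +1 = "will buy", -1 = "will not buy".
points = [
    (1.0, 1.0, -1), (2.0, 1.0, -1), (1.0, 2.0, -1),
    (4.0, 4.0, +1), (5.0, 4.0, +1), (4.0, 5.0, +1),
]

def signed_distance(w, b, x):
    """Signed distance from point x to the line w[0]*x1 + w[1]*x2 + b = 0."""
    return (w[0] * x[0] + w[1] * x[1] + b) / math.hypot(w[0], w[1])

def class_gaps(w, b, data):
    """Return (gap to the nearest -1 point, gap to the nearest +1 point).

    A negative gap means at least one point of that class lies on the
    wrong side of the line.
    """
    neg = min(-signed_distance(w, b, (x1, x2)) for x1, x2, y in data if y == -1)
    pos = min(signed_distance(w, b, (x1, x2)) for x1, x2, y in data if y == +1)
    return neg, pos

def margin(w, b, data):
    """The smaller of the two class gaps: the quantity an SVM tries to maximize."""
    return min(class_gaps(w, b, data))

# Two illustrative candidate boundaries, each written as (w, b).
line_a = ((1.0, 1.0), -4.0)  # x1 + x2 = 4: separates, but hugs the "will not buy" group
line_b = ((1.0, 1.0), -5.5)  # x1 + x2 = 5.5: sits midway between the two groups

for name, (w, b) in (("line_a", line_a), ("line_b", line_b)):
    neg, pos = class_gaps(w, b, points)
    print(f"{name}: gap to 'will not buy' = {neg:.3f}, "
          f"gap to 'will buy' = {pos:.3f}, margin = {margin(w, b, points):.3f}")
```

Both lines separate the two groups, but only the second one leaves the same gap on each side, so its margin is larger. On this toy data, that is the sense in which the second attempt is "better".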