-
Book Overview & Buying
-
Table Of Contents
Apache Spark 2.x Cookbook
By :
Classification is a technique to put data into different classes based on its utility. For example, an e-commerce company can apply two labels, namely will buy or will not buy, to the potential visitors.
This classification is done by providing some already labeled data to machine-learning algorithms called training data, as you know already. The challenge is how to mark the boundary between the two classes. Let's take a simple example, as shown in the following figure:

In the preceding case, we designated gray and black to the "will not buy" and "will buy" labels, respectively. Here, drawing a line between the two classes is easy, as follows:

Is this the best we can do? Not really. Let's try to do a better job. The black classifier is not really equidistant from the will buy and will not buy carts. Let's make a better attempt:

This looks good, doesn't it? This, in fact, is what the SVM algorithm does. You can see in the preceding diagram that there are...
Change the font size
Change margin width
Change background colour