
Spark Cookbook

By: Rishi Yadav


Doing classification using logistic regression


In classification, the response variable y takes discrete values rather than continuous ones. Examples include e-mail (spam/non-spam) and transactions (safe/fraudulent).

In binary classification, the y variable can take on only two values, 0 or 1:

$$y \in \{0, 1\}$$

Here, 0 is referred to as the negative class and 1 as the positive class. These names are only a convenience: the algorithms are neutral about which class is labeled 1.
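To make the encoding concrete, here is a minimal plain-Scala sketch that maps the e-mail example's raw labels to 0.0/1.0. The label strings and the helper name `encode` are hypothetical. The one grounded fact is that Spark MLlib's binary classifiers expect labels of 0.0 (negative) and 1.0 (positive) stored as Doubles.

```scala
// Hypothetical raw labels from the e-mail example.
val emails = Seq("spam", "non-spam", "spam", "non-spam")

// Convention chosen here: "spam" is the positive class (1.0) and anything
// else is the negative class (0.0). MLlib expects Double labels 0.0 and 1.0.
def encode(raw: String): Double = if (raw == "spam") 1.0 else 0.0

val labels = emails.map(encode)
println(labels) // List(1.0, 0.0, 1.0, 0.0)

// The opposite convention (non-spam = 1.0) is just the complement 1 - y.
// Swapping conventions only relabels the classes; nothing else changes.
val flipped = labels.map(y => 1.0 - y)
println(flipped) // List(0.0, 1.0, 0.0, 1.0)
```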

Linear regression works well for regression tasks, but it runs into a few limitations when used for classification. These include:

  • The fitting process is very susceptible to outliers

  • There is no guarantee that the hypothesis function h(x) will fit in the range 0 (negative class) to 1 (positive class)
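Both limitations can be seen on a toy problem. The following plain-Scala sketch (no Spark APIs) fits a one-feature linear hypothesis h(x) = θ0 + θ1·x by closed-form ordinary least squares and classifies by thresholding h(x) at 0.5. The data, the 0.5 threshold, and the helper names `fitLine` and `predict` are illustrative assumptions, not part of the recipe.

```scala
// Closed-form ordinary least-squares fit of h(x) = theta0 + theta1 * x
// for a single feature. Returns (theta0, theta1).
def fitLine(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
  val mx = xs.sum / xs.size
  val my = ys.sum / ys.size
  val sxy = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
  val sxx = xs.map(x => (x - mx) * (x - mx)).sum
  val theta1 = sxy / sxx
  (my - theta1 * mx, theta1)
}

// Classify by thresholding the linear output at 0.5.
def predict(theta: (Double, Double), x: Double): Int =
  if (theta._1 + theta._2 * x >= 0.5) 1 else 0

// Hypothetical toy data: small x values are negatives, larger ones positives.
val xs = Seq(1.0, 2.0, 3.0, 4.0)
val ys = Seq(0.0, 0.0, 1.0, 1.0)
val clean = fitLine(xs, ys)

// Add one far-away (but still positive) example, i.e. an outlier in x.
val withOutlier = fitLine(xs :+ 20.0, ys :+ 1.0)

println(predict(clean, 3.0))       // 1
println(predict(withOutlier, 3.0)) // 0: the outlier flattened the line

// The linear hypothesis is also unbounded: it leaves [0, 1] at the extremes.
val (t0, t1) = clean
println(t0 + t1 * 0.0)  // below 0
println(t0 + t1 * 20.0) // well above 1
```

On the clean data the decision point (where h(x) = 0.5) sits at x = 2.5. Adding the single point at x = 20 moves it to roughly x = 3.2, so the example at x = 3 is now misclassified even though no label changed.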

Though logistic regression has the word regression in its name, that is something of a misnomer: it is very much a classification algorithm. Unlike linear regression, it guarantees that h(x) lies between 0 and 1:

$$0 \le h(x) \le 1$$
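The guarantee comes from passing the linear score θ0 + θ1·x through the logistic (sigmoid) function g(z) = 1 / (1 + e^(-z)), which mathematically maps any real number into (0, 1). Below is a minimal plain-Scala sketch of that idea. The parameter values and the helper names `sigmoid`, `h`, and `predict` are hypothetical and not fitted to any data.

```scala
// Logistic (sigmoid) function. Mathematically its range is (0, 1); in
// floating point, very large |z| can round to exactly 0.0 or 1.0.
def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

// Logistic hypothesis for one feature: h(x) = g(theta0 + theta1 * x).
// theta0 and theta1 are hypothetical values, not fitted to any data.
val theta0 = -0.5
val theta1 = 0.4
def h(x: Double): Double = sigmoid(theta0 + theta1 * x)

// Even for extreme inputs, where the linear score is far outside [0, 1],
// h(x) stays within [0, 1] and can be read as P(y = 1 | x).
val xs = Seq(-100.0, 0.0, 1.25, 3.0, 100.0)
xs.foreach(x => println(f"x = $x%7.2f  h(x) = ${h(x)}%.4f"))

// Predict the positive class when h(x) >= 0.5, which is exactly when the
// linear score theta0 + theta1 * x is >= 0.
def predict(x: Double): Int = if (h(x) >= 0.5) 1 else 0
println(xs.map(predict)) // List(0, 0, 1, 1, 1)
```

The 0.5 threshold is a common default, not a requirement. Raising it trades recall of the positive class for precision.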

In...