In classification, the response variable y takes discrete values rather than continuous ones. Examples include classifying e-mail as spam or non-spam and transactions as safe or fraudulent.
In binary classification, the y variable can take on one of two values, 0 or 1: y ∈ {0, 1}.
Here, 0 is referred to as the negative class and 1 as the positive class. The names "positive" and "negative" are only a convenience; the algorithms themselves are neutral about which class gets which label.
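To make the labelling convention concrete, here is a minimal sketch that encodes string labels as 0/1. The helper `encode` and the sample e-mail labels are hypothetical, for illustration only. Choosing the other class as positive simply produces the complement of the encoding, which is why the choice carries no meaning of its own.

```python
def encode(labels, positive):
    """Map each label to 1 if it equals `positive`, otherwise 0 (hypothetical helper)."""
    return [1 if label == positive else 0 for label in labels]


emails = ["spam", "non-spam", "spam", "non-spam"]

y = encode(emails, positive="spam")              # [1, 0, 1, 0]
y_flipped = encode(emails, positive="non-spam")  # [0, 1, 0, 1]
```

Each entry of `y_flipped` equals `1 - y` at the same position, so swapping the positive class just relabels the problem.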
Linear regression works well for regression tasks, but it runs into a few limitations when used for classification. These include:
The fitting process is very susceptible to outliers
There is no guarantee that the hypothesis function h(x) will fall in the range 0 (negative class) to 1 (positive class)
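Both limitations can be seen in a small sketch that fits an ordinary least-squares line to 0/1 labels and classifies with a 0.5 threshold. The toy data, the outlier at x = 50, and the helper names `fit_line` and `predict` are hypothetical, chosen only for illustration; this is plain one-feature least squares, not any particular library's implementation.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(params, x):
    """Evaluate the linear hypothesis h(x) = b0 + b1 * x."""
    b0, b1 = params
    return b0 + b1 * x


# Hypothetical toy data: one feature x, labels y in {0, 1}.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
clean = fit_line(xs, ys)

# h(x) is not confined to [0, 1]:
print(predict(clean, 0))   # about -0.4
print(predict(clean, 10))  # about 2.17

# One distant but correctly labelled point drags the fitted line, so x = 4
# drops below the 0.5 threshold and is now misclassified as 0.
with_outlier = fit_line(xs + [50], ys + [1])
print(predict(clean, 4) >= 0.5)         # True
print(predict(with_outlier, 4) >= 0.5)  # False
```

Adding the outlier moves the point where h(x) = 0.5 from x = 3.5 to roughly x = 4.67, even though the new point agrees with the existing labels.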
Logistic regression guarantees that h(x) lies between 0 and 1. Despite the word "regression" in its name, which is something of a misnomer, it is very much a classification algorithm.
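A minimal sketch of why the output stays in range: logistic regression passes the linear score θ0 + θ1·x through the logistic (sigmoid) function g(z) = 1 / (1 + e^(-z)), whose values always lie between 0 and 1 (in floating point, extreme scores can round to exactly 0.0 or 1.0). The parameters `THETA0` and `THETA1` below are hypothetical and hand-picked rather than learned; in practice they are fitted to the training data. The 0.5 decision threshold is also an assumption, though a common default.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so ez lies in (0, 1) and cannot overflow
    return ez / (1.0 + ez)


# Hypothetical, hand-picked parameters (not learned from data).
THETA0, THETA1 = -3.5, 1.0


def h(x):
    """Hypothesis h(x) = g(theta0 + theta1 * x), always within [0, 1]."""
    return sigmoid(THETA0 + THETA1 * x)


def classify(x, threshold=0.5):
    """Predict the positive class (1) when h(x) reaches the threshold."""
    return 1 if h(x) >= threshold else 0


xs = [1, 2, 3, 4, 5, 6]
print([classify(x) for x in xs])  # [0, 0, 0, 1, 1, 1]
```

Unlike the linear hypothesis, h(x) here can be read as an estimated probability of the positive class, and thresholding it yields the 0/1 prediction.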
In...