A linear model classifies samples by means of separating hyperplanes; hence, a problem is said to be linearly separable if it's possible to find a linear model whose accuracy exceeds a predetermined threshold. Logistic regression is one of the most famous linear classifiers; it is based on maximizing the likelihood that each sample belongs to its correct class. Stochastic gradient descent (SGD) classifiers are a more generic family of algorithms, distinguished by the choice of loss function. SGD also allows partial fitting, which is particularly useful when the amount of data is too large to fit in memory. A perceptron is a particular instance of SGD, representing a linear neural network that cannot solve the XOR problem (for this reason, multi-layer perceptrons became the first choice for non-linear classification). In general, however, its performance is comparable to that of a logistic regression model.
The performance of every classifier must be measured using several different approaches, in order to be able to optimize...
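Although the passage breaks off here, a minimal sketch of measuring performance with different approaches might look like the following, assuming scikit-learn's cross-validation utilities (the dataset and metric choices are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative dataset (an assumption, not from the text)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Evaluate the same model under several scoring functions via 5-fold CV
results = {}
for metric in ('accuracy', 'f1', 'roc_auc'):
    scores = cross_val_score(clf, X, y, cv=5, scoring=metric)
    results[metric] = scores.mean()
    print(metric, results[metric])
```

Comparing multiple metrics this way guards against optimizing a model for one criterion (such as raw accuracy) while degrading another that matters for the task at hand.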