In the previous chapter, we discussed unsupervised learning, where only input data are available. In terms of the function y=f(x), unsupervised learning gives us only the inputs x. In supervised learning, by contrast, we have both the inputs x and the corresponding outputs y. Our task is to find the function that best links x with y, based on our training dataset. Each example in the training dataset consists of an input object, typically a vector, and a desired output value, which may be binary, categorical, discrete, or continuous. A supervised learning algorithm examines the training dataset and produces an inferred best-fit function. To verify the accuracy of this inferred function, we use a second dataset, the test set, which was not used to fit the function.
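The train-then-test workflow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than something taken from the chapter: the synthetic data (generated from y = 3x + 2 plus noise), the 75/25 split, the choice of a straight line fitted by ordinary least squares as the "best-fit function", mean squared error as the accuracy measure, and the helper names `split_train_test`, `fit_line`, and `mean_squared_error`.

```python
import random


def split_train_test(pairs, test_fraction=0.25, seed=0):
    """Shuffle (x, y) pairs and hold out a fraction as the test set."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(pairs, slope, intercept):
    """Average squared gap between observed y and the fitted line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in pairs) / len(pairs)


# Synthetic data (an assumption for this sketch): y = 3x + 2 plus Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0.0, 10.0)
    y = 3.0 * x + 2.0 + rng.gauss(0.0, 1.0)
    data.append((x, y))

# Fit on the training set only, then score on the held-out test set.
train, test = split_train_test(data)
slope, intercept = fit_line(train)
test_mse = mean_squared_error(test, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MSE={test_mse:.2f}")
```

Because the test pairs play no part in the fit, the test error gives a rough indication of how the inferred function behaves on data it has not seen. A continuous output is used here only for simplicity; a binary or categorical output would call for a different model and a different accuracy measure.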
In an ideal world, we would want to have a large sample size. However, on many occasions, this...