There are many supervised learning algorithms at our disposal, and we choose among them based on the task and the data we have. If we don't have much data and there is already some domain knowledge around our problem, deep learning is probably not the best approach to start with. We should rather try simpler algorithms and come up with relevant features based on the knowledge we have.
Starting simple is always a good practice; for example, for categorization, a good starting point can be a decision tree. A related algorithm, built as an ensemble of decision trees, that is difficult to overfit is the random forest; it also gives good results out of the box. For regression problems, linear regression is still very popular, especially in domains where it's necessary to justify the decision taken. For other problems, such as recommender systems, a good starting point can be matrix factorization. Each domain has a standard algorithm that it's better to start with.
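As a minimal sketch of such a baseline, the following trains a random forest on scikit-learn's built-in iris dataset (the dataset and hyperparameters here are only illustrative choices, not part of the house-price task discussed next):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# A small, built-in categorization dataset used purely for illustration
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# 100 trees is scikit-learn's default; it works well out of the box
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print('Accuracy: {:.2f}'.format(clf.score(X_test, y_test)))
```

Note that we did no feature engineering or tuning at all; that is exactly what makes the random forest a convenient first baseline.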
A simple example of a task could be to predict the price of a house for sale, given the location and some information about the house. This is a regression problem, and there are several algorithms in scikit-learn that can perform the task. If we want to use linear regression, we can do the following:
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Using a standard dataset that we can find in scikit-learn
cal_house = fetch_california_housing()

# Split the data and the targets into training/testing sets
cal_house_X_train = cal_house.data[:-20]
cal_house_X_test = cal_house.data[-20:]
cal_house_y_train = cal_house.target[:-20]
cal_house_y_test = cal_house.target[-20:]

# Create the linear regression object
regr = LinearRegression()

# Train the model using the training sets
regr.fit(cal_house_X_train, cal_house_y_train)

# Calculating the predictions
predictions = regr.predict(cal_house_X_test)

# Calculating the loss
print('MSE: {:.2f}'.format(mean_squared_error(cal_house_y_test, predictions)))
It's possible to run this code after activating our virtual environment (or conda environment) and saving it in a file named house_LR.py. Then, from the directory where you placed the file, run the following command:
python house_LR.py
The interesting part about NNs is that they can be used for any of the tasks mentioned previously, provided that enough data is available. Moreover, a trained neural network has effectively learned its own features, and part of the network itself can be reused to do the feature engineering for similar tasks. This method is called transfer learning (TL), and we will dedicate a chapter to it later on.
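To give a first taste of the feature-reuse idea, the following sketch trains a small scikit-learn network and then recycles its hidden layer as a feature extractor for a second, simpler model. This is only an illustration under assumed choices (synthetic data, one hidden layer of 16 units), not a full transfer learning pipeline:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# Synthetic regression data, used purely for illustration
X, y = make_regression(n_samples=500, n_features=8, noise=10,
                       random_state=42)

# Train a small network; its hidden layer learns a representation of the data
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=42)
net.fit(X, y)

# Reuse the trained hidden layer (weights, biases, and ReLU activation)
# as engineered features for a different model
hidden = np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])

# A second, simpler model is trained on the learned features
regr = LinearRegression().fit(hidden, y)
print('R^2 on learned features: {:.2f}'.format(regr.score(hidden, y)))
```

In real transfer learning the reused layers come from a large network pretrained on a related task, but the mechanics are the same: the learned representation replaces hand-crafted features.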