2.7. Training the Model
The data is now ready for training a machine learning model. But first, we need to divide it into training and test sets. The naive Bayes algorithm learns the relationship between the email text and the email label (spam or not) from the training data, since the training set contains both the email texts and their corresponding labels.
Once the naive Bayes model is trained on the training set, the email texts in the test set are passed to the model as inputs, without their labels. The model then predicts which of these emails are spam. Finally, the predicted labels are compared with the actual labels in the test set to evaluate the performance of the naive Bayes spam detector.
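The train, predict and compare workflow described above can be sketched on a handful of made-up emails. This is a minimal sketch, not the chapter's own code. The toy texts, the variable names, and the use of scikit-learn's `CountVectorizer` with `MultinomialNB` are assumptions for illustration, and the chapter's preprocessing may turn text into features differently. The test labels are only used in the final comparison.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy emails (assumption, not the chapter's dataset); 1 = spam, 0 = not spam
train_texts = [
    "win a free prize now",
    "claim your free money",
    "meeting moved to monday",
    "please review the attached report",
]
train_labels = [1, 1, 0, 0]
test_texts = ["free prize money", "report for the monday meeting"]
test_labels = [1, 0]  # held back; used only to evaluate the predictions

# Convert text to word-count vectors; the vocabulary is learned from the training texts only
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Learn the relationship between email text and label from the training set
model = MultinomialNB()
model.fit(X_train, train_labels)

# The model receives only the test texts, never their labels
predictions = model.predict(X_test)

# Compare the predicted labels with the actual labels
print(predictions, accuracy_score(test_labels, predictions))
```

In this sketch the vectorizer is fitted on the training texts alone. At prediction time, words that appear only in test emails (such as "for") are simply ignored.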
The following script divides the data into training and test sets.
Script 12:
from sklearn.model_selection import train_test_split
# Hold out 20% of the emails as the test set; random_state=42 makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
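The sketch below runs the same `train_test_split` call on a stand-in dataset of 100 rows to show what the two arguments in the script do. The arrays are hypothetical placeholders, not the chapter's email data. The sketch checks three things: `test_size=0.20` yields an 80/20 split, fixing `random_state` reproduces the same split, and features and labels are shuffled together.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: each row's single feature is its original index
X = np.arange(100).reshape(-1, 1)
y = np.array([0, 1] * 50)

# test_size=0.20 holds out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
print(X_train.shape, X_test.shape)

# Calling again with the same random_state reproduces the same shuffle
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.20, random_state=42)
print(np.array_equal(X_test, X_test2))

# Features and labels are shuffled together, so every row keeps its own label
print(all(y[row[0]] == label for row, label in zip(X_test, y_test)))
```

Reusing a fixed `random_state` means that every run of the script evaluates the model on the same held-out emails. This makes results comparable across runs.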
To train the machine...