In this section, we will solve the spam message detection problem using the concepts covered in this chapter.
We will take a collection of SMS messages and classify each one as spam or non-spam. This is a classification problem, and we will use the linear SVM algorithm to solve it, given its advantages for text classification.
We will use NLP techniques to convert the SMS messages into feature vectors. Specifically, we will use the scikit-learn vectorizer methods to transform each message into a TF-IDF vector, which can then be fed into the linear SVM model to perform SMS spam detection (classification into spam and non-spam).
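The approach described above can be sketched in a few lines. This is a minimal illustration rather than the final model: the six messages and their labels are made up for the example (they are not taken from the dataset), and the choice of `TfidfVectorizer` and `LinearSVC` with default parameters, combined through `make_pipeline`, is an assumption about how the vectorizer and the linear SVM might be wired together.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy messages, invented for illustration only.
messages = [
    "WINNER! Claim your free prize now, text WIN to claim",
    "Free entry in a weekly competition, call now to claim",
    "Urgent: you have won a cash prize, reply to claim",
    "Are we still meeting for lunch today?",
    "I will call you when I get home",
    "Can you pick up some milk on the way back?",
]
labels = ["spam", "spam", "spam", "non-spam", "non-spam", "non-spam"]

# Chain the TF-IDF vectorizer and the linear SVM so that raw text
# goes in and a class label comes out.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(messages, labels)

# Inspect the TF-IDF matrix: one row per message, one column per term.
tfidf = model.named_steps["tfidfvectorizer"]
features = tfidf.transform(messages)
print(features.shape)

# Classify unseen messages; each prediction is one of the two labels.
predictions = model.predict(["Claim your free prize now", "See you at lunch"])
print(predictions)
```

Wrapping both steps in a single pipeline means the vocabulary learned during training is reused automatically when new messages are classified, so the feature vectors always have the same dimensions as the ones the SVM was trained on.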
The data used to build the spam detection model is taken from http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/, which contains 747 spam...