The following 10 questions will help reinforce the material presented in this chapter:
- Is the spam classification task a binary classification task?
- What was the significance of the hashing trick in the spam classification task?
- What is a hash collision, and how is it minimized?
- What do we mean by inverse document frequency?
- What are stop words and why do they matter?
- What is the role played by the Naive Bayes algorithm in spam classification?
- How do you use the HashingTF class in Spark to implement the hashing trick in your spam classification process?
- What is meant by the vectorization of features?
- Is there a better algorithm that you can think of to implement the spam classification process?
- What are the benefits of spam filtering, and why do they matter in business terms?