Learning Data Mining with Python

Chapter 6 – Social Media Insight Using Naive Bayes


Spam detection

http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

Using the concepts in this chapter, you can create a spam detection method that is able to view a social media post and determine whether it is spam or not. Try this out by first creating a dataset of spam/not-spam posts, implementing the text mining algorithms, and then evaluating them.

One important consideration with spam detection is the balance between false positives and false negatives. Many people would prefer to have a couple of spam messages slip through rather than miss a legitimate message because the filter was too aggressive in stopping the spam. To tune your method for this, you can use a grid search with the F1-score as the evaluation criterion. See the link above for information on how to do this.
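As a rough illustration of that tuning step, the following is a minimal sketch of a grid search scored by F1 in scikit-learn. The tiny dataset, the choice of `CountVectorizer` with `BernoulliNB`, and the `alpha` values searched are all assumptions made for the example; they are not the pipeline built in the chapter, and a real experiment would need far more labeled posts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

# Made-up posts for illustration only: 1 = spam, 0 = not spam.
posts = [
    "win a free phone now", "free money click here",
    "claim your free prize now", "click here to win money",
    "cheap pills free shipping", "win cash now click",
    "lunch at noon tomorrow", "great talk at the meetup",
    "reading a book on python", "see you at the game tonight",
    "new blog post on data mining", "coffee with friends today",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Binary bag-of-words features feeding a Bernoulli Naive Bayes classifier.
pipeline = Pipeline([
    ("vectorizer", CountVectorizer(binary=True)),
    ("classifier", BernoulliNB()),
])

# Candidate smoothing values (an arbitrary choice for this sketch).
param_grid = {"classifier__alpha": [0.01, 0.1, 1.0]}

# scoring="f1" ranks each parameter setting by its cross-validated F1-score.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=3)
search.fit(posts, labels)

print(search.best_params_)
print(search.best_score_)
```

The F1-score weighs precision and recall equally. If false positives matter more to you than false negatives, another scoring string from the page linked above, such as `"precision"`, may be a better fit.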

Natural language processing and part-of-speech tagging

http://www.nltk.org/book/ch05.html

The techniques we used in this chapter were quite lightweight compared to some of the linguistic models employed in other areas. For example, part-of-speech tagging can help disambiguate word forms, allowing for higher accuracy. The book that comes with NLTK has a chapter on this, linked above. The whole book is well worth reading too.
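To see why tags help with disambiguation, here is a small sketch that compares plain word features with word/tag features. The tags are hand-written in the Penn Treebank style for illustration only; in practice a tagger such as NLTK's `nltk.pos_tag` would supply them. The helper names `plain_features` and `tagged_features` are hypothetical.

```python
from collections import Counter

# Hand-tagged example posts (assumed tags, not the output of a real tagger).
tagged_posts = [
    [("I", "PRP"), ("will", "MD"), ("book", "VB"), ("a", "DT"), ("flight", "NN")],
    [("I", "PRP"), ("read", "VBD"), ("a", "DT"), ("book", "NN")],
]


def plain_features(tagged):
    """Bag of lowercase words, ignoring the tags."""
    return Counter(word.lower() for word, _ in tagged)


def tagged_features(tagged):
    """Bag of word/TAG strings, so each word form gets its own feature."""
    return Counter(f"{word.lower()}/{tag}" for word, tag in tagged)


for post in tagged_posts:
    print(sorted(plain_features(post)))
    print(sorted(tagged_features(post)))
```

With plain features, both posts share the single feature `book`. With tagged features, the verb and the noun become the separate features `book/VB` and `book/NN`, which a classifier can weight differently.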