IPython Interactive Computing and Visualization Cookbook

By: Cyrille Rossant

Learning from text – Naive Bayes for Natural Language Processing


In this recipe, we show how to handle text data with scikit-learn. Working with text requires careful preprocessing and feature extraction. The resulting feature matrices are usually highly sparse, because each document contains only a small fraction of the vocabulary.
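
To make the idea of feature extraction and sparsity concrete, here is a minimal sketch that assumes a TF-IDF representation built with scikit-learn's TfidfVectorizer. The three comments are made up for illustration and are not taken from the dataset.

```python
import sklearn.feature_extraction.text as text

# Three made-up comments, invented purely for illustration.
comments = [
    "you are a great person",
    "you are an idiot",
    "great discussion, thanks everyone",
]

# Turn raw strings into a matrix of TF-IDF weights:
# one row per comment, one column per vocabulary term.
vectorizer = text.TfidfVectorizer()
X = vectorizer.fit_transform(comments)

n_rows, n_cols = X.shape
# Fraction of entries that are actually stored (non-zero).
density = X.nnz / (n_rows * n_cols)
print(type(X).__name__, X.shape, f"density={density:.2f}")
```

Even on this tiny example, most entries are zero. On a real corpus with tens of thousands of terms the density is typically far lower, which is why the result is kept in a SciPy sparse format instead of a dense array.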

We will learn to recognize whether a comment posted during a public discussion is considered insulting to one of the participants. We will use a labeled dataset from Impermium, released during a Kaggle competition.
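
As a rough preview of the task, the sketch below trains a Naive Bayes classifier on a handful of made-up comments. It assumes a Bernoulli Naive Bayes model on TF-IDF features. The comments and labels are hypothetical, and other variants such as MultinomialNB would slot in the same way.

```python
import sklearn.feature_extraction.text as text
import sklearn.naive_bayes as nb

# Made-up comments and labels (1 = insulting, 0 = not insulting),
# for illustration only; they do not come from the real dataset.
train_comments = [
    "you are an idiot",
    "what a stupid idiot",
    "thanks for the helpful answer",
    "great point, I agree",
]
train_labels = [1, 1, 0, 0]

# Learn the vocabulary and build the sparse feature matrix.
vectorizer = text.TfidfVectorizer()
X_train = vectorizer.fit_transform(train_comments)

# Bernoulli Naive Bayes binarizes the features, so it only uses
# whether each term is present or absent in a comment.
clf = nb.BernoulliNB()
clf.fit(X_train, train_labels)

# New comments must go through the same fitted vectorizer.
X_new = vectorizer.transform(["you idiot", "thanks, great answer"])
predictions = clf.predict(X_new)
print(predictions)
```

The important design point is that the vectorizer is fitted once on the training comments and then reused with transform on new text, so that both matrices share the same columns.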

Getting ready

Download the Troll dataset from the book's GitHub repository at https://github.com/ipython-books/cookbook-data.

This dataset was obtained from Kaggle, at www.kaggle.com/c/detecting-insults-in-social-commentary.

How to do it...

  1. Let's import our libraries:

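    In [1]: import numpy as np
            import pandas as pd
            import sklearn
            # sklearn.cross_validation and sklearn.grid_search were
            # removed in scikit-learn 0.20; both now live in
            # sklearn.model_selection, so we keep the same aliases.
            import sklearn.model_selection as cv
            import sklearn.model_selection as gs
            import sklearn.feature_extraction.text as text
            import sklearn.naive_bayes as nb...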