Naïve Bayes
In Chapter 5, Sentiment Lexicons and Vector-Space Models, we investigated the use of simple lexicon-based classifiers, using both a hand-coded sentiment lexicon and a lexicon extracted from a corpus of marked-up texts. This investigation showed that such models can produce reasonable scores. It also showed that a variety of tweaks, such as using a stemmer or changing the way that weights are calculated (for example, by using TF-IDF scores), improve the results in some cases but not in others. We will now turn to a range of machine learning algorithms to see whether they lead to better results.
For most of the algorithms that we will be looking at, we will use the implementations in the Python scikit-learn (sklearn) library. Implementations of all these algorithms are widely available, but the sklearn versions have two substantial advantages: they are freely available, with a fairly consistent interface to the training and testing data, and they can be easily installed and run...
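As a minimal sketch of what a "fairly consistent interface" means in practice, the snippet below trains two different sklearn classifiers, `MultinomialNB` and `LogisticRegression`, through exactly the same `fit`/`predict` calls. The training sentences and labels are hypothetical toy data invented for illustration. `CountVectorizer` is just one convenient way of turning texts into feature vectors and is an assumption here, not necessarily the representation used later in this chapter.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data, purely for illustration (not from any real corpus)
train_texts = ["I love this film", "what a great day", "I hate rainy days", "this is awful"]
train_labels = ["pos", "pos", "neg", "neg"]
test_texts = ["great film", "awful rainy weather"]

# Turn raw texts into word-count vectors; the vocabulary is learned from the training set only
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Every classifier is trained and applied with the same two calls
results = {}
for clf in (MultinomialNB(), LogisticRegression()):
    clf.fit(X_train, train_labels)
    results[type(clf).__name__] = list(clf.predict(X_test))

for name, predictions in results.items():
    print(name, predictions)
```

Trying a different algorithm only means adding another estimator to the tuple. The vectorised data and the `fit`/`predict` calls stay the same, which is what makes it easy to compare several classifiers on one dataset.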