Chapter 8. Applying Machine Learning to Sentiment Analysis
In the age of the internet and social media, people's opinions, reviews, and recommendations have become a valuable resource for political science and business alike. Modern technologies let us collect and analyze such data efficiently. In this chapter, we will delve into a subfield of natural language processing (NLP) called sentiment analysis and learn how to use machine learning algorithms to classify documents based on their polarity, that is, the attitude of the writer. In particular, we will work with a dataset of 50,000 movie reviews from the Internet Movie Database (IMDb) and build a predictor that can distinguish between positive and negative reviews.
The following sections cover these topics:
Cleaning and preparing text data
Building feature vectors from text documents
Training a machine learning model to classify positive and negative movie reviews
Working with large text...