In this chapter, we are going to learn about natural language processing (NLP). We will discuss concepts such as tokenization, stemming, and lemmatization, which are used to preprocess text. We will then build a Bag of Words model and use it to classify text, and see how machine learning can be used to analyze the sentiment of a given sentence. Finally, we will discuss topic modeling and implement a system that identifies the topics in a given document.
By the end of this chapter, you will be familiar with:
Installing relevant packages
Tokenizing text data
Converting words to their base forms using stemming
Converting words to their base forms using lemmatization
Dividing text data into chunks
Extracting a document-term matrix using the Bag of Words model
Building a category predictor
Constructing a gender identifier
Building a sentiment analyzer
Topic modeling using Latent Dirichlet Allocation
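As a small preview of two of these topics, tokenization and the Bag of Words document-term matrix, here is a minimal sketch that uses only the Python standard library. The function names `tokenize` and `document_term_matrix` and the sample sentences are hypothetical, chosen for illustration, and the regular-expression tokenizer is a simplifying assumption rather than the approach a dedicated NLP package would take.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplified tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def document_term_matrix(documents):
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in document i."""
    tokenized = [tokenize(doc) for doc in documents]
    # The vocabulary is the sorted set of all distinct tokens across documents.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # a missing term counts as 0
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical sample documents, used only for illustration.
docs = ["The cat sat on the mat.", "The dog chased the cat."]
vocabulary, matrix = document_term_matrix(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row of the matrix describes one document and each column one vocabulary term, so a classifier can treat the rows as numeric feature vectors. Word order is discarded, which is the defining simplification of the Bag of Words model.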