In this section, we will build a sentiment analysis model from scratch using Keras. We will use the sentiment analysis dataset from the University of Michigan, which is available at https://www.kaggle.com/c/si650winter11/data. This dataset contains 7,086 labeled movie reviews. Label 1 denotes a positive sentiment, while 0 denotes a negative sentiment. In the repository, the dataset is stored in the file named sentiment.txt.
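As a rough illustration of reading the labeled data, the sketch below parses lines into parallel lists of texts and labels. It assumes that each line holds a 0/1 label, a tab character, and then the review text; that layout is an assumption and should be checked against the actual sentiment.txt file. The function name parse_labeled_lines is hypothetical and is not taken from the repository.

```python
from typing import Iterable, List, Tuple


def parse_labeled_lines(lines: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Split 'label<TAB>text' lines into (texts, labels).

    Assumption: one review per line, label first, tab-separated.
    """
    texts: List[str] = []
    labels: List[int] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # partition splits on the first tab only, so tabs inside the
        # review text (if any) are preserved.
        label, _, text = line.partition("\t")
        texts.append(text.strip())
        labels.append(int(label))
    return texts, labels


# In-memory sample standing in for the real file contents.
sample = ["1\tI loved it\n", "0\tTerrible film\n"]
texts, labels = parse_labeled_lines(sample)
print(texts, labels)
```

Because the function accepts any iterable of strings, an open file handle for sentiment.txt can be passed in place of the sample list.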
Once you have installed the requisite packages (listed in the requirements.txt file that accompanies the code) and read the data, the next step is to preprocess the data:
- The first step is to get the tokens/word list from the reviews. Remove any punctuation and make sure that all of the tokens are in lowercase:
def get_processed_tokens(text):
    ''' Gets Token List from a Review '''
    filtered_text = re.sub(r'[^a-zA-Z0-9\s]', '', text)  # Removing Punctuations
    filtered_text...
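The listing above is cut off, so the following is only a minimal, self-contained sketch of the step it describes: strip punctuation, lowercase the text, and split it into tokens. It reuses the same regular expression, but the whitespace-based split and the function name tokenize_review are assumptions made for illustration and may differ from the repository's get_processed_tokens.

```python
import re
from typing import List


def tokenize_review(text: str) -> List[str]:
    """Return lowercase tokens from a review with punctuation removed.

    Hypothetical helper: splitting on whitespace is an assumption,
    not necessarily what the original code does.
    """
    # Drop every character that is not a letter, digit, or whitespace.
    filtered_text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    # Lowercase so that "Movie" and "movie" map to the same token.
    filtered_text = filtered_text.lower()
    # str.split() with no arguments collapses runs of whitespace.
    return filtered_text.split()


print(tokenize_review("I LOVED this movie!!!"))
```

Note that removing punctuation this way also merges contractions, so "Don't" becomes the single token "dont".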