The first kind of analysis is sentiment analysis, which tries to identify the mood expressed in a piece of text. We will therefore look at the overall sentiment of each comment to see whether the general feeling toward those companies is mainly positive or negative.
A common technique for this analysis relies on a lexicon, a dataset that stores a long list of words, each paired with an attribute that expresses the sentiment of that word. The tidytext package provides three different lexicons to choose from:
afinn: Assigning the sentiment as a score from -5 (negative) to 5 (positive)
bing: Denoting the sentiment as either positive or negative
nrc: Assigning words to sentiment categories, such as joy and fear
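To make the idea of a lexicon concrete, here is a minimal sketch in base R. The words, the scores, and the names toy_lexicon and comment_words are all made up for illustration and are not taken from afinn, bing, or nrc. The sketch only shows the general shape of a score-based lexicon and how a lookup against it might work.

```r
# Toy lexicon: hypothetical words and scores, invented for this sketch.
toy_lexicon <- data.frame(
  word  = c("good", "great", "bad", "terrible"),
  score = c(2, 3, -2, -3),
  stringsAsFactors = FALSE
)

# Words of a hypothetical comment, already split into lowercase tokens.
comment_words <- c("the", "service", "was", "great", "but", "the", "app", "is", "bad")

# Find each token in the lexicon; tokens that are absent get index 0
# and are dropped when used for row indexing.
matched <- toy_lexicon[match(comment_words, toy_lexicon$word, nomatch = 0), ]

# Sum the scores of the matched words to get one number for the comment.
comment_score <- sum(matched$score)
comment_score
```

A label-based lexicon has the same shape, with a column of labels in place of the numeric score.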
We can easily explore these lexicons by calling the get_sentiments() function. Let's inspect bing:
get_sentiments("bing")
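Beyond printing the lexicon, a couple of quick summaries can help to get a feel for it. The following sketch assumes that the tidytext and dplyr packages are installed and that the returned table has the columns word and sentiment. The three words looked up at the end are arbitrary examples. Note that in recent versions of tidytext, the afinn and nrc lexicons may need to be downloaded through the textdata package, while bing ships with tidytext itself.

```r
library(tidytext)
library(dplyr)

# Load the bing lexicon as a table with one row per word.
bing <- get_sentiments("bing")

# Count how many words carry each sentiment label.
sentiment_counts <- count(bing, sentiment)
sentiment_counts

# Look up a few specific words; words missing from the lexicon
# simply do not appear in the result.
filter(bing, word %in% c("happy", "fail", "table"))
```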
What do we do now to understand the sentiment of our documents?
The most straightforward...