We can take a first look at these words with the wordcloud
package, which does exactly what its name suggests: it draws word clouds.
To create a word cloud, we just have to call the wordcloud()
function, which requires two arguments:
words: the words to be plotted
freq: the number of occurrences of each word
Let's do it:
comments_tidy %>% count(word) %>% with(wordcloud(word, n))
The plot shows all the words stored in the comments_tidy
object, each drawn at a size proportional to its frequency. Note that the position of each word has no particular meaning here.
What do you think? Not bad, is it? Nevertheless, I can see too many irrelevant words, such as we and with. These words do not convey any useful information about the content of the comments, and because they are so frequent, they obscure other, more meaningful words.
We should therefore remove them...
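To try the same pipeline without the full dataset, here is a minimal, self-contained sketch. The three comments below are made-up toy data standing in for the real comments, and the extra wordcloud() arguments are optional choices: min.freq = 1 keeps every word (the default of 3 would drop most of this tiny sample), random.order = FALSE draws the most frequent words first, near the centre, and set.seed() makes the otherwise random layout reproducible. Printing the counts also makes the problem above visible: function words such as "the", "we" and "with" sit at the top of the list.

```r
library(dplyr)
library(tidytext)
library(wordcloud)

# Hypothetical toy comments, standing in for the real comments data
comments <- tibble(
  id = 1:3,
  text = c("We love the new layout with the dark theme",
           "We think the search is slow with large files",
           "The dark theme is great")
)

# One row per word; unnest_tokens() lower-cases the tokens by default
comments_tidy <- comments %>% unnest_tokens(word, text)

# Count occurrences of each word, most frequent first
word_counts <- comments_tidy %>% count(word, sort = TRUE)
print(word_counts)

# Fix the seed so the random placement is reproducible, keep every
# word (min.freq = 1) and plot the most frequent words in the centre
set.seed(42)
wordcloud(words = word_counts$word, freq = word_counts$n,
          min.freq = 1, random.order = FALSE)
```

With this sample, "the" appears four times and tops the list, ahead of content words like "dark" and "theme".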