Looking at combinations of words, such as bigrams or trigrams, can help you understand the relationships between words. Using tidy methods again, we'll create bigrams and examine those relationships to extract insights from the text. I will continue with President Lincoln as the subject, since that lets you compare what you gain from n-grams versus single words. Getting started is easy: you just specify the number of words to join. Notice in the following code that I maintain word capitalization:
> sotu_bigrams <- sotu_meta %>%
    dplyr::filter(year > 1860 & year < 1865) %>%
    tidytext::unnest_tokens(bigram, text, token = "ngrams", n = 2, to_lower = FALSE)
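To make the tokenization step concrete, here is a minimal sketch on a single made-up sentence. The `toy` tibble and its contents are hypothetical and are not drawn from `sotu_meta`; only the `unnest_tokens()` arguments mirror the call above.

```r
library(tibble)
library(tidytext)

# Hypothetical one-row input standing in for a speech (assumption, not real data)
toy <- tibble::tibble(
  year = 1862L,
  text = "The United States of America"
)

# Same arguments as the call above: two-word n-grams, capitalization preserved
toy_bigrams <- tidytext::unnest_tokens(
  toy, bigram, text,
  token = "ngrams", n = 2, to_lower = FALSE
)

# Each row is one overlapping pair of adjacent words
toy_bigrams$bigram
```

With five words in the input, the result has four overlapping bigrams, and `United States` keeps its capitals because of `to_lower = FALSE`. The same call applied to the filtered Lincoln addresses is what produces `sotu_bigrams`.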
Let's take a look at this:
> sotu_bigrams %>% dplyr::count(bigram, sort = TRUE)
# A tibble: 17,687 x 2
  bigram        n
  <chr>     <int>
1 of the      509
2 to the      180
3 in the      146
4 by the       97
5 for the      94
6 have been    82
7 United...