This is an interesting application of natural language processing. Sentence auto-completion is an interesting feature that is shockingly absent in our modern-day browsers and mobile interfaces. Getting grammatically and contextually relevant suggestions as to what to type next, while we are typing a few words, would be such a great feature to have.
Coursera, in one of the data science courses by Johns Hopkins, provided four compressed datasets that contain terms and frequencies of unigram, bigram, trigram, and 4-gram in four datasets. The problem at hand was to come up with a model that can learn to predict relevant words to type next.
The following code uses the Katz-Backoff algorithm, leveraging the four n-gram term frequency datasets to predict the next word in a sentence:
library(tm) library(stringr) # Load the n-gram data load("/data_frame1.RData"); load("/data_frame2.RData"); load("/data_frame3.RData"); load("/data_frame4.RData"); CleanInputString<- function(input_string...