A computer program cannot interpret the meaning of a sentence directly, so it has no built-in way to group documents by similarity. However, if we convert the documents into a numeric matrix (a document-term matrix), the program can compute the distance between each pair of documents and group similar ones together.
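To make the idea concrete, here is a minimal sketch in base R that builds a small document-term matrix by hand and computes pairwise distances. The three titles are made up for illustration and are not taken from the recipe's data set; the choice of Euclidean distance (the default of `dist`) is likewise an assumption, not necessarily the measure used later in the recipe.

```r
# Toy titles (hypothetical; not from news.RData)
titles <- c("stock market rises",
            "stock market falls",
            "team wins final match")

# Split each title into lowercase words and collect the vocabulary
tokens <- strsplit(tolower(titles), " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Document-term matrix: one row per title, one column per word,
# each cell holds how often the word occurs in the title
dtm <- t(vapply(
  tokens,
  function(words) as.integer(table(factor(words, levels = vocab))),
  integer(length(vocab))
))
dimnames(dtm) <- list(paste0("doc", seq_along(titles)), vocab)

# Euclidean distance between every pair of documents
d <- dist(dtm)
print(as.matrix(d))
```

The two stock market titles share two of their three words, so their rows in the matrix are closer to each other than either is to the sports title.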
In this recipe, we demonstrate how to compute the distance between text documents and how to cluster similar documents with the k-means method.
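As a rough preview of the clustering step, the sketch below runs `kmeans` from base R on a tiny hand-written document-term matrix. The counts, the term names, and the choice of two clusters are all hypothetical and serve only to show the shape of the call; the recipe itself works with real news titles.

```r
# Toy document-term matrix (hypothetical counts, not derived from news.RData)
dtm <- rbind(
  doc1 = c(stock = 2, market = 1, team = 0, match = 0),
  doc2 = c(stock = 1, market = 2, team = 0, match = 0),
  doc3 = c(stock = 0, market = 0, team = 2, match = 1),
  doc4 = c(stock = 0, market = 0, team = 1, match = 2)
)

# k-means starts from random centers, so fix the seed for reproducibility
set.seed(123)

# Ask for two clusters; nstart = 10 keeps the best of ten random starts
fit <- kmeans(dtm, centers = 2, nstart = 10)

# Named vector giving the cluster assigned to each document
print(fit$cluster)
```

The cluster numbers themselves are arbitrary labels; what matters is which documents end up sharing a label.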
We use news titles as the clustering input. You can find the data on the author's GitHub page at https://github.com/ywchiu/rcookbook/raw/master/chapter12/news.RData.