Mastering Data Analysis with R

By: Gergely Daróczi

The segmentation of documents


To identify groups of cleaned terms based on their frequency and association across the documents of the corpus, we can run a clustering method, such as the classic hierarchical clustering algorithm, directly on our tdm matrix.
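As a minimal sketch of clustering terms, the following uses a small hand-made count matrix in place of the real tdm object; the terms, document names, and counts are invented for illustration. Rows are terms and columns are documents, which mirrors the layout of a term-document matrix. With a real tm object, as.matrix(tdm) would presumably supply the input instead.

```r
# Toy term-document counts: rows are terms, columns are documents.
# All names and numbers here are made up for illustration.
toy_tdm <- matrix(
  c(5, 4, 0, 0,
    4, 5, 1, 0,
    0, 0, 6, 5,
    0, 1, 5, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Terms = c("plot", "graphic", "regression", "model"),
    Docs  = c("pkgA", "pkgB", "pkgC", "pkgD")
  )
)

# dist() computes Euclidean distances between rows, that is, between terms
term_dist <- dist(toy_tdm)

# Hierarchical clustering (complete linkage is the default of hclust)
term_hc <- hclust(term_dist)

# Cut the tree into two groups of terms
term_groups <- cutree(term_hc, k = 2)
print(term_groups)
```

Terms that occur with similar counts in the same documents end up in the same group; calling plot(term_hc) would draw the corresponding dendrogram.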

On the other hand, if we would rather cluster the R packages based on their descriptions, we should compute a new matrix with DocumentTermMatrix instead of the previously used TermDocumentMatrix. Calling the clustering algorithm on this matrix then results in a segmentation of the packages.
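To illustrate the difference between the two layouts, the sketch below again relies on a hand-made toy matrix rather than a real corpus; all names and counts are hypothetical. A document-term matrix holds the same counts as the term-document matrix with rows and columns swapped, so transposing with t() puts the documents in the rows, and dist() then compares documents rather than terms.

```r
# Toy term-document counts (terms in rows, documents in columns).
# Names and numbers are invented for illustration only.
toy_counts <- matrix(
  c(5, 4, 0,
    3, 5, 0,
    0, 1, 6,
    1, 0, 5,
    0, 0, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Terms = c("plot", "graphic", "regression", "model", "test"),
    Docs  = c("pkgA", "pkgB", "pkgC")
  )
)

# Transpose so that documents are in the rows (document-term layout)
toy_dtm <- t(toy_counts)

# dist() works row-wise, so the distances are now between documents
doc_hc <- hclust(dist(toy_dtm))

# Segment the documents into two groups
doc_groups <- cutree(doc_hc, k = 2)
print(doc_groups)
```

The only change compared with clustering terms is the orientation of the matrix: the clustering call itself stays the same.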

For more details on the available methods and algorithms, and for guidance on choosing the appropriate clustering functions, please see Chapter 10, Classification and Clustering. For now, we will fall back on the traditional hclust function, which provides a built-in way of running hierarchical clustering on distance matrices. As a quick demo, let's try this on the so-called Hadleyverse, which describes a useful collection...