Mastering Data Analysis with R

By: Gergely Daróczi

R package maintainers


Another similarly straightforward data source might be the list of R package maintainers. We can download the names and e-mail addresses of the package maintainers from a public page of CRAN, where this data is stored in a nicely structured HTML table that is extremely easy to parse:

> library(XML)  # provides readHTMLTable
> packages <- readHTMLTable(paste0('http://cran.r-project.org', 
+   '/web/checks/check_summary.html'), which = 2)

Extracting the names from the Maintainer column takes some quick data cleansing and transformation, mainly using regular expressions. Please note that the column name starts with a space, which is why we quoted it:

> maintainers <- sub('(.*) <(.*)>', '\\1', packages$' Maintainer')
> # replace non-breaking spaces (U+00A0) with regular spaces
> maintainers <- gsub('\u00a0', ' ', maintainers)
> str(maintainers)
 chr [1:6994] "Scott Fortmann-Roe" "Gaurav Sood" "Blum Michael" ...
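
To see what the regular expression does without downloading anything, here is a minimal sketch on a few made-up strings in the same "Name <email>" format. The sample values and the `raw`, `name_only` and `email_only` names are hypothetical and do not come from CRAN; the second `sub` call, which pulls out the e-mail address via the second capture group, is an illustrative addition rather than a step from the text above.

```r
# Hypothetical sample values mimicking the "Name <email>" format
raw <- c('Jane Doe <jane@example.org>',
         'John\u00a0Roe <john@example.org>',
         'Jane Doe <jane@example.org>')

# The first capture group matches everything before the " <...>" part
name_only <- sub('(.*) <(.*)>', '\\1', raw)

# The second capture group holds the address between the angle brackets
email_only <- sub('(.*) <(.*)>', '\\2', raw)

# Normalize non-breaking spaces (U+00A0) to ordinary spaces
name_only <- gsub('\u00a0', ' ', name_only)

print(name_only)
print(email_only)
```

Because both groups are captured by a single pattern, switching between `'\\1'` and `'\\2'` is enough to choose which part of each entry to keep.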

This list of almost 7,000 package maintainers includes some duplicated names (they maintain multiple packages). Let's...