Mastering Data Analysis with R

By: Gergely Daróczi

Scraping data from other online sources


Although the readHTMLTable function is very useful, sometimes the data is not structured in tables but is available only as HTML lists. Let's demonstrate such a data format by checking all the R packages listed in the relevant CRAN Task View at http://cran.r-project.org/web/views/WebTechnologies.html, as you can see in the following screenshot:

So we see an HTML list of the package names, each with a URL pointing to CRAN or, in some cases, to a GitHub repository. To proceed, we first have to get acquainted with the HTML source to see how we can parse it. You can do that easily in either Chrome or Firefox: just right-click on the CRAN packages heading at the top of the list and choose Inspect Element, as you can see in the following screenshot:

So we have the list of related R packages in a ul (unordered list) HTML tag, just after the h3 (level 3 heading) tag holding the CRAN packages string.
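
To make this structure concrete, here is a minimal sketch of how such a list could be extracted with the XML package. The HTML string below is a simplified, hypothetical stand-in for the Task View page (the real markup and package listing may differ), and the XPath expression is only one possible way to select the first ul following the heading:

```r
library(XML)

# Hypothetical, simplified markup imitating the structure described above:
# an h3 heading followed by a ul holding one link per package
html <- '
<html><body>
<h3>CRAN packages:</h3>
<ul>
  <li><a href="../packages/httr/index.html">httr</a></li>
  <li><a href="../packages/RCurl/index.html">RCurl</a> (core)</li>
  <li><a href="../packages/XML/index.html">XML</a></li>
</ul>
<h3>Related links:</h3>
<ul>
  <li><a href="https://github.com/example/pkg">pkg</a></li>
</ul>
</body></html>'

# Parse the string as HTML (asText = TRUE: the input is content, not a path)
page <- htmlParse(html, asText = TRUE)

# Select the links in the first ul that follows the matching h3 heading
xpath <- "//h3[contains(text(), 'CRAN packages')]/following-sibling::ul[1]/li/a"

pkgs <- xpathSApply(page, xpath, xmlValue)            # link text: package names
urls <- xpathSApply(page, xpath, xmlGetAttr, "href")  # link targets
```

Restricting the match to `following-sibling::ul[1]` keeps the second list (under the other heading) out of the result. To try this against the live page, the URL could be passed to htmlParse instead of the string, assuming the page still uses the same heading and list layout.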

In short:

  • We have to parse this HTML...