Learning R Programming

By: Kun Ren

Overview of this book

R is a high-level functional language and one of the must-know tools for data science and statistics. Powerful but complex, R can be challenging for beginners and those unfamiliar with its unique behaviors. Learning R Programming is the solution: an easy and practical way to learn R and develop a broad and consistent understanding of the language. Through hands-on examples, you'll discover powerful R tools and R best practices that will give you a deeper understanding of working with data. You'll get to grips with R's data structures and data processing techniques, as well as the most popular R packages to boost your productivity from the outset. Start with the basics of R, then dive deep into the programming techniques and paradigms to make your R code excel. Advance quickly to a deeper understanding of R's behavior as you learn common tasks including data analysis, databases, web scraping, high performance computing, and writing documents. By the end of the book, you'll be a confident R programmer adept at solving problems with the right techniques.

Extracting data from web pages using CSS selectors


In R, one of the easiest packages to use for web scraping is rvest. Run the following code to install the package from CRAN:

install.packages("rvest") 

First, we load the package and use read_html() to read data/single-table.html, the web page from which we will try to extract the table:

library(rvest) 
## Loading required package: xml2 
single_table_page <- read_html("data/single-table.html") 
single_table_page 
## {xml_document} 
## <html> 
## [1] <head>\n  <title>Single table</title>\n</head> 
## [2] <body>\n  <p>The following is a table</p>\n  <table i ... 

Note that single_table_page is a parsed HTML document, which is a nested data structure of HTML nodes.
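
To make the idea of a nested structure of nodes concrete, the following minimal sketch walks one level at a time through a parsed document. The inline HTML string is a hypothetical stand-in for data/single-table.html (its contents are assumed, not taken from the book's file), and the variable names are illustrative only:

```r
library(rvest)

# Hypothetical stand-in page; the real data/single-table.html may differ.
html <- '<html>
<head><title>Single table</title></head>
<body>
  <p>The following is a table</p>
  <table>
    <tr><th>Name</th><th>Age</th></tr>
    <tr><td>Jenny</td><td>18</td></tr>
    <tr><td>James</td><td>19</td></tr>
  </table>
</body>
</html>'

# read_html() also accepts a literal HTML string, not only a file path or URL.
page <- read_html(html)

# Children of the root <html> node: the <head> and <body> elements.
top_level <- html_name(html_children(page))

# One level deeper: the element children of <body>.
body <- html_node(page, "body")
body_children <- html_name(html_children(body))

# Text content of the first <p> node.
paragraph_text <- html_text(html_node(page, "p"))

print(top_level)
print(body_children)
print(paragraph_text)
```

Each call to html_children() descends one level, which is why the printed document above lists only the head and body nodes at its top level.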

A typical process for scraping information from such a web page using rvest functions is as follows. First, locate the HTML nodes from which we need to extract data. Then, use either a CSS selector or an XPath expression...
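
As a rough sketch of the process outlined above, the example below locates a table node first with a CSS selector and then with an equivalent XPath expression, and converts it to a data frame. The inline page is a hypothetical stand-in for data/single-table.html, and the selector "table", the XPath "//table", and the variable names are assumptions chosen for this illustration rather than the book's own code:

```r
library(rvest)

# Hypothetical stand-in page; the real data/single-table.html may differ.
html <- '<html>
<head><title>Single table</title></head>
<body>
  <p>The following is a table</p>
  <table>
    <tr><th>Name</th><th>Age</th></tr>
    <tr><td>Jenny</td><td>18</td></tr>
    <tr><td>James</td><td>19</td></tr>
  </table>
</body>
</html>'

page <- read_html(html)

# Step 1: locate the node with a CSS selector (the first <table> element).
table_node <- html_node(page, "table")

# Step 2: extract the data from the located node as a data frame.
people <- html_table(table_node)

# The same node located with an XPath expression instead of a CSS selector.
people_by_xpath <- html_table(html_node(page, xpath = "//table"))

print(people)
```

In recent rvest releases, html_element() is the preferred name for html_node(); the older name is used here to match the functions available when the book was written.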