Modern R Programming Cookbook

By: Jaynal Abedin

Overview of this book

R is a powerful tool for statistics, graphics, and statistical programming. It is used by tens of thousands of people daily to perform serious statistical analyses. It is a free, open source system whose implementation is the collective accomplishment of many intelligent, hard-working people. There are more than 2,000 available add-ons, and R is a serious rival to all commercial statistical packages. The objective of this book is to show how to work with different programming aspects of R. Emerging R developers and data scientists may have very good programming knowledge but a limited understanding of R syntax and semantics. This book is a platform for developing practical solutions to real-world problems in a scalable fashion and with a very good understanding. You will work with various versions of R libraries that are essential for scalable data science solutions. You will learn to handle input/output issues when working with relatively large datasets. By the end of this book, you will also learn how to work with databases from within R, and what metaprogramming is and how it helps in developing applications.

Extracting text data from an HTML page

You have seen an example of reading the HTML source code as a text vector in the Extracting unstructured text data from a plain web page recipe in this chapter. In that recipe, further processing is not straightforward because the output object contains plain text mixed with HTML tags. Separating the plain text from the HTML tags by hand is a time-consuming task.

In this recipe, you will read the same web page from the following link:

However, this time, you will use a different strategy so that you can work with the HTML tags directly.

Getting ready

To implement this recipe, you need an additional library, specifically the rvest library...
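As a rough illustration of the tag-based strategy, the following sketch parses HTML into a document tree and extracts text by tag name. It is not the recipe's own code: the HTML string is a made-up stand-in for the web page, and it assumes rvest version 1.0.0 or later is installed (for html_elements() and html_text2()).

```r
# Sketch only: assumes rvest >= 1.0.0 is installed.
library(rvest)

# Hypothetical page source used for illustration; with a real page you
# would pass its URL to read_html() instead of a literal string.
page_source <- '<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>Main heading</h1>
    <p>First paragraph with <b>bold</b> text.</p>
    <p>Second paragraph.</p>
  </body>
</html>'

# Parse the source into a document tree rather than a character vector.
page <- read_html(page_source)

# Select elements by tag name, then keep only their text content.
heading    <- html_text2(html_elements(page, "h1"))
paragraphs <- html_text2(html_elements(page, "p"))

print(heading)
print(paragraphs)
```

Because the selection happens on parsed elements, the extracted strings contain no markup, so no manual tag cleanup is needed afterwards.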