R Data Analysis Cookbook, Second Edition

By: Kuntal Ganguly, Viswanathan, Viswa Viswanathan
Overview of this book

Data analytics with R has emerged as a very important focus for organizations of all kinds. R enables even those with only an intuitive grasp of the underlying concepts, without a deep mathematical background, to carry out powerful and detailed examinations of their data. This book shows you how to put your data analysis skills in R to practical use, with recipes that cover both basic and advanced data analysis tasks. From acquiring your data and preparing it for analysis to the more complex data analysis techniques, the book shows you how to implement each technique effectively. You will also visualize your data using popular R packages such as ggplot2 and gain hidden insights from it. Starting with basic data analysis concepts, from handling your data to creating basic plots, you will move on to more advanced techniques such as performing cluster analysis and generating effective analysis reports and visualizations. Throughout the book, you will get to know the common problems and obstacles you might encounter while implementing each of the data analysis techniques in R, along with the easiest ways to overcome them. By the end of this book, you will have the knowledge you need to become an expert in data analysis with R and to put your skills to the test in real-world scenarios.

Reading XML data

You may sometimes need to extract data from websites. Many providers also supply data in XML and JSON formats. In this recipe, we learn about reading XML data.

Getting ready

Make sure you have downloaded the files for this chapter and that the files cd_catalog.xml and WorldPopulation-wiki.htm are in R's working directory. If the XML package is not already installed in your R environment, install it now, as follows:

> install.packages("XML") 

How to do it...

XML data can be read by following these steps:

  1. Load the library and initialize:
> library(XML)
> url <- "cd_catalog.xml"
  2. Parse the XML file and get the root node:
> xmldoc <- xmlParse(url)
> rootNode <- xmlRoot(xmldoc)
> rootNode[1]
  3. Extract the XML data:
> data <- xmlSApply(rootNode, function(x) xmlSApply(x, xmlValue))
  4. Convert the extracted data into a data frame:
> cd.catalog <- data.frame(t(data), row.names=NULL)
  5. Verify the results:
> cd.catalog[1:2,]

How it works...

The xmlParse function returns an object of the XMLInternalDocument class, which is a C-level internal data structure.

The xmlRoot() function gets access to the root node and its elements. Let us check the first element of the root node:

> rootNode[1] 

$CD
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
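
As a minimal sketch of the difference between the parsed document, its root node, and the two ways of indexing that node, the following uses a small inline document instead of cd_catalog.xml. The inline XML and the names xml_text, first_as_list, and first_node are assumptions made for illustration; only the first two catalog entries are reproduced, and only two of their fields.

```r
library(XML)

# Illustrative stand-in for cd_catalog.xml (hypothetical, trimmed to two fields)
xml_text <- paste0(
  "<CATALOG>",
  "<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST></CD>",
  "<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST></CD>",
  "</CATALOG>"
)

# asText = TRUE tells xmlParse() that the argument is XML content, not a file name
doc  <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)

class(doc)     # includes "XMLInternalDocument"
xmlName(root)  # name of the root element: "CATALOG"
xmlSize(root)  # number of child nodes: 2

# Single brackets return a list of nodes; double brackets return the node itself
first_as_list <- root[1]
first_node    <- root[[1]]

# A child element can be looked up by name and its text read with xmlValue()
xmlValue(first_node[["TITLE"]])
```

Single-bracket indexing is what produces the XMLNodeList output shown above, while double brackets give a node on which xmlValue() and similar functions can be called directly.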

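To extract data from the root node, we apply xmlSApply() at two levels: the outer call iterates over the children of the root node (the CD elements), and the inner call applies xmlValue to each field of a CD. When every record has the same fields, as here, the result simplifies to a character matrix with one row per field and one column per CD.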

To convert the preceding matrix into a data frame, we transpose it with the t() function so that each CD becomes a row, and pass the result to data.frame(). We then display the first two rows of the cd.catalog data frame:

> cd.catalog[1:2,]
             TITLE       ARTIST COUNTRY     COMPANY PRICE YEAR
1 Empire Burlesque    Bob Dylan     USA    Columbia 10.90 1985
2  Hide your heart Bonnie Tyler      UK CBS Records  9.90 1988
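
The extraction and transposition steps can be traced on a small inline document. This is only a sketch: the inline XML (three fields taken from the first two catalog entries) and the names fields and catalog are assumptions for illustration, and the numeric conversion at the end is an optional extra step, not part of the recipe.

```r
library(XML)

# Illustrative stand-in for cd_catalog.xml (hypothetical, trimmed to three fields)
xml_text <- paste0(
  "<CATALOG>",
  "<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST><PRICE>10.90</PRICE></CD>",
  "<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST><PRICE>9.90</PRICE></CD>",
  "</CATALOG>"
)
root <- xmlRoot(xmlParse(xml_text, asText = TRUE))

# Outer call: one iteration per CD; inner call: text of each field in that CD
fields <- xmlSApply(root, function(cd) xmlSApply(cd, xmlValue))

dim(fields)       # 3 fields (rows) by 2 CDs (columns)
rownames(fields)  # "TITLE" "ARTIST" "PRICE"

# Transpose so each CD is a row; row.names = NULL replaces the repeated
# "CD" row names with automatic row numbers
catalog <- data.frame(t(fields), row.names = NULL, stringsAsFactors = FALSE)

# Every column starts out as text; convert numeric fields explicitly if needed
catalog$PRICE <- as.numeric(catalog$PRICE)
str(catalog)
```

Because xmlValue returns text, columns such as PRICE and YEAR arrive as strings, so any arithmetic on them needs a conversion like the one in the last step.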

There's more...

XML data can be deeply nested and hence can become complex to extract. Knowledge of XPath is helpful to access specific XML tags. R provides several functions, such as xpathSApply and getNodeSet, to locate specific elements.
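
As a rough illustration of how these two functions differ, the sketch below runs XPath queries against a small inline document. The inline XML, the XPath expressions, and the names uk_titles and later_cds are assumptions chosen for the example, not part of the recipe's files.

```r
library(XML)

# Illustrative stand-in for cd_catalog.xml (hypothetical, trimmed)
xml_text <- paste0(
  "<CATALOG>",
  "<CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST>",
  "<COUNTRY>USA</COUNTRY><YEAR>1985</YEAR></CD>",
  "<CD><TITLE>Hide your heart</TITLE><ARTIST>Bonnie Tyler</ARTIST>",
  "<COUNTRY>UK</COUNTRY><YEAR>1988</YEAR></CD>",
  "</CATALOG>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# xpathSApply() locates the matching nodes and applies a function to each one;
# here it collects the titles of all CDs whose COUNTRY is "UK"
uk_titles <- xpathSApply(doc, "//CD[COUNTRY='UK']/TITLE", xmlValue)

# getNodeSet() returns the matching nodes themselves for further processing;
# here it selects the CD elements whose YEAR is later than 1986
later_cds <- getNodeSet(doc, "//CD[YEAR > 1986]")
sapply(later_cds, function(node) xmlValue(node[["ARTIST"]]))
```

xpathSApply() is convenient when only the values are needed, while getNodeSet() suits cases where several fields must be read from each matching node.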

Extracting HTML table data from a web page

Though it is possible to treat HTML data as a specialized form of XML, R provides specific functions to extract data from HTML tables, as follows:

> url <- "WorldPopulation-wiki.htm"
> tables <- readHTMLTable(url)
> world.pop <- tables[[5]]

The readHTMLTable() function parses the web page and returns a list of all the tables that are found on the page. For tables that have an id attribute, the function uses the id attribute as the name of that list element.

We are interested in extracting the "10 most populous countries" table, which is the fifth table on the page, so we use tables[[5]].

Extracting a single HTML table from a web page

A single table can be extracted using the following command:

> table <- readHTMLTable(url,which=5) 

Set the which argument to the position of the table you want. With a single value for which, readHTMLTable() returns that table as a data frame rather than a list.
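
The difference between reading every table and reading one table can be tried without the saved web page. The sketch below parses a small inline HTML string; the HTML, the table ids, the placeholder cell values, and the names all_tables and second are assumptions made purely for illustration.

```r
library(XML)

# Hypothetical page with two small tables holding placeholder values
html_text <- paste0(
  "<html><body>",
  "<table id='first'><tr><th>Name</th><th>Value</th></tr>",
  "<tr><td>a</td><td>1</td></tr></table>",
  "<table id='second'><tr><th>Name</th><th>Value</th></tr>",
  "<tr><td>b</td><td>2</td></tr><tr><td>c</td><td>3</td></tr></table>",
  "</body></html>"
)

# Parse the string as HTML; asText = TRUE marks it as content, not a file name
doc <- htmlParse(html_text, asText = TRUE)

# Without which, the result is a list with one element per table,
# named after each table's id attribute
all_tables <- readHTMLTable(doc)
names(all_tables)

# With a single value for which, only that table is read and returned
second <- readHTMLTable(doc, which = 2)
class(second)
```

Reading the full list first and inspecting names() or length() is a practical way to find the position of the table you need before switching to which.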
