
RStudio for R Statistical Computing Cookbook

By : Andrea Cirillo
Overview of this book

The need to handle complex datasets, perform unprecedented statistical analysis, and provide real-time visualizations to businesses has concerned statisticians and analysts across the globe. RStudio is a useful and powerful tool for statistical analysis that harnesses the power of R for computational statistics, visualization, and data science in an integrated development environment. This book is a collection of recipes that will help you learn and understand RStudio features so that you can effectively perform statistical analysis and reporting, code editing, and R development. The first few chapters will teach you how to set up your own data analysis project in RStudio, acquire data from different data sources, and manipulate and clean data for analysis and visualization purposes. You'll get hands-on with various data visualization methods using ggplot2, and you will create interactive and multidimensional visualizations with D3.js. Additional recipes will help you optimize your code; implement various statistical models to manage large datasets; perform text analysis and predictive analysis; and master time series analysis, machine learning, forecasting, and more. In the final few chapters, you'll learn how to create reports from your analytical application with the full range of static and dynamic reporting tools available in RStudio, so that you can effectively communicate results and even transform them into interactive web applications.
Table of Contents (15 chapters)
RStudio for R Statistical Computing Cookbook
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface
Index

Converting file formats using the rio package


As we saw in the previous recipe, rio is an R package developed by Thomas J. Leeper that makes importing and exporting data really easy. You can refer to the previous recipe for more on its core functionality and logic.

Besides the import() and export() functions, rio also offers a well-conceived and straightforward file conversion facility, the convert() function, which we are going to use in this recipe.
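As far as I understand the package, convert() essentially reads the input file with import() and writes the result with export(), inferring both formats from the file extensions. The following minimal sketch writes a small made-up data frame to a temporary directory (the file names example.csv, example_one_step.rds, and example_two_step.rds are hypothetical) and compares the one-step call with its explicit two-step counterpart:

```r
library(rio)

# Hypothetical example data, written to a temporary directory
df <- data.frame(country = c("A", "B", "C"), gdp = c(1.5, 2.5, 3.5))
csv_file <- file.path(tempdir(), "example.csv")
export(df, csv_file)

# One step: input and output formats are inferred from the extensions
rds_one_step <- file.path(tempdir(), "example_one_step.rds")
convert(csv_file, rds_one_step)

# Explicit two-step equivalent: import() the .csv, then export() it as .rds
rds_two_step <- file.path(tempdir(), "example_two_step.rds")
export(import(csv_file), rds_two_step)

# Both routes should yield the same data
identical(import(rds_one_step), import(rds_two_step))
```

The one-step form is simply more concise; the two-step form is useful when you want to clean or inspect the data between reading and writing.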

Getting ready

First of all, we need to install the rio package and load it by running the following code:

install.packages("rio")   # install rio from CRAN (needed only once)
library(rio)              # load rio into the current session

In the following example, we are going to convert the world_gdp_data dataset, stored in a local .csv file. This dataset is provided in the data folder of the RStudio project that accompanies this book.

You can download the project by logging in to your account at http://packtpub.com.

How to do it...

  1. The first step is to convert the file from the .csv format to the .json format:

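    # convert() infers both formats from the file extensions; the path is relative
    # to the working directory, so adjust it (e.g. to the data folder) if needed
    convert("world_gdp_data.csv", "world_gdp_data.json")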

    This will create a new file without removing the original one.

  2. The next step is to remove the original file:

    # Permanently deletes the original .csv file; returns TRUE on success
    file.remove("world_gdp_data.csv")
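Because step 2 permanently deletes the source file, it can be worth checking the conversion before removing anything. The following sketch combines the two steps with a simple sanity check; it works on a hypothetical stand-in for world_gdp_data.csv created in a temporary directory, and the row-count comparison is just one possible check, not part of the original recipe. Depending on your rio version, JSON support may require the jsonlite package, which rio's install_formats() function can install.

```r
library(rio)

# Hypothetical stand-in for world_gdp_data.csv, created in a temporary directory
csv_file <- file.path(tempdir(), "world_gdp_data.csv")
json_file <- file.path(tempdir(), "world_gdp_data.json")
export(data.frame(country = c("A", "B"), gdp = c(100, 200)), csv_file)

# Step 1: convert .csv to .json (the original file is left untouched)
convert(csv_file, json_file)

# Keep the original unless the new file exists and has the same number of rows
converted_ok <- file.exists(json_file) &&
  nrow(import(json_file)) == nrow(import(csv_file))

if (converted_ok) {
  # Step 2: remove the original .csv file
  file.remove(csv_file)
} else {
  warning("Conversion check failed; keeping the original .csv file")
}
```

The && operator short-circuits, so import() is never called on a .json file that was not created.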

There's more...

As documented in the rio vignette (available at https://cran.r-project.org/web/packages/rio/vignettes/rio.html), the following formats were supported for import and export at the time of writing:

Format                                       Import               Export
Tab-separated data (.tsv)                    Yes                  Yes
Comma-separated data (.csv)                  Yes                  Yes
CSVY (CSV + YAML metadata header) (.csvy)    Yes                  Yes
Pipe-separated data (.psv)                   Yes                  Yes
Fixed-width format data (.fwf)               Yes                  Yes
Serialized R objects (.rds)                  Yes                  Yes
Saved R objects (.RData)                     Yes                  Yes
JSON (.json)                                 Yes                  Yes
YAML (.yml)                                  Yes                  Yes
Stata (.dta)                                 Yes                  Yes
SPSS and SPSS portable                       Yes (.sav and .por)  Yes (.sav only)
XBASE database files (.dbf)                  Yes                  Yes
Excel (.xls)                                 Yes
Excel (.xlsx)                                Yes                  Yes
Weka Attribute-Relation File Format (.arff)  Yes                  Yes
R syntax (.r)                                Yes                  Yes
Shallow XML documents (.xml)                 Yes                  Yes
SAS (.sas7bdat)                              Yes
SAS XPORT (.xpt)                             Yes
Minitab (.mtp)                               Yes
Epiinfo (.rec)                               Yes
Systat (.syd)                                Yes
Data Interchange Format (.dif)               Yes
OpenDocument Spreadsheet (.ods)              Yes
Fortran data (no recognized extension)       Yes
Google Sheets                                Yes
Clipboard (default is .tsv)

Since rio is still under active development, I strongly suggest that you follow its GitHub repository at https://github.com/leeper/rio, where you can easily find out when new formats are added.
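Since convert() picks the output format from the file extension, producing several of the formats listed in the table above from a single source file can be as simple as looping over extensions. This is a minimal sketch under stated assumptions: the source file gdp_sample.csv is a hypothetical file created in a temporary directory, and the chosen extensions (.tsv, .rds, .dta) are just three examples of formats that rio can both import and export.

```r
library(rio)

# Hypothetical source file in a temporary directory
csv_file <- file.path(tempdir(), "gdp_sample.csv")
export(data.frame(country = c("A", "B"), gdp = c(100, 200)), csv_file)

# Target formats; rio infers each one from the file extension
target_ext <- c("tsv", "rds", "dta")
out_files <- file.path(tempdir(), paste0("gdp_sample.", target_ext))

# Convert the same source file into every target format
for (f in out_files) {
  convert(csv_file, f)
}

# Read each converted file back and report its number of rows
sapply(out_files, function(f) nrow(import(f)))
```

Reading every file back with import() is a quick way to confirm that each conversion produced a usable file before you rely on it.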