Mastering RStudio: Develop, Communicate, and Collaborate with R

Mastering RStudio: Develop, Communicate, and Collaborate with R

4 (1)

Buy this Book

Mastering RStudio: Develop, Communicate, and Collaborate with R

4 (1)

Buy this Book

Overview of this book

RStudio helps you to manage small to large projects by giving you a multi-functional integrated development environment, combined with the power and flexibility of the R programming language, which is becoming the bridge language of data science for developers and analyst worldwide. Mastering the use of RStudio will help you to solve real-world data problems. This book begins by guiding you through the installation of RStudio and explaining the user interface step by step. From there, the next logical step is to use this knowledge to improve your data analysis workflow. We will do this by building up our toolbox to create interactive reports and graphs or even web applications with Shiny. To collaborate with others, we will explore how to use Git and GitHub and how to build your own packages to ensure top quality results. Finally, we put it all together in an interactive dashboard written with R.

Mastering RStudio – Develop, Communicate, and Collaborate with R

Credits

About the Authors

About the Reviewer

www.PacktPub.com

Preface

Free Chapter

The RStudio IDE – an Overview

Downloading and installing RStudio

Working with RStudio and projects

Summary

Communicating Your Work with R Markdown

The concept of reproducible research

Doing reproducible research with R Markdown

Getting started with R Markdown in RStudio

The R Markdown interface

Advanced R Markdown documents

Summary

R Lesson I – Graphics System

The graphic system in R

The R graphics package—base

Introducing the lattice package

Introducing ggplot2

Applying The Grammar of Graphics with ggplot2

Using ggplot2

Customizing your charts

Interactive plotting systems

Summary

Shiny – a Web-app Framework for R

Introducing Shiny – the app framework

The concept of reactivity

Discovering the scope of the Shiny user interface

Sharing your Shiny application with others

Diving into the Shiny ecosystem

Summary

Interactive Documents with R Markdown

Creating interactive documents with R Markdown

Using R Markdown and Shiny

Embedding interactive charts into R Markdown

Publishing interactive R Markdown documents

Summary

Creating Professional Dashboards with R and Shiny

Explaining the concept of dashboards

Introducing the shinydashboard package

Showing the elements of shinydashboard

Building your own KPI dashboard

Summary

Package Development in RStudio

Understanding R packages

Installing devtools

Building packages with RStudio

Using Packrat with a project

Writing the documentation for a package

Understanding the namespaces of a package

Building and checking a package

Using roxygen2 for package documentation

Testing a package

Adding a dataset to a package

Writing a package vignette with R markdown

References for further information

Summary

Collaborating with Git and GitHub

Introducing version control

Using Git with RStudio

Using branches

Further resources

Summary

R for your Organization – Managing the RStudio Server

Managing the RStudio Server

Setting up R, RStudio, and the Shiny Server

Summary

Extending RStudio and Your Knowledge of R

Extending RStudio, finding answers, and more

RStudio environment customizations

R help is on the way

Improving your R knowledge

Staying up to date in the R world

Summary

Index

Customer Reviews

4 (1)

5 star

4 star

100%

3 star

2 star

1 star

Working with RStudio and projects

In the times before RStudio, it was very hard to manage bigger projects with R in the R console, as you had to create all the folder structures on your own.

When you work with projects or open a project, RStudio will instantly take several actions. For example, it will start a new and clean R session, it will source the .Rprofile file in the project's main directory, and it will set the current working directory to the project directory. So, you have a complete working environment individually for every project. RStudio will even adjust its own settings, such as active tabs, splitter positions, and so on, to where they were when the project was closed.

But just because you can create projects with RStudio easily, it does not mean that you should create a project for every single time that you write R code. For example, if you just want to do a small analysis, we would recommend that you create a project where you save all your smaller scripts.

Creating a project with RStudio

RStudio offers you an easy way to create projects. Just navigate to File | New Project and you will see a popup window with the following options:

New Directory
Existing Directory
Version Control

These options let you decide from where you want to create your project. So, if you want to start it from scratch and create a new directory, associate your new project to an existing one, or if you want to create a project from a version control repository, you can avail of the respective options. For now, we will focus on creating a new directory.

The following list will show you the next options available:

Empty Project
R Package
Shiny Web Application

We will look in the categories, R Package and Shiny Web Application later in this book, so for now we will concentrate on the Empty Project option.

Locating your project

A very important question you have to ask yourself when creating a new project is where you want to save it? There are several options and details you have to pay attention to especially when it comes to collaboration and different people working on the same project.

You can save your project locally, on a cloud storage or with the help of a revision control system such as Git.

Using RStudio with Dropbox

An easy way to store your project and to be able to access it from everywhere is the use of a cloud storage provider like Dropbox. It offers you a free account with 2 GB of storage, which should be enough for your first project.

Preventing Dropbox synchronization conflicts

RStudio actively monitors your project files for changes, which allows it to index functions and files to enable code completion and navigation. But when you use Dropbox at the same time to remotely sync your work, it will also monitor your files and this can cause conflicts. So you should tell Dropbox to ignore the .Rproj.user directory in your RStudio project.

To ignore a file in Dropbox, navigate to Preferences | Account | Selective Sync and uncheck the .Rproj.user directory.

Dropbox also helps you with version control, as it keeps previous versions of a file.

Creating your first project

To begin your first project, choose the New Directory option we described before and create an empty project. Then, choose a name for the directory and the location that you want to save it in. You should create a projects folder on your Dropbox.

The first project will be a small data analysis based on a dataset that was extracted from the 1974 issue of the Motor Trend US magazine. It comprises fuel consumption and ten aspects of automobile design and performance, such as the weight or number of cylinders for 32 automobiles, and is included in the base R package. So, we do not have to install a separate package to work with this dataset, as it is automatically loaded when you start R.

As you can see, we left the Use packrat with this project option unchecked. Packrat is a dependency management tool that makes your R code more isolated, portable, and reproducible by giving your project its own privately managed package library. This is especially important when you want to create projects in an organizational context where the code has to run on various computer systems, and has to be usable for a lot of different users. This first project will just run locally and will not focus on a specific combination of package versions.

Organizing your folders

RStudio creates an empty directory for you that includes just the file, Motor-Car-Trend-Analysis.Rproj. This file will store all the information on your project that RStudio will need for loading. But to stay organized, we have to create some folders in the directory. Create the following folders:

data: This includes all the data that we need for our analysis
code: This includes all the code files for cleaning up data, generating plots, and so on
plots: This includes all graphical outputs
reports: This comprises all the reports that we create from our dataset

This is a very basic folder structure and you have to adapt it to your needs in your own projects. You could, for example, add the folders, raw and processed, in the data folder. Raw for unstructured data that you started with, and processed for cleaned data that you actually used for your analysis.

Saving the data

The Motor Trend Car Road Tests dataset is part of the dataset package, which is one of the preinstalled packages in R. But, we will save the data in a CSV file in our data folder, after extracting the data from the mtcars variable, to make sure our analysis is reproducible:

#write data into csv file
write.csv(mtcars, file = "data/cars.csv", row.names=FALSE)

Put the previous line of code in a new R script and save it as data.R in the code folder.

Analyzing the data

The analysis script will first have to load the data from the CSV file with the following line:

cars_data <- read.csv(file = "data/cars.csv", header = TRUE, sep = ",")

Correcting the path for report exporting

If you want to create a report from your R script, you have to specify the relative path to the data file, beginning with two dots:

cars_data <- read.csv(file = "../data/cars.csv", header = TRUE, sep = ",")

Next, we can take a look at the different variables and see if we can find any correlations on the first look. We can create a pairs matrix with the following line:

pairs(cars_data)

We can then save the created matrix with the export function of the Plots Pane option. Then, we can save it as an image in the plots folder:

As you can see, we can expect a lot of different variable combinations, which could correlate very well. The most obvious one is surely weight of the car (wt) and Miles per Gallon (mpg): a heavy car seems to need more gallons of fuel than a lighter car.

We can now test this hypothesis by calculating the correlation and plotting a scatterplot of these two variables. In addition, we can also do a linear regression and see how it performs:

cor(cars_data$wt, cars_data$mpg)


install.packages("ggplot2")
require(ggplot2)

ggplot(cars_data, aes(x=wt, y=mpg))+
  geom_point(aes(shape=factor(am, labels = c("Manual","Automatic"))))+
  geom_smooth(method=lm)+scale_shape_discrete(name = "Transmission Type")

firstModel <- lm(mpg~wt, data = cars_data)

We can see more details with:

summary(firstModel)$coef

[1] -0.8676594

print(c('R-squared', round(summary(firstModel)$r.sq,2)))

[1] "R-squared"  "0.75"

We can see that there is a high negative correlation between these two variables, and the first model is a pretty good fit with an R-squared value of 0.75.

But we also have to test other combinations and see how they perform. And what we basically do is test all the correlations and use the best model.

We will not explain the statistical functions behind this approach, as it would be out of the scope of this chapter:

#Test other correlations
completeModel <- lm(mpg ~., data=cars_data)
stepSolution <- step(completeModel, direction = "backward")

#get the best model
bestModel <- stepSolution$call
bestModel

The output will look like this:

The best model now has the following formula:

mpg ~ wt + qsec + am

So, we will create a final model with this formula and see how it performs:

finalModel <- lm(mpg~wt + factor(am) + qsec, data = cars_data)
summary(finalModel)$coef
print(c('R-squared', round(summary(finalModel)$r.sq,2)))

 [1] "R-squared"  "0.85"

As we can see, the final model also includes the variable, qsec, which is the time the car needs for a quarter mile, and am, which is the type of transmission (automatic or manual).

But, we can also see that just the transmission type, manual, seems to play a significant role when it comes to mileage.

After you execute the analysis script, you can see that all your results are still in RStudio, which is a big advantage in contrast to the R console.

So, you can go through all the graphs you produced in the plot viewer with the arrows.

Or, you can see which variables are set in the environment. These are all the models you calculated in this analysis, as well as in your initial dataset.

You can click on the table icon behind cars_data in the Environment pane to open the data frame in the Source pane.

Exporting your analysis as a report

You can also export the analysis.R script as a report in the HTML, PDF, or MS Word format, and you will then find the report in your code folder. Therefore, just click on the Publish button and RStudio will guide you through the process.

Mastering RStudio: Develop, Communicate, and Collaborate with R

Mastering RStudio: Develop, Communicate, and Collaborate with R

Overview of this book

Related Content you might be interested in

Current Title:

Mastering RStudio: Develop, Communicate, and Collaborate with R

Working with RStudio and projects

Creating a project with RStudio

Locating your project

Using RStudio with Dropbox

Preventing Dropbox synchronization conflicts

Creating your first project

Organizing your folders

Saving the data

Analyzing the data

Correcting the path for report exporting

Exporting your analysis as a report