RStudio for R Statistical Computing Cookbook

By: Andrea Cirillo

Overview of this book

The need to handle complex datasets, perform unprecedented statistical analysis, and provide real-time visualizations to businesses has concerned statisticians and analysts across the globe. RStudio is a useful and powerful tool for statistical analysis that harnesses the power of R for computational statistics, visualization, and data science in an integrated development environment. This book is a collection of recipes that will help you learn and understand RStudio features so that you can effectively perform statistical analysis and reporting, code editing, and R development. The first few chapters will teach you how to set up your own data analysis project in RStudio, acquire data from different data sources, and manipulate and clean data for analysis and visualization purposes. You'll get hands-on with various data visualization methods using ggplot2, and you will create interactive and multidimensional visualizations with D3.js. Additional recipes will help you optimize your code; implement various statistical models to manage large datasets; perform text analysis and predictive analysis; and master time series analysis, machine learning, and forecasting. In the final few chapters, you'll learn how to create reports from your analytical application with the full range of static and dynamic reporting tools that are available in RStudio so that you can effectively communicate results and even transform them into interactive web applications.

Credits

About the Author

About the Reviewer

www.PacktPub.com

Preface

Acquiring Data for Your Project

Introduction

Acquiring data from the Web – web scraping tasks

Accessing an API with R

Getting data from Twitter with the twitteR package

Getting data from Facebook with the Rfacebook package

Getting data from Google Analytics

Loading your data into R with rio packages

Converting file formats using the rio package

Preparing for Analysis – Data Cleansing and Manipulation

Introduction

Getting a sense of your data structure with R

Preparing your data for analysis with the tidyr package

Detecting and removing missing values

Substituting missing values using the mice package

Detecting and removing outliers

Performing data filtering activities

Basic Visualization Techniques

Introduction

Looking at your data using the plot() function

Using pairs.panel() to look at (visualize) correlations between variables

Adding text to a ggplot2 plot at a custom location

Changing axes appearance to ggplot2 plot (continuous axes)

Producing a matrix of graphs with ggplot2

Drawing a route on a map with ggmap

Making use of the igraph package to draw a network

Showing communities in a network with the linkcomm package

Advanced and Interactive Visualization

Introduction

Producing a Sankey diagram with the networkD3 package

Creating a dynamic force network with the visNetwork package

Building a rotating 3D graph and exporting it as a GIF

Using the DiagrammeR package to produce a process flow diagram in RStudio

Power Programming with R

Introduction

Writing modular code in RStudio

Implementing parallel computation in R

Creating custom objects and methods in R using the S3 system

Evaluating your code performance using the profvis package

Comparing an alternative function's performance using the microbenchmarking package

Using GitHub with RStudio

Domain-specific Applications

Introduction

Dealing with regular expressions

Analyzing PDF reports in a folder with the tm package

Creating word clouds with the wordcloud package

Performing a Twitter sentiment analysis

Detecting fraud in e-commerce orders with Benford's law

Measuring customer retention using cohort analysis in R

Making a recommendation engine

Performing time series decomposition using the stl() function

Exploring time series forecasting with forecast()

Tracking stock movements using the quantmod package

Optimizing portfolio composition and maximising returns with the Portfolio Analytics package

Forecasting the stock market

Developing Static Reports

Introduction

Using one markup language for all types of documents – rmarkdown

Writing and styling PDF documents with RStudio

Writing wonderful tufte handouts with the tufte package and rmarkdown

Sharing your code and plots with slides

Curating a blog through RStudio

Dynamic Reporting and Web Application Development

Introduction

Generating dynamic parametrized reports with R Markdown

Developing a single-file Shiny app

Changing a Shiny app UI based on user input

Creating an interactive report with Shiny

Constructing RStudio add-ins

Sharing your work on RPubs

Deploying your app on Amazon AWS with ramazon

Index

Detecting and removing outliers

Outliers are dangerous in most data science work because a few extreme values can heavily distort models and algorithms.

Detecting and excluding them is therefore a crucial task.

This recipe shows you how to do it.

We will compute the first and third quartiles of a given population and flag the values that lie too far beyond them, using 1.5 times the interquartile range as the threshold.

Note that this recipe applies only to univariate quantitative data; other kinds of data require different outlier-detection methods.
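To make the distortion concrete, here is a minimal sketch on made-up numbers (the `values` vector is hypothetical and not part of the recipe's dataset). It compares the mean and the median of a small sample that contains one extreme value.

```r
# Hypothetical sample: nine ordinary values plus one extreme value
values <- c(10, 12, 11, 13, 12, 11, 10, 12, 13, 500)

mean_with_outlier    <- mean(values)       # pulled strongly towards 500
median_with_outlier  <- median(values)     # barely affected by the extreme value
mean_without_outlier <- mean(values[-10])  # mean once the tenth value is dropped

print(c(mean_with    = mean_with_outlier,
        median_with  = median_with_outlier,
        mean_without = mean_without_outlier))
```

The mean moves far away from the bulk of the data while the median stays put, which is why a quartile-based rule is a reasonable starting point for detection.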

How to do it...

Compute the quantiles using the quantile() function:

```
quantiles <- quantile(tidy_gdp_complete$gdp, probs = c(.25, .75))
```

Compute the tolerance range as 1.5 times the interquartile range, using the IQR() function:
```
range <- 1.5 * IQR(tidy_gdp_complete$gdp)
```

Subset the original data by excluding the outliers:

```
normal_gdp <- subset(tidy_gdp_complete,
  tidy_gdp_complete$gdp > (quantiles[1] - range) & tidy_gdp_complete$gdp < (quantiles[2] ...
```
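The three steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the book's exact code: `gdp_data` is a hypothetical stand-in for `tidy_gdp_complete` (which is built in an earlier recipe), the upper limit of third quartile plus 1.5 times the IQR is assumed from the standard Tukey rule because the excerpt above is cut off at that point, and `iqr_range` is used instead of `range` to avoid masking the base R function of the same name.

```r
# Hypothetical stand-in for tidy_gdp_complete (assumption: one numeric gdp column)
gdp_data <- data.frame(
  country = c("A", "B", "C", "D", "E", "F", "G", "H"),
  gdp     = c(2.1, 2.4, 1.9, 2.8, 2.2, 2.6, 2.0, 25.0)
)

# Step 1: first and third quartiles (25th and 75th percentiles)
quantiles <- quantile(gdp_data$gdp, probs = c(.25, .75))

# Step 2: tolerance of 1.5 times the interquartile range
iqr_range <- 1.5 * IQR(gdp_data$gdp)

# Limits beyond which a value is treated as an outlier
lower_limit <- quantiles[1] - iqr_range
upper_limit <- quantiles[2] + iqr_range  # assumed upper bound (Tukey's rule)

# Step 3: keep the values inside the limits, and list the excluded ones
normal_gdp <- subset(gdp_data, gdp > lower_limit & gdp < upper_limit)
outliers   <- subset(gdp_data, gdp <= lower_limit | gdp >= upper_limit)

print(outliers)
```

Keeping the excluded rows in a separate `outliers` object makes it easy to inspect what was dropped before committing to the filtered data.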
