Data Analysis with R, Second Edition

Overview of this book

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines of some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax, along with packages like Rcpp, ggplot2, and dplyr. From there, you will get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. You will also tackle the difficulties of performing data analysis in practice: working with messy data, handling large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

RefresheR

Navigating the basics

Getting help in R

Vectors

Functions

Matrices

Loading data into R

Working with packages

Exercises

Summary

The Shape of Data

Univariate data

Frequency distributions

Central tendency

Spread

Populations, samples, and estimation

Probability distributions

Visualization methods

Exercises

Summary

Describing Relationships

Multivariate data

Relationships between a categorical and continuous variable

Relationships between two categorical variables

The relationship between two continuous variables

Visualization methods

Exercises

Summary

Probability

Basic probability

A tale of two interpretations

Sampling from distributions

The normal distribution

Exercises

Summary

Using Data To Reason About The World

Estimating means

The sampling distribution

Interval estimation

Smaller samples

Exercises

Summary

Testing Hypotheses

The null hypothesis significance testing framework

Testing the mean of one sample

Testing two means

Testing more than two means

Testing independence of proportions

What if my assumptions are unfounded?

Exercises

Summary

Bayesian Methods

The big idea behind Bayesian analysis

Choosing a prior

Who cares about coin flips

Enter MCMC – stage left

Using JAGS and runjags

Fitting distributions the Bayesian way

The Bayesian independent samples t-test

Exercises

Summary

The Bootstrap

What's... uhhh... the deal with the bootstrap?

Performing the bootstrap in R (more elegantly)

Confidence intervals

A one-sample test of means

Bootstrapping statistics other than the mean

Busting bootstrap myths

Exercises

Summary

Predicting Continuous Variables

Linear models

Simple linear regression

Simple linear regression with a binary predictor

Multiple regression

Regression with a non-binary predictor

Kitchen sink regression

The bias-variance trade-off

Linear regression diagnostics

Advanced topics

Exercises

Summary

Predicting Categorical Variables

k-Nearest neighbors

Logistic regression

Decision trees

Random forests

Choosing a classifier

Exercises

Summary

Predicting Changes with Time

What is a time series?

What is forecasting?

Creating and plotting time series

Components of time series

Time series decomposition

White noise

Autocorrelation

Smoothing

ETS and the state space model

Interventions for improvement

What we didn't cover

Citations for the climate change data

Exercises

Summary

Sources of Data

Relational databases

Using JSON

XML

Other data formats

Online repositories

Exercises

Summary

Dealing with Missing Data

Analysis with missing data

Visualizing missing data

Types of missing data

Unsophisticated methods for dealing with missing data

So how does mice come up with the imputed values?

Exercises

Summary

Dealing with Messy Data

Checking unsanitized data

Regular expressions

Other tools for messy data

Exercises

Summary

Dealing with Large Data

Wait to optimize

Using a bigger and faster machine

Be smart about your code

Using optimized packages

Using another R implementation

Using parallelization

Using Rcpp

Being smarter about your code

Exercises

Summary

Working with Popular R Packages

The data.table package

Using dplyr and tidyr to manipulate data

Functional programming as a main tidyverse principle

Reshaping data with tidyr

Exercises

Summary

Reproducibility and Best Practices

R scripting

R projects

Version control

Communicating results

Exercises

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

RefresheR

Before we dive into the (other) fun stuff (sampling multi-dimensional probability distributions, using convex optimization to fit data models, and so on), it would be helpful to review the aspects of R that all subsequent chapters assume you already know.

If you fancy yourself an R guru, you should still, at least, skim through this chapter, because you'll almost certainly find the idioms, packages, and style introduced here to be beneficial for following the rest of the material.

If you don't care much about R (yet), and are just in this for the statistics, you can heave a heavy sigh of relief that, for the most part, you can run the code given in this book in the interactive R interpreter with very little modification and just follow along with the ideas. However, it is my belief (read: delusion) that by the end of this book, you'll cultivate a newfound appreciation for R alongside a robust understanding of methods in data analysis.

Fire up your R interpreter and let's get started!
