Hands-On Exploratory Data Analysis with R

By: Radhika Datar, Harish Garg

Overview of this book

Hands-On Exploratory Data Analysis with R will help you build a strong foundation in data analysis and become well versed in elementary ways to analyze data. You will learn how to understand your data and summarize its characteristics. You'll also study the structure of your data and explore graphical and numerical techniques using the R language. This book covers the entire exploratory data analysis (EDA) process: data collection, generating statistics, studying distributions, and invalidating hypotheses. As you progress through the book, you will set up a data analysis environment with tools such as ggplot2, knitr, and R Markdown, using DOE Scatter Plot and SML2010 for multifactor, optimization, and regression data problems. By the end of this book, you will be able to carry out a preliminary investigation on any dataset, uncover hidden insights, and present your results in a business context.
Table of Contents (17 chapters)
Section 1: Setting Up Data Analysis Environment
Section 2: Univariate, Time Series, and Multivariate Data
Section 3: Multifactor, Optimization, and Regression Data Problems
Section 4: Conclusions

What this book covers

Chapter 1, Setting Up Our Data Analysis Environment, introduces the overall goal of this book. This chapter explains how exploratory data analysis benefits businesses and has a significant impact across almost all verticals.

Chapter 2, Importing Diverse Datasets, provides practical, hands-on code examples for reading many kinds of data into R for exploratory data analysis. This chapter also covers how to use advanced options when importing datasets such as delimited data, Excel data, JSON data, and data from web APIs.

Chapter 3, Examining, Cleaning, and Filtering, introduces how to identify and clean missing data and erroneous data formats. This chapter also covers concepts such as data manipulation, wrangling, and reshaping.

Chapter 4, Visualizing Data Graphically with ggplot2, demonstrates how to draw different kinds of plots and charts, including scatter plots, histograms, probability plots, residual plots, boxplots, and block plots.

Chapter 5, Creating Aesthetically Pleasing Reports with knitr and R Markdown, explains how to use RStudio to wrap your code, graphics, plots, and findings in a complete and informative data analysis report. The chapter will also look at how to publish these in different formats for different audiences using R Markdown and packages such as knitr.

Chapter 6, Univariate and Control Datasets, takes a real-world univariate and control dataset and runs an entire exploratory data analysis workflow on it using R packages and techniques.

Chapter 7, Time Series Datasets, introduces a time series dataset and describes how to use exploratory data analysis techniques to analyze this data.

Chapter 8, Multivariate Datasets, introduces a dataset from the multivariate problem category. This chapter explains how to use exploratory data analysis techniques to analyze this data, including the star plot, the scatter plot matrix, the conditioning plot, and principal components.

Chapter 9, Multi-Factor Datasets, introduces a multi-factor dataset and explains how to use exploratory data analysis techniques to analyze this data.

Chapter 10, Handling Optimization and Regression Data Problems, introduces a dataset from the regression problem category and describes how to use exploratory data analysis techniques to analyze this data. It also shows how to learn and apply these techniques.

Chapter 11, Next Steps, covers how to build a roadmap for yourself to consolidate the skills you have learned in this book and gain further expertise in the field of data science with R.