Hands-On Exploratory Data Analysis with R

By: Radhika Datar, Harish Garg

Overview of this book

Hands-On Exploratory Data Analysis with R will help you build a strong foundation in data analysis and become well versed in elementary ways to analyze data. You will learn how to understand your data and summarize its characteristics. You'll also study the structure of your data and explore graphical and numerical techniques using the R language. This book covers the entire exploratory data analysis (EDA) process: data collection, generating statistics, distribution, and invalidating the hypothesis. As you progress through the book, you will set up a data analysis environment with tools such as ggplot2, knitr, and R Markdown, using DOE Scatter Plot and SML2010 for multifactor, optimization, and regression data problems. By the end of this book, you will be able to carry out a preliminary investigation of any dataset, uncover hidden insights, and present your results in a business context.
Table of Contents (17 chapters)
1
Section 1: Setting Up Data Analysis Environment
7
Section 2: Univariate, Time Series, and Multivariate Data
11
Section 3: Multifactor, Optimization, and Regression Data Problems
14
Section 4: Conclusions

Univariate and Control Datasets

In this chapter, we will take a real-world univariate and control dataset and run a complete exploratory data analysis workflow on it, using the R packages and techniques covered in Chapter 1, Setting Up Our Data Analysis Environment. After reading and tidying up the data, we will use EDA techniques to map and understand its underlying structure. We will then identify the most important variables in the dataset, test our assumptions to estimate the parameters, and establish the margins of error. Next, we will explore the dataset graphically using four plots and probability plots. Finally, we will summarize our results in a data report. The code examples use the Bank Marketing dataset from UCI.
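
The workflow described above (read, tidy, inspect structure, estimate a margin of error, plot) can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's own code: the toy data frame and its column names (`age`, `balance`, `job`) are hypothetical stand-ins for the real dataset, and the semicolon separator is likewise assumed for the example file.

```r
# Hypothetical toy data standing in for the real dataset.
# Column names and values are assumptions made for this sketch.
set.seed(42)
toy <- data.frame(
  age     = c(sample(18:90, 49, replace = TRUE), NA),
  balance = round(rnorm(50, mean = 1500, sd = 800)),
  job     = sample(c("admin.", "technician", "unknown"), 50, replace = TRUE)
)

# Read: round-trip through a temporary semicolon-separated file
path <- tempfile(fileext = ".csv")
write.table(toy, path, sep = ";", row.names = FALSE)
bank <- read.csv(path, sep = ";", stringsAsFactors = FALSE)

# Tidy: drop rows with a missing age and recode "unknown" jobs as NA
bank_clean <- bank[!is.na(bank$age), ]
bank_clean$job[bank_clean$job == "unknown"] <- NA

# Structure: column types and a numeric summary
str(bank_clean)
summary(bank_clean$balance)

# Estimate: sample mean of balance and its 95% margin of error
n <- nrow(bank_clean)
balance_mean <- mean(bank_clean$balance)
margin_of_error <- qt(0.975, df = n - 1) * sd(bank_clean$balance) / sqrt(n)

# Graphical exploration: histogram and normal probability plot
op <- par(mfrow = c(1, 2))
hist(bank_clean$balance, main = "Histogram of balance", xlab = "balance")
qqnorm(bank_clean$balance)
qqline(bank_clean$balance)
par(op)
```

The margin of error here uses the t-distribution, which is one common choice when the population standard deviation is unknown; other estimators may suit the real data better.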

The following topics will be covered in this chapter:

  • Introducing and reading the data
  • Cleaning and tidying up the data
  • Mapping...