The first project we will introduce in this book is an analysis of automobile fuel economy data. Our primary tool for analyzing this dataset is the R statistical programming language. R is often referred to as the lingua franca of data science because it is among the most widely used languages for statistics and data analysis. As you'll see from the examples in the first half of this book, R is an excellent tool for data manipulation, analysis, modeling, and visualization, and for writing scripts that get analytical tasks done.
The recipes in this chapter will roughly follow these five steps in the data science pipeline:
Acquisition
Exploration and understanding
Munging, wrangling, and manipulation
Analysis and modeling
Communication and operationalization
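To make the five steps listed above concrete, here is a minimal sketch of one pass through the pipeline in R. It is only an illustration under stated assumptions: it uses R's built-in mtcars dataset as a stand-in rather than the fuel economy data analyzed in this chapter, and the variable names (cars, transmission, fit) are hypothetical.

```r
# Stand-in data: R's built-in mtcars dataset, NOT the dataset used in this chapter.

# 1. Acquisition: load the data into a data frame.
cars <- mtcars

# 2. Exploration and understanding: inspect structure and summary statistics.
str(cars)
summary(cars$mpg)

# 3. Munging, wrangling, and manipulation: recode the numeric
#    transmission flag (0 = automatic, 1 = manual) as a labeled factor.
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("automatic", "manual"))

# 4. Analysis and modeling: fit a simple linear model of fuel economy on weight.
fit <- lm(mpg ~ wt, data = cars)

# 5. Communication and operationalization: report the coefficients and plot the fit.
print(coef(fit))
plot(cars$wt, cars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
```

In a real project each step is usually far more involved, and the tool chosen for any one step can be swapped out without changing the overall shape of the process.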
As a process, data science is built around the data science pipeline. To become good at data science, you need to gain experience working through this pipeline while swapping in various tools and methods along the way...