Data Analysis with R, Second Edition - Second Edition

Overview of this book

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Tackle the difficulties of performing data analysis in practice and find solutions for working with messy data and large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.
Table of Contents (24 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
Index

Simple linear regression


On to a substantially less trivial example: let's say No Scone Unturned has been keeping careful records of how many raisins (in grams) they have been using for their famous oatmeal raisin cookies. They want to construct a linear model describing the relationship between the area of a cookie (in square centimeters) and how many grams of raisins they use, on average.

In particular, they want to use linear regression to predict how many grams of raisins they will need for a 1-meter-long oatmeal raisin cookie. Predicting a continuous variable (grams of raisins) from other variables sounds like a job for regression! When we use just a single predictor variable (the area of the cookies), the technique is called simple linear regression.
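As an illustration only, here is a minimal sketch of a simple linear regression in R using `lm()` and `predict()`. No Scone Unturned's actual records are not available here, so the data are simulated: the range of cookie areas, the assumed true relationship (2 g plus 0.5 g per cm²), and the noise level are all hypothetical. The sketch also assumes the 1-meter cookie is a circle 100 cm across, which gives an area of about 7,854 cm².

```r
# Simulated, HYPOTHETICAL data; not No Scone Unturned's real records
set.seed(42)
n <- 100
area <- runif(n, min = 20, max = 120)          # cookie area (cm^2)
raisins <- 2 + 0.5 * area + rnorm(n, sd = 3)   # grams of raisins: assumed true line plus noise
cookies <- data.frame(area = area, raisins = raisins)

# Simple linear regression: a single predictor (area) for a continuous response (raisins)
fit <- lm(raisins ~ area, data = cookies)
print(coef(fit))   # estimated intercept (g) and slope (g per cm^2)

# ASSUMPTION: the 1-meter cookie is a circle 100 cm across (radius 50 cm)
big_cookie <- data.frame(area = pi * 50^2)     # about 7854 cm^2
pred <- predict(fit, newdata = big_cookie, interval = "prediction")
print(pred)        # point prediction with a 95% prediction interval (columns fit, lwr, upr)
```

An area of roughly 7,854 cm² lies far outside the simulated range of 20 to 120 cm², so this prediction is an extrapolation: it is only trustworthy if the relationship stays linear well beyond the observed data.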

The left panel of Figure 9.2 illustrates the relationship between the area of the cookies and the grams of raisins used in them. It also shows the best-fit regression line:

Figure 9.2: A scatterplot of areas and grams of raisins in No Scone...
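The following is a sketch of how a plot like the left panel could be drawn with base R graphics, again on simulated, hypothetical data rather than the book's data set. It also computes the least-squares line by hand with the closed-form estimates `slope = cov(x, y) / var(x)` and `intercept = mean(y) - slope * mean(x)`, and checks that they match the coefficients from `lm()`. That least-squares line, which minimizes the sum of squared vertical distances from the points, is what "best fit" means here.

```r
# Simulated, HYPOTHETICAL data; not the book's data set
set.seed(7)
area <- runif(60, min = 20, max = 120)           # cookie area (cm^2)
raisins <- 1 + 0.45 * area + rnorm(60, sd = 4)   # grams of raisins
cookies <- data.frame(area = area, raisins = raisins)

# Closed-form least-squares estimates for simple linear regression
b1 <- cov(cookies$area, cookies$raisins) / var(cookies$area)   # slope
b0 <- mean(cookies$raisins) - b1 * mean(cookies$area)          # intercept

fit <- lm(raisins ~ area, data = cookies)
print(c(manual_intercept = b0, manual_slope = b1))
print(coef(fit))   # agrees with the manual estimates

# Scatterplot with the best-fit line overlaid
plot(raisins ~ area, data = cookies,
     xlab = "Area of cookie (cm^2)", ylab = "Raisins (g)",
     main = "Simulated cookie data")
abline(fit, col = "blue", lwd = 2)   # abline() accepts an lm object and draws its fitted line
```

With ggplot2, which the book also covers, adding `geom_smooth(method = "lm")` to a `geom_point()` scatterplot draws the same least-squares line.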