Book Image

Data Analysis with R

Book Image

Data Analysis with R

Overview of this book

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allows the user to express complex analytics easily, quickly, and succinctly. With over 7,000 user contributed packages, it’s easy to find support for the latest and greatest algorithms and techniques. Starting with the basics of R and statistical reasoning, Data Analysis with R dives into advanced predictive analytics, showing how to apply those techniques to real-world data though with real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties relating to performing data analysis in practice and find solutions to working with “messy data”, large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.
Table of Contents (20 chapters)
Data Analysis with R
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface
Index

Simple linear regression


On to a substantially less trivial example, let's say No Scone Unturned has been keeping careful records of how many raisins (in grams) they have been using for their famous oatmeal raisin cookies. They want to construct a linear model describing the relationship between the area of a cookie (in centimeters squared) and how many raisins they use, on average.

In particular, they want to use linear regression to predict how many grams of raisins they will need for a 1-meter long oatmeal raisin cookie. Predicting a continuous variable (grams of raisins) from other variables sounds like a job for regression! In particular, when we use just a single predictor variable (the area of the cookies), the technique is called simple linear regression.

The left panel of Figure 8.2 illustrates the relationship between the area of cookies and the amount of raisins it used. It also shows the best-fit regression line:

Figure 8.2: (left) A scatterplot of areas and grams of raisins in...