Data Analysis with R, Second Edition - Second Edition

Overview of this book

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties relating to performing data analysis in practice and find solutions to working with messy data, large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.

The relationship between two continuous variables


Do you think that there is a relationship between women's heights and their weights? If you said yes, congratulations, you're right!

We can verify this assertion using R's built-in dataset, women, which holds the heights and weights of 15 American women aged 30 to 39:

 > head(women) 
    height weight 
  1     58    115 
  2     59    117 
  3     60    120 
  4     61    123 
  5     62    126 
  6     63    129 
 > nrow(women) 
  [1] 15

Specifically, this relationship is referred to as a positive relationship, because as one of the variables increases, we expect an increase in the other variable.
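One way to check the direction of the relationship in the women data is sketched below. This is only an illustration under stated assumptions, not necessarily the approach the book takes: the variable names by_height, weight_steps, always_increasing, and height_weight_cor are made up for this example, and the sign of the correlation coefficient from cor() is used here simply as a summary of direction.

```r
# The women dataset ships with base R (package datasets), so nothing needs installing.
data(women)

# Order the rows by height, then look at how weight changes from one row to the next.
by_height <- women[order(women$height), ]
weight_steps <- diff(by_height$weight)

# TRUE when every step up in height comes with a step up in weight.
always_increasing <- all(weight_steps > 0)
print(always_increasing)

# Assumption for illustration: use the sign of the correlation coefficient
# as a one-number summary of the direction of the relationship.
height_weight_cor <- cor(women$height, women$weight)
print(height_weight_cor > 0)
```

Both checks print TRUE for this dataset, which is what a positive relationship implies. In noisier data the first check will usually fail even when the overall trend is positive, which is one reason a summary such as the correlation is often more useful.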

The most typical visual representation of the relationship between two continuous variables is a scatterplot.
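As a minimal sketch, a scatterplot of the women data can be drawn with base R's plot() function. The choice of base graphics, the axis labels, and the title are assumptions made for this illustration (the units follow the dataset's documentation, which records height in inches and weight in pounds); the same plot could equally be built with ggplot2.

```r
# The women dataset ships with base R (package datasets).
data(women)

# Hypothetical helper names for this example: one variable per axis.
x_values <- women$height
y_values <- women$weight

# Draw one point per woman: height sets the x position, weight the y position.
plot(x_values, y_values,
     xlab = "Height (in)",
     ylab = "Weight (lbs)",
     main = "Heights and weights of 15 American women")
```

When run in an interactive session this opens a plot window; when run with Rscript, the figure is written to the default graphics device instead.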

A scatterplot is displayed as a group of points whose position along the x axis is established by one variable, and the position along the y axis is established by the other. When there is a positive relationship, the dots, for the...