Data Analysis with R, Second Edition - Second Edition

Overview of this book

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties relating to performing data analysis in practice and find solutions to working with messy data, large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.

Exercises


Practice the following exercises to revise the concept of reproducibility learned in this chapter:

  • Review: When we created the data frame from nothing, we combined a vector of 1,000 binomially distributed random variables, 1,000 normally distributed random variables, and a vector of two colors, red and white. Since all the columns in a data frame have to be the same length, how did R allow this? What is the property of vectors that allows this?
  • Seek out, read, and attempt to understand the source code of some of your favorite R packages. What version control system is the author of the package using?
  • Carefully review the analysis that was used as an example in this chapter. In what manner can this analysis be improved upon? Look at the distribution of the combined SAT scores in NYC schools. Why was modeling the SAT scores with a Gaussian likelihood function a very bad choice? What could we have done instead?
  • If both a poor and a rich person are willing to buy a pair of sneakers for...
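
As a starting point for the first exercise, the following is a minimal sketch of the kind of data frame it describes. The column names, the seed, and the distribution parameters are assumptions chosen for illustration and are not taken from the chapter; only the shape (1,000 binomial draws, 1,000 normal draws, and a two-element color vector) follows the exercise text.

```r
# Hypothetical reconstruction: names and parameters below are assumptions.
set.seed(1)  # fix the seed so the random draws are reproducible

n <- 1000
df <- data.frame(
  successes = rbinom(n, size = 10, prob = 0.5),  # 1,000 binomial draws
  measure   = rnorm(n),                          # 1,000 normal draws
  color     = c("red", "white"),                 # only two values supplied
  stringsAsFactors = FALSE                       # keep color as character
)

dim(df)            # number of rows and columns in the result
table(df$color)    # how often each color appears
head(df$color, 4)  # the order in which the two colors are laid out
```

Running this and inspecting the `color` column shows what R did with the two-element vector. It may also be instructive to try a column whose length does not divide the number of rows evenly, such as `data.frame(a = 1:3, b = 1:2)`, which R rejects with an error.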