Book Image

Data Analysis with R, Second Edition - Second Edition

Book Image

Data Analysis with R, Second Edition - Second Edition

Overview of this book

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allows the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data though with real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties relating to performing data analysis in practice and find solutions to working with messy data, large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.
Table of Contents (24 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
Index

Be smart about your code


In many cases, the performance of the R code can be greatly improved by simple restructuring of the code; this doesn't change the output of the program, just the way it is represented. Restructurings of this type are often referred to as code refactoring. The refactorings that really make a difference performance-wise usually have to do with either improved allocation of memory or vectorization.

Allocation of memory

Refer all the way back to Chapter 5, Using Data to Reason About the World. Remember when we created a mock population of women's heights in the US, and we repeatedly took 10,000 samples of 40 from it to demonstrate the sampling distribution of the sample means? In a code comment, I mentioned in passing that the snippet numeric(10000) created an empty vector of 10,000 elements, but I never explained why we did that. Why didn't we just create a vector of 1, and continually tack on each new sample mean to the end of it? This is demonstrated as follows:

set...