Data Analysis with R, Second Edition

Overview of this book

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties relating to performing data analysis in practice and find solutions to working with messy data, large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.

Busting bootstrap myths


There are two very prevalent myths regarding the bootstrap that we will briefly address in this section.

The first is that the bootstrap is a panacea for small sample sizes. I think at least part of this myth is due to the name, which conjures up images of some rugged person pulling themselves up by their bootstraps and making something from nothing. Unfortunately, the bootstrap does not make something from nothing, nor does it even make more out of less. The important thing to remember is that the accuracy of your bootstrap distribution is completely dependent on the representativeness of your original sample. Refer back to Figure 8.1. Notice that, although the bootstrap distribution and the sampling distribution of sample means have the same shape, the bootstrap distribution was shifted slightly to the left because, by chance, the sample we got had a mean slightly less than the population mean. This will happen. And, of course, the smaller the sample...
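
The shift described above can be reproduced with a short simulation. This is only a sketch under assumed values: the population (normal, mean 65, standard deviation 3.5), the sample size of 10, and the variable names are all hypothetical and are not the data behind Figure 8.1. It shows that a bootstrap distribution of the mean centers on the mean of the original sample, so whatever chance error the sample carries is inherited rather than corrected.

```r
set.seed(1)  # arbitrary seed, used only so the run is repeatable

# Hypothetical population parameters (assumed for illustration)
pop_mean <- 65
pop_sd   <- 3.5

# A deliberately small sample drawn from that population
small_sample <- rnorm(10, mean = pop_mean, sd = pop_sd)

# Bootstrap distribution of the sample mean:
# resample the observed data with replacement, many times
boot_means <- replicate(10000, mean(sample(small_sample, replace = TRUE)))

sample_mean <- mean(small_sample)
boot_center <- mean(boot_means)

# The bootstrap center tracks the sample mean, not the population mean
cat("population mean: ", pop_mean, "\n")
cat("sample mean:     ", round(sample_mean, 3), "\n")
cat("bootstrap center:", round(boot_center, 3), "\n")
```

Rerunning with a different seed changes how far the sample mean lands from the population mean, but the bootstrap center should stay close to the sample mean every time. Resampling can only reuse the values that were observed, so every bootstrap mean also lies between the smallest and largest value in the original sample.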