Data Analysis with R, Second Edition

Overview of this book

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties of performing data analysis in practice and find solutions for working with messy data, large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.
Table of Contents (24 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
Index

Exercises


  • One of the great things about the bootstrap is how conceptually simple and flexible the procedure is, which makes it easy to do our own research on it. In this exercise, we will be doing simulations of simulations. Specifically, to see for ourselves how the reliability of bootstrap results deteriorates as sample sizes get smaller, draw samples from a normal distribution with a fixed mean at sample sizes from 100 down to 5, in steps of 5, repeating the draw 30 to 50 times at each sample size. For each of these 30 to 50 samples, perform the bootstrap procedure (with a sensible number of replications), and find the proportion of the time that the 95% BCa confidence interval contains the mean we chose. Is it 95%, as we would expect? Repeat the procedure with other types of distributions. Does the reliability of the results differ?
  • Learn about the other approaches to the bootstrap that we mentioned in the last section. How does the smooth bootstrap solve the problem of the assumption of the non-existence of data...
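
As a starting point for the first exercise, the coverage simulation might be sketched as follows. This is only one possible setup, not the book's solution: it assumes the `boot` package (a recommended package that ships with standard R installations), and the true mean of 10, the standard deviation of 2, the 1,000 bootstrap replications, and the helper names `mean_stat` and `covers_once` are all illustrative choices rather than anything the exercise prescribes.

```r
library(boot)  # recommended package bundled with standard R installations

set.seed(1)

# All of these values are assumptions made for illustration
true_mean    <- 10                    # the "fixed mean" we try to recover
n_trials     <- 30                    # repetitions per sample size (30 to 50)
n_reps       <- 1000                  # bootstrap replications per sample
sample_sizes <- seq(100, 5, by = -5)  # 100, 95, ..., 5

# boot() expects a statistic that takes the data and a vector of indices
mean_stat <- function(x, idx) mean(x[idx])

# Draw one sample of size n, bootstrap its mean, and report whether
# the 95% BCa interval contains the true mean
covers_once <- function(n, rdist) {
  x  <- rdist(n)
  b  <- boot(x, statistic = mean_stat, R = n_reps)
  # boot.ci() may warn about extreme order statistics for tiny samples
  ci <- suppressWarnings(boot.ci(b, conf = 0.95, type = "bca"))
  lower <- ci$bca[4]  # columns 4 and 5 hold the interval endpoints
  upper <- ci$bca[5]
  lower <= true_mean && true_mean <= upper
}

# Hypothetical generator for the normal case; swap it out to try
# other distributions that share the same true mean
draw_normal <- function(k) rnorm(k, mean = true_mean, sd = 2)

# Proportion of trials in which the interval covered the true mean
coverage <- sapply(sample_sizes, function(n) {
  mean(replicate(n_trials, covers_once(n, draw_normal)))
})

results <- data.frame(n = sample_sizes, coverage = coverage)
print(results)
```

With only 30 trials per sample size, each coverage estimate is itself noisy, so raising `n_trials` gives a clearer picture at the cost of run time. To repeat the experiment with another distribution, pass a different generator whose mean matches `true_mean`, for example `function(k) rexp(k, rate = 1 / true_mean)`.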