- One of the great things about the bootstrap is how conceptually simple and flexible the procedure is, which makes it easy to do our own research on it. In this exercise, we will be doing simulations of simulations. Specifically, to see for ourselves how the reliability of bootstrap results deteriorates as sample sizes get smaller, draw samples from a normal distribution with a fixed mean at each sample size from 100 down to 5, in steps of 5, taking 30 to 50 samples at each size. For each sample, perform the bootstrap procedure (with a sensible number of replications), and record the proportion of samples for which the 95% BCa confidence interval contains the mean we chose. Is it 95%, as we would expect? Repeat the procedure with other types of distributions. Does the reliability of the results differ?
- Learn about the other approaches to the bootstrap that we mentioned in the last section. How does the smooth bootstrap solve the problem of the assumption of the non-existence of data...
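The coverage simulation described in the first exercise could be set up along the following lines. This is a minimal sketch rather than a reference solution. It assumes the `boot` package, and the values `true_mean`, `true_sd`, `n_trials`, and `n_reps`, as well as the helper names `mean_stat`, `covers_once`, and `coverage`, are illustrative choices that the exercise does not specify. The exponential distribution is just one arbitrary pick for "other types of distributions".

```r
library(boot)  # provides boot() and boot.ci()

set.seed(1)

# All of these values are assumptions made for illustration
true_mean    <- 10                   # the "fixed mean" we chose
true_sd      <- 2                    # spread of the normal population
n_trials     <- 30                   # samples drawn at each sample size
n_reps       <- 1000                 # bootstrap replications per sample
sample_sizes <- seq(100, 5, by = -5) # 100, 95, ..., 5

# Statistic in the form boot() expects: the data plus resampled indices
mean_stat <- function(x, idx) mean(x[idx])

# Draw one sample of size n, bootstrap it, and report whether the
# 95% BCa interval contains the true mean
covers_once <- function(n, rdist) {
  x    <- rdist(n)
  b    <- boot(x, statistic = mean_stat, R = n_reps)
  ci   <- boot.ci(b, conf = 0.95, type = "bca")
  lims <- ci$bca[4:5]                # lower and upper endpoints
  lims[1] <= true_mean && true_mean <= lims[2]
}

# Proportion of trials whose interval covers the true mean, per sample size
coverage <- function(rdist) {
  sapply(sample_sizes, function(n) {
    mean(replicate(n_trials, covers_once(n, rdist)))
  })
}

# Normal population with the chosen mean
normal_cov <- coverage(function(n) rnorm(n, mean = true_mean, sd = true_sd))

# A skewed population with the same mean (exponential with rate 1 / mean)
exp_cov <- coverage(function(n) rexp(n, rate = 1 / true_mean))

results <- data.frame(n = sample_sizes, normal = normal_cov, exponential = exp_cov)
print(results)
```

With only 30 trials per sample size, each coverage estimate is itself noisy, so raising `n_trials` gives a clearer picture at the cost of a longer run time. `boot.ci()` may emit warnings for the smallest samples, which is part of what the exercise asks us to observe.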
Data Analysis with R, Second Edition - Second Edition
Overview of this book
Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly.
Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples.
Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties relating to performing data analysis in practice and find solutions to working with messy data, large data, communicating results, and facilitating reproducibility.
This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.
Table of Contents (24 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
RefresheR
The Shape of Data
Describing Relationships
Probability
Using Data To Reason About The World
Testing Hypotheses
Bayesian Methods
The Bootstrap
Predicting Continuous Variables
Predicting Categorical Variables
Predicting Changes with Time
Sources of Data
Dealing with Missing Data
Dealing with Messy Data
Dealing with Large Data
Working with Popular R Packages
Reproducibility and Best Practices
Other Books You May Enjoy
Index