Data Analysis with R, Second Edition

Overview of this book

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax, using packages such as Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties of performing data analysis in practice and find solutions for working with messy data, handling large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.

Summary


Messy data, no matter what definition you use, presents a huge roadblock for people who work with data. This chapter focused on two of the most notorious and prolific culprits: missing data and data that has not been cleaned or audited for quality.

For unsanitized data, we saw that what is perhaps the ideal solution (visually auditing the data) becomes untenable for datasets of moderate size or larger. We discovered that the grammar of the assertr package provides a mechanism to offload this auditing process to R. You now have a few assertr checking recipes under your belt for some of the more common manifestations of the mistakes that plague data that has not been scrutinized.
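
As a reminder of what such recipes can look like, here is a minimal sketch. It assumes the assertr package is installed and uses the built-in mtcars dataset purely for illustration; the specific columns and thresholds are assumptions chosen for this example, not checks prescribed by the chapter. The calls are written without a pipe so that the sketch depends on nothing beyond assertr itself.

```r
library(assertr)

# Each check returns the data unchanged when it passes,
# so the checks can be applied one after another.

# verify() evaluates a logical expression using the data frame's columns.
checked <- verify(mtcars, mpg > 0)

# assert() applies a predicate to every element of the named columns.
# in_set() and within_bounds() are predicate generators supplied by assertr.
checked <- assert(checked, in_set(0, 1), am, vs)
checked <- assert(checked, within_bounds(0, Inf), mpg)

# insist() builds its predicate from the column itself; here, every mpg
# value must lie within 4 standard deviations of the column mean.
checked <- insist(checked, within_n_sds(4), mpg)

# A deliberately corrupted copy (illustrative only) shows a failing check.
bad <- mtcars
bad$am[1] <- 2

# By default a failed assertion raises an error, which we catch here.
result <- tryCatch(
  assert(bad, in_set(0, 1), am),
  error = function(e) "assertion failed"
)
```

Because a failed check stops execution by default, placing checks like these at the top of an analysis script means that downstream code only runs on data that passed the audit.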