Data Analysis with R, Second Edition

Overview of this book

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties of performing data analysis in practice and find solutions for working with messy data, large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.

Using dplyr and tidyr to manipulate data


It’s perhaps a little unfair to say that data.table is opinionated. If it were, though, its main point of view would be that the potential pitfalls of call by reference are sometimes a more than fair price to pay for unparalleled speed and memory efficiency. It could probably also be said that data.table prefers conciseness, and prefers to modify the semantics of a function (sometimes drastically) through arguments instead of creating other functions.
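To make the call-by-reference pitfall concrete, here is a minimal sketch. It assumes the data.table package is installed; the table and its column names (`dt`, `alias`, `safe`, `id`, `value`) are made up for illustration and do not come from the book's datasets. It shows that plain assignment does not copy a data.table, so an update with `:=` through one name is visible through the other, and that `copy()` is one way to avoid this.

```r
library(data.table)

# A small, hypothetical table used only for illustration
dt <- data.table(id = 1:3, value = c(10, 20, 30))

# Plain assignment does not copy a data.table:
# 'alias' and 'dt' refer to the same underlying object
alias <- dt

# := adds or updates columns by reference (in place, no copy made)
alias[, value := value * 2]

# The change made through 'alias' is also visible through 'dt'
print(dt$value)

# copy() creates an independent object, so changes to it stay local
safe <- copy(dt)
safe[, value := 0]

# 'dt' is unaffected by the update to 'safe'
print(dt$value)
```

The in-place update is what buys the speed and memory savings, since no copy of the table is made. The price is that a function or alias can silently change data that other code still relies on, which is why an explicit `copy()` is worth considering whenever the original must be preserved.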

There is, however, a new approach to non-base-R data manipulation on the scene. This package also puts efficiency at a premium, but not at the expense of code safety, readability, consistency, and interpretability. (This, by the way, is not to say that data.table is unreadable, uninterpretable, or inconsistent; it’s just that this other package very explicitly states that these things are its top priorities. Judgments of readability are largely a matter of habit and subjective taste.) As you...