Data Analysis with R, Second Edition

Overview of this book

Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples. Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. Solve the difficulties relating to performing data analysis in practice and find solutions to working with messy data, large data, communicating results, and facilitating reproducibility. This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.

Using Rcpp


Contrary to what I sometimes like to believe, there are computer programming languages other than R. R, along with languages like Python, Perl, and Ruby, is considered a high-level language, because it offers a greater level of abstraction from computer representations and resource management than lower-level languages do. For example, in some lower-level languages, you must specify the data type of every variable you create and manage the allocation of memory manually; C, C++, and Fortran are of this type.

The high level of abstraction R provides allows us to do amazing things very quickly, such as importing a data set, running a linear model, and plotting the data and regression line in no more than four lines of code! On the other hand, nothing quite beats the performance of carefully crafted lower-level code. Even so, it would take hundreds of lines of code to run a linear model in a low-level language, so a language like that is inappropriate for agile analytics.
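To make the four-line claim concrete, here is a minimal sketch. It assumes the built-in mtcars data set stands in for an imported file, and the choice of mpg and wt as the variables is purely illustrative, not taken from the text.

```r
# Assumption: mtcars (shipped with base R) stands in for an imported data set
data(mtcars)
# Fit a simple linear model of fuel economy on car weight
model <- lm(mpg ~ wt, data = mtcars)
# Scatter plot of the raw data
plot(mpg ~ wt, data = mtcars)
# Overlay the fitted regression line
abline(model)
```

With a real file, the first line would typically be replaced by a call such as read.csv() on the file's path.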

One solution is to use R...