Big Data Analytics with R

By: Simon Walkowiak

Overview of this book

Big Data analytics is the process of examining large and complex data sets that often exceed available computational capabilities. R is a leading programming language of data science, with powerful functions for tackling problems related to Big Data processing. The book begins with a brief introduction to the Big Data world and its current industry standards, followed by an introduction to the R language that covers its development, structure, real-world applications, and shortcomings. It then progresses to a revision of major R functions for data management and transformation. Readers will be introduced to cloud-based Big Data solutions (e.g. Amazon EC2 instances and Amazon RDS, Microsoft Azure and its HDInsight clusters) and given guidance on R connectivity with relational and non-relational databases such as MongoDB and HBase. The book then expands to cover Big Data tools such as the Apache Hadoop ecosystem, HDFS, and the MapReduce framework, as well as other R-compatible tools such as Apache Spark, its machine learning library Spark MLlib, and H2O.

The future of R

In the following brief sections, we are going to try to imagine how R may develop within the next several years to facilitate Big, Fast, and Smart data processing.

Big Data

We hope that by reading this book you have gained an appreciation for the R language and for what can be achieved by integrating it with currently available Big Data tools. The last few years have brought us many new Big Data technologies, so full connectivity of R with these new frameworks may take some time. Approaches that use R to process large datasets on a single machine are still quite limited, owing to long-standing constraints of the R language itself. The ultimate solution to this problem might only be achieved by redefining the language from scratch, but this is obviously an extreme and largely impractical idea. There is a lot of hope associated with Microsoft R Open, but as these are still quite early days for this new distribution, we need to wait...