Mastering Hadoop

By: Sandeep Karanth

RHadoop


R is a programming language for statistics, data science, and visualization. Its functionality can be extended with packages that are imported to perform specialized or custom tasks, and more than 5,000 data analysis algorithms are available as libraries. These cover a much wider variety of data analysis tasks than Apache Mahout supports. R also has a large and vibrant user community.

However, R has two drawbacks: it executes in memory, and its support for multithreading is minimal. These drawbacks make R unsuitable for big data crunching, where disk-based analysis and distribution are mandatory. One alternative is to run R programs through Hadoop Streaming, but doing so directly is tedious, which is the motivation behind RHadoop. RHadoop also uses Hadoop Streaming as its underlying mechanism to run R scripts in Hadoop, but it alleviates some of the pain points of native streaming. Some of the advantages of RHadoop are as follows...