Big Data Analytics with R and Hadoop

By: Vignesh Prajapati

Overview of this book

Big data analytics is the process of examining large amounts of data of various types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New methods of working with big data, such as Hadoop and MapReduce, offer alternatives to traditional data warehousing.

Big Data Analytics with R and Hadoop focuses on techniques for integrating R and Hadoop using tools such as RHIPE and RHadoop. With these tools, you can build a powerful data analytics engine that runs analytics algorithms over large-scale datasets in a scalable manner, combining the data analytics operations of R with the MapReduce and HDFS components of Hadoop.

You will start with the installation and configuration of R and Hadoop. Next, you will work through various practical data analytics examples with R and Hadoop. Finally, you will learn how to import and export data between R and various data sources. Big Data Analytics with R and Hadoop will also give you an accessible introduction to the R and Hadoop connectors RHIPE, RHadoop, and Hadoop streaming.

Understanding HBase


Apache HBase is a distributed Big Data store for Hadoop. It allows random, real-time read/write access to Big Data. It is designed as a column-oriented data store, modeled on Google's Bigtable.
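
To make the column-oriented model more concrete, here is a minimal Python sketch of a toy in-memory table. It is not the HBase API: the `ToyTable` class, its `put` and `get` methods, and the sample row and column names are all hypothetical, chosen only to illustrate the idea that each cell is addressed by a row key and a `family:qualifier` column, and that a single cell can be read or written without touching the rest of the row.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory stand-in for a column-oriented table (not the HBase API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> list of (timestamp, value) versions
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        """Write one cell version; other columns of the row are untouched."""
        self._rows[row][column].append((timestamp, value))

    def get(self, row, column):
        """Random read: return the newest value of one cell, or None."""
        versions = self._rows.get(row, {}).get(column, [])
        if not versions:
            return None
        # The version with the highest timestamp wins
        return max(versions, key=lambda version: version[0])[1]


# Hypothetical sample data, for illustration only
table = ToyTable()
table.put("user1", "info:name", "Ann", timestamp=1)
table.put("user1", "info:name", "Anna", timestamp=2)
table.put("user1", "stats:visits", 3, timestamp=1)
print(table.get("user1", "info:name"))  # newest version of a single cell
```

A real HBase table additionally keeps rows sorted by key and distributes them across the cluster; the sketch leaves that out to keep the addressing scheme visible.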

Understanding HBase features

HBase offers the following features:

  • RESTful web service with XML

  • Linear and modular scalability

  • Strictly consistent reads and writes

  • Extensible shell

  • Block cache and Bloom filters for real-time queries
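
As a rough illustration of why Bloom filters help with fast lookups, the following Python sketch implements a small Bloom filter. It is a simplified teaching example under stated assumptions, not HBase's implementation: the `BloomFilter` class, its parameters (`num_bits`, `num_hashes`), and the row keys are hypothetical. The key property is that a Bloom filter can answer "definitely not present" cheaply, so a read can skip data that cannot contain the requested key, at the cost of occasional false positives.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (illustrative; not HBase's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the code simple at the cost of memory
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present"
        return all(self.bits[position] for position in self._positions(key))


# Hypothetical row keys, for illustration only
row_filter = BloomFilter()
for row_key in ("row-1", "row-2"):
    row_filter.add(row_key)

# Keys that were added are never reported as absent
print(row_filter.might_contain("row-1"))
```

A lookup for a key that was never added usually returns `False`, which is what lets a store skip the more expensive read; it can occasionally return `True`, in which case the read simply proceeds and finds nothing.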

The prerequisites for RHBase are as follows:

  • Hadoop

  • HBase

  • Thrift

Here, we assume that you have already configured Hadoop on your Linux machine. If you need to know how to install Hadoop on Linux, refer to Chapter 1, Getting Ready to Use R and Hadoop.

Installing HBase

The steps for installing HBase are as follows:

  1. Download the tar file of HBase and extract it:

    wget http://apache.cs.utah.edu/hbase/stable/hbase-0.94.11.tar.gz
    
    tar -xzf hbase-0.94.11.tar.gz
    
  2. Go to the HBase installation directory and update the configuration files...