Big Data Analytics with R and Hadoop

By: Vignesh Prajapati

Overview of this book

Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New methods of working with big data, such as Hadoop and MapReduce, offer alternatives to traditional data warehousing.

Big Data Analytics with R and Hadoop is focused on the techniques of integrating R and Hadoop by various tools such as RHIPE and RHadoop. A powerful data analytics engine can be built, which can process analytics algorithms over a large scale dataset in a scalable manner. This can be implemented through data analytics operations of R, MapReduce, and HDFS of Hadoop.

You will start with the installation and configuration of R and Hadoop. Next, you will discover information on various practical data analytics examples with R and Hadoop. Finally, you will learn how to import/export from various data sources to R. Big Data Analytics with R and Hadoop will also give you an easy understanding of the R and Hadoop connectors RHIPE, RHadoop, and Hadoop streaming.

Installing R


You can download the appropriate version of R from the official R website.

The steps below cover three operating systems: Windows, Linux (Ubuntu and RHEL/CentOS), and Mac OS. Download the latest version of R, as it includes the most recent patches and bug fixes.

For Windows, follow the given steps:

  1. Navigate to www.r-project.org.

  2. Click on the CRAN section, select a CRAN mirror, and select Windows as your OS. (For Hadoop work, prefer Linux; Hadoop is almost always used in a Linux environment.)

  3. Download the latest R version from the mirror.

  4. Execute the downloaded .exe to install R.

For Linux-Ubuntu, follow the given steps:

  1. Navigate to www.r-project.org.

  2. Click on the CRAN section, select a CRAN mirror, and select your OS.

  3. In the /etc/apt/sources.list file, add the CRAN <mirror> entry.

  4. Download and update the package lists from the repositories using the sudo apt-get update command.

  5. Install the R system using the sudo apt-get install r-base command.
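
Steps 3 to 5 above can be sketched as a short shell script. This is only an illustration under stated assumptions: the mirror URL (https://cloud.r-project.org) and the suite name (jammy-cran40) are examples that are not taken from this book, and the exact entry for your release should be copied from the mirror's Ubuntu page. Any repository signing key setup that the mirror requires is omitted. The script prints the commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Build the apt source line for a CRAN mirror.
# $1: mirror base URL (assumed example: https://cloud.r-project.org)
# $2: suite name published by the mirror (assumed example: jammy-cran40)
cran_source_line() {
    printf 'deb %s/bin/linux/ubuntu %s/\n' "$1" "$2"
}

# Print the commands for steps 3 to 5 without executing them.
print_ubuntu_install_steps() {
    line=$(cran_source_line "$1" "$2")
    # Step 3: append the CRAN entry to /etc/apt/sources.list.
    printf "echo '%s' | sudo tee -a /etc/apt/sources.list\n" "$line"
    # Step 4: refresh the package lists.
    printf 'sudo apt-get update\n'
    # Step 5: install the base R system.
    printf 'sudo apt-get install r-base\n'
}

print_ubuntu_install_steps "https://cloud.r-project.org" "jammy-cran40"
```

Once the printed commands look right for your system, they can be run one at a time in a terminal.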

For Linux-RHEL/CentOS, follow the given steps:

  1. Navigate to www.r-project.org.

  2. Click on CRAN, select a CRAN mirror, and select Red Hat as your OS.

  3. Download the R-*core-*.rpm file.

  4. Install the .rpm package using the rpm -ivh R-*core-*.rpm command.

  5. Install the R system using the sudo yum install R command.
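
Because the rpm command in step 4 relies on a wildcard, it can help to confirm that exactly one matching file was downloaded before installing. The following is a minimal sketch of such a check, not part of the original procedure; the helper names and the download directory are hypothetical. It prints the install commands from steps 4 and 5 rather than running them.

```shell
#!/bin/sh
# Count the files in a directory ($1) that match the R core package pattern.
count_r_core_rpms() {
    count=0
    for f in "$1"/R-*core-*.rpm; do
        # An unmatched glob stays literal, so confirm the file exists.
        if [ -e "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Print the commands for steps 4 and 5 when exactly one package file is
# present; otherwise report the problem and return non-zero.
print_rhel_install_steps() {
    found=$(count_r_core_rpms "$1")
    if [ "$found" -eq 1 ]; then
        echo "rpm -ivh $1/R-*core-*.rpm"
        echo "sudo yum install R"
    else
        echo "expected one R-*core-*.rpm in $1, found $found" >&2
        return 1
    fi
}

# Hypothetical download directory; adjust to where the file was saved.
print_rhel_install_steps "${HOME}/Downloads" || true
```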

For Mac, follow the given steps:

  1. Navigate to www.r-project.org.

  2. Click on CRAN, select a CRAN mirror, and select your OS.

  3. Download the following files: R-*.pkg, gfortran-*.dmg, and tcltk-*.dmg.

  4. Install the R-*.pkg file.

  5. Then, install the gfortran-*.dmg and tcltk-*.dmg files.
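
On Linux or Mac, a quick way to confirm that the installation worked is to ask R for its version from a terminal. The sketch below assumes only that the installer placed the R launcher on your PATH; the function name check_r is illustrative.

```shell
#!/bin/sh
# Report the installed R version, or explain that R is missing.
check_r() {
    if command -v R >/dev/null 2>&1; then
        # The first line of `R --version` names the release.
        R --version | head -n 1
    else
        echo "R not found on PATH"
        return 1
    fi
}

check_r || echo "Install R using the steps for your operating system."
```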

After installing the base R package, it is advisable to install RStudio, which is a powerful and intuitive Integrated Development Environment (IDE) for R.

Tip

The Revolution Analytics distribution of R can also be used as a modern data analytics tool for statistical computing and predictive analytics. It is available in free and premium versions, and it offers Hadoop integration for Big Data analytics.