Big Data Analytics with R and Hadoop

By: Vignesh Prajapati

Overview of this book

Big data analytics is the process of examining large amounts of data of various types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New methods of working with big data, such as Hadoop and MapReduce, offer alternatives to traditional data warehousing. Big Data Analytics with R and Hadoop focuses on techniques for integrating R and Hadoop using tools such as RHIPE and RHadoop. With these, you can build a powerful data analytics engine that runs analytics algorithms over large-scale datasets in a scalable manner, combining the data analytics operations of R with the MapReduce and HDFS components of Hadoop. You will start with the installation and configuration of R and Hadoop. Next, you will work through various practical data analytics examples with R and Hadoop. Finally, you will learn how to import and export data between various data sources and R. Big Data Analytics with R and Hadoop will also give you an accessible introduction to the R and Hadoop connectors RHIPE, RHadoop, and Hadoop streaming.

Writing a Hadoop MapReduce example

We now move forward with MapReduce by working through a common and simple example: word count. The goal of this example is to calculate how many times each word occurs in the provided documents. These documents serve as the input files for the MapReduce job.

In this example, we already have a set of text files, and we want to find the frequency of every unique word in them. We will compute this by designing the Map and Reduce phases of a Hadoop MapReduce job.
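Before turning to Hadoop itself, the word count logic can be sketched in plain Java. The following is a minimal in-memory illustration, not the book's Hadoop code: it does not use the Hadoop API, and the class name `WordCountSketch`, the method names `map`, `shuffle`, `reduce`, and `countWords`, the lowercasing of words, and the tokenization rule (splitting on non-word characters) are all assumptions made for this example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of word count; no Hadoop classes are used.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one line of text.
    // Assumption: words are lowercased and split on non-word characters.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by their key (the word).
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one word.
    public static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver role: wire the phases together over all input lines.
    public static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hadoop runs MapReduce", "R calls Hadoop");
        // Prints {calls=1, hadoop=2, mapreduce=1, r=1, runs=1}
        System.out.println(countWords(lines));
    }
}
```

In a real Hadoop job the framework performs the grouping step between the two phases and distributes the work across the cluster; here everything runs in a single process purely to show the data flow.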

In this section, we look more closely at Hadoop MapReduce programming using Hadoop MapReduce's old API. We assume that the reader has already set up the Hadoop environment as described in Chapter 1, Getting Ready to Use R and Hadoop. Also, keep in mind that we are not going to use R to count words; only Hadoop is used here.

Hadoop MapReduce has three main objects: Mapper, Reducer, and Driver. Each can be developed as a Java class: the Map class, the Reduce class, and the Driver class,...
