Big Data Analytics with R and Hadoop

Big Data Analytics with R and Hadoop

By : Vignesh Prajapati

Buy this Book

Big Data Analytics with R and Hadoop

By: Vignesh Prajapati

Buy this Book

Overview of this book

Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New methods of working with big data, such as Hadoop and MapReduce, offer alternatives to traditional data warehousing. Big Data Analytics with R and Hadoop is focused on the techniques of integrating R and Hadoop by various tools such as RHIPE and RHadoop. A powerful data analytics engine can be built, which can process analytics algorithms over a large scale dataset in a scalable manner. This can be implemented through data analytics operations of R, MapReduce, and HDFS of Hadoop. You will start with the installation and configuration of R and Hadoop. Next, you will discover information on various practical data analytics examples with R and Hadoop. Finally, you will learn how to import/export from various data sources to R. Big Data Analytics with R and Hadoop will also give you an easy understanding of the R and Hadoop connectors RHIPE, RHadoop, and Hadoop streaming.

Big Data Analytics with R and Hadoop

Credits

About the Author

Acknowledgment

About the Reviewers

www.PacktPub.com

Preface

Free Chapter

Getting Ready to Use R and Hadoop

Installing R

Installing RStudio

Understanding the features of R language

Installing Hadoop

Understanding Hadoop features

Learning the HDFS and MapReduce architecture

Understanding Hadoop subprojects

Summary

Writing Hadoop MapReduce Programs

Understanding the basics of MapReduce

Introducing Hadoop MapReduce

Understanding the Hadoop MapReduce fundamentals

Writing a Hadoop MapReduce example

Learning the different ways to write Hadoop MapReduce in R

Summary

Integrating R and Hadoop

Introducing RHIPE

Introducing RHadoop

Summary

Using Hadoop Streaming with R

Understanding the basics of Hadoop streaming

Understanding how to run Hadoop streaming with R

Exploring the HadoopStreaming R package

Summary

Learning Data Analytics with R and Hadoop

Understanding the data analytics project life cycle

Understanding data analytics problems

Summary

Understanding Big Data Analysis with Machine Learning

Introduction to machine learning

Supervised machine-learning algorithms

Unsupervised machine learning algorithm

Recommendation algorithms

Summary

Importing and Exporting Data from Various DBs

Learning about data files as database

Understanding MySQL

Understanding Excel

Understanding MongoDB

Understanding SQLite

Understanding PostgreSQL

Understanding Hive

Understanding HBase

Summary

References

R + Hadoop help materials

R groups

Hadoop groups

R + Hadoop groups

Popular R contributors

Popular Hadoop contributors

Index

Customer Reviews

5 star

4 star

3 star

2 star

1 star

Exploring the HadoopStreaming R package

HadoopStreaming is an R package developed by David S. Rosenberg. We can say this is a simple framework for MapReduce scripting. This also runs without Hadoop for operating data in a streaming fashion. We can consider this R package as a Hadoop MapReduce initiator. For any analyst or developer who is not able to recall the Hadoop streaming command to be passed in the command prompt, this package will be helpful to quickly run the Hadoop MapReduce job.

The three main features of this package are as follows:

Chunkwise data reading: The package allows chunkwise data reading and writing for Hadoop streaming. This feature will overcome memory issues.
Supports various data formats: The package allows the reading and writing of data in three different data formats.
Robust utility for the Hadoop streaming command: The package also allows users to specify the command-line argument for Hadoop streaming.

This package is mainly designed with three functions for reading...

Big Data Analytics with R and Hadoop

By : Vignesh Prajapati

Big Data Analytics with R and Hadoop

By: Vignesh Prajapati

Overview of this book

Related Content you might be interested in

Current Title:

Big Data Analytics with R and Hadoop

Exploring the HadoopStreaming R package