Big Data Analytics with R and Hadoop

By: Vignesh Prajapati

Overview of this book

Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New methods of working with big data, such as Hadoop and MapReduce, offer alternatives to traditional data warehousing.

Big Data Analytics with R and Hadoop focuses on techniques for integrating R and Hadoop using tools such as RHIPE and RHadoop. With them, a powerful data analytics engine can be built that runs analytics algorithms over large-scale datasets in a scalable manner, implemented through the data analytics operations of R and the MapReduce and HDFS components of Hadoop.

You will start with the installation and configuration of R and Hadoop. Next, you will work through various practical data analytics examples with R and Hadoop. Finally, you will learn how to import data into R from various data sources and export it back. Big Data Analytics with R and Hadoop will also give you an easy understanding of the R and Hadoop connectors RHIPE, RHadoop, and Hadoop streaming.

Introducing Hadoop MapReduce


The MapReduce model can be implemented in several languages, but Hadoop MapReduce is a popular Java framework in which applications are easily written. It processes vast amounts of data (multiterabyte datasets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. The MapReduce paradigm is divided into two phases, Map and Reduce, which mainly deal with key-value pairs of data. Within each phase, many tasks run in parallel across the cluster, but the phases themselves run sequentially: the output of the Map phase becomes the input of the Reduce phase.
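
To make this key-value flow concrete, here is a minimal sketch of a Map task: a word-count mapper written as an R script for Hadoop Streaming (one of the connectors this book covers). The example, and the script name wordcount_mapper.R, are illustrative assumptions rather than material from this chapter.

#!/usr/bin/env Rscript
# wordcount_mapper.R -- a hypothetical word-count mapper for Hadoop Streaming.
# Reads raw text lines from standard input and emits one tab-separated
# (word, 1) key-value pair per word on standard output.
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  # Lowercase the line and split it on runs of non-letter characters.
  words <- unlist(strsplit(tolower(line), "[^a-z]+"))
  words <- words[words != ""]          # drop empty tokens
  for (w in words) {
    cat(sprintf("%s\t1\n", w))         # emit the (word, 1) pair
  }
}
close(con)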

Input data elements in MapReduce are immutable and cannot be updated; if the input (key, value) pairs for the mapping tasks are changed, the change is not reflected in the input files. The Mapper output is piped, grouped by the key attribute, to the appropriate Reducer as input. This sequential data processing is carried out in a parallel manner with the help of Hadoop MapReduce algorithms as well...
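
Continuing the sketch above, a matching Reduce task sums the counts for each word. Hadoop Streaming sorts the mapper output by key before it reaches the reducer, so all pairs for a given word arrive on consecutive lines; this hypothetical wordcount_reducer.R relies on that ordering.

#!/usr/bin/env Rscript
# wordcount_reducer.R -- a hypothetical word-count reducer for Hadoop Streaming.
# Input lines arrive sorted by key as "word<TAB>count"; because all lines
# for a given word are consecutive, a single running total suffices.
con <- file("stdin", open = "r")
current_word <- NULL
current_count <- 0
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  parts <- strsplit(line, "\t", fixed = TRUE)[[1]]
  word  <- parts[1]
  count <- as.integer(parts[2])
  if (!is.null(current_word) && word == current_word) {
    current_count <- current_count + count
  } else {
    # The key has changed: emit the finished total and start a new one.
    if (!is.null(current_word)) cat(sprintf("%s\t%d\n", current_word, current_count))
    current_word  <- word
    current_count <- count
  }
}
if (!is.null(current_word)) cat(sprintf("%s\t%d\n", current_word, current_count))
close(con)

Both scripts would be made executable and submitted with the hadoop jar command against the Hadoop Streaming jar, passed as the -mapper and -reducer arguments; the jar's exact location varies with the Hadoop version and distribution.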