Learning Big Data with Amazon Elastic MapReduce

By: Amarkant Singh, Vijay Rayapati

Overview of this book

Amazon Elastic MapReduce is a web service used to process and store vast amounts of data, and it is one of the largest Hadoop operators in the world. As businesses generate and collect more data, and as cost-effective cloud-based solutions for distributed computing have arrived, it has become far more feasible to crunch large amounts of data and get deep insights within a short span of time.

This book gets you started with AWS so that you can quickly create your own account and explore the services it provides, many of which you might be delighted to use. It covers the architectural details of the MapReduce framework, Apache Hadoop, the various job models on EMR, how to manage clusters on EMR, and the command-line tools available with EMR. Each chapter builds on the knowledge of the previous one, leading to the final chapter, where you will learn to solve a real-world use case using Apache Hadoop and EMR. The book will therefore get you up and running with the major Big Data technologies quickly and efficiently.

Learning Big Data with Amazon Elastic MapReduce

Credits

About the Authors

Acknowledgments

About the Reviewers

www.PacktPub.com

Preface

Amazon Web Services

What is Amazon Web Services?

Structure and Design

Services provided by AWS

Creating an account on AWS

Launching the AWS management console

Getting started with Amazon EC2

Getting started with Amazon S3

Summary

MapReduce

The map function

The reduce function

What is MapReduce?

Data life cycle in the MapReduce framework

Real-world examples and use cases of MapReduce

Software distributions built on the MapReduce framework

Summary

Apache Hadoop

What is Apache Hadoop?

Hadoop modules

Hadoop Distributed File System

Apache Hadoop MapReduce

Apache Hadoop as a platform

Summary

Amazon EMR – Hadoop on Amazon Web Services

What is AWS EMR?

The EMR architecture

EMR use cases

Summary

Programming Hadoop on Amazon EMR

Hello World in Hadoop

Mapper implementation

Reducer implementation

Driver implementation

Summary

Executing Hadoop Jobs on an Amazon EMR Cluster

Creating an EC2 key pair

Creating a S3 bucket for input data and JAR

How to launch an EMR cluster

Viewing results

Summary

Amazon EMR – Cluster Management

EMR cluster management – different methods

EMR bootstrap actions

EMR cluster monitoring and troubleshooting

EMR best practices

Summary

Amazon EMR – Command-line Interface Client

EMR – CLI client installation

Launching and monitoring an EMR cluster using CLI

Summary

Hadoop Streaming and Advanced Hadoop Customizations

Hadoop streaming

Adding streaming Job Step on EMR

Advanced Hadoop customizations

Emitting results to multiple outputs

Summary

Use Case – Analyzing CloudFront Logs Using Amazon EMR

Use case definition

The solution architecture

Creating the Hadoop Job Step

Output ingestion to a data store

Using a visualization tool – Tableau Desktop

Summary

Index

What is MapReduce?

MapReduce is a programming model that has grown popular with the emergence of easily accessible distributed cloud computing. It is a programming paradigm that allows massively parallel execution and provides the scalability required to process huge amounts of data within desired time frames.

As for a definition, the abstract of Google's original paper on MapReduce says:

"MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines."

The abstract also states that the runtime system, which will be a part of the MapReduce framework, will take care of the...
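
To make the quoted definition concrete, here is a minimal single-process sketch of the model in Python, using word counting as the task. It is only an illustration, not how Hadoop or EMR implements the framework: the names map_fn, reduce_fn, and run_mapreduce are hypothetical, and the grouping step that a real runtime performs across many machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: process one input key/value pair and emit intermediate pairs.

    Here the key is a line number (unused) and the value is a line of text.
    """
    for word in value.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share one intermediate key."""
    return key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical stand-in for the runtime: map, group by key, then reduce."""
    groups = defaultdict(list)
    # Map phase: every input record is processed independently of the others,
    # which is what lets a real framework run this step in parallel.
    for key, value in records:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    # Reduce phase: each intermediate key is handled once, with all its values.
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = run_mapreduce(enumerate(lines), map_fn, reduce_fn)
print(counts)
```

The user supplies only map_fn and reduce_fn; everything inside run_mapreduce stands in for the work the quoted abstract assigns to the framework itself.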
