Learning Big Data with Amazon Elastic MapReduce

By: Amarkant Singh, Vijay Rayapati
Overview of this book

Amazon Elastic MapReduce is a web service used to process and store vast amounts of data, and it is one of the largest Hadoop operators in the world. With the increase in the amount of data generated and collected by businesses, and the arrival of cost-effective cloud-based solutions for distributed computing, it has become feasible to crunch large amounts of data and extract deep insights within a short span of time.

This book will get you started with AWS so that you can quickly create your own account and explore the services provided, many of which you might be delighted to use. It covers the architectural details of the MapReduce framework, Apache Hadoop, the various job models on EMR, how to manage clusters on EMR, and the command-line tools available with EMR. Each chapter builds on the knowledge of the previous one, leading to the final chapter, where you will learn how to solve a real-world use case using Apache Hadoop and EMR. This book will, therefore, get you up and running with the major Big Data technologies quickly and efficiently.

Creating an S3 bucket for input data and JAR


You will need to create an Amazon S3 bucket to hold the following four things:

  • Input file(s)

  • The custom JAR executable

  • Output file(s)

  • Hadoop job's logfiles generated by the EMR cluster
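
If you prefer the command line to the console walkthrough that follows, here is a minimal sketch using the AWS CLI. The JAR filename, the input filename, and the folder names jars/ and input/ are illustrative assumptions, not the book's exact names:

    # Create the bucket (S3 bucket names are globally unique; replace
    # learning-bigdata with your own name if it is already taken).
    aws s3 mb s3://learning-bigdata

    # Upload the custom JAR and the sample input file; the local paths
    # and folder names here are illustrative assumptions.
    aws s3 cp WordCount.jar s3://learning-bigdata/jars/WordCount.jar
    aws s3 cp input.txt s3://learning-bigdata/input/input.txt

    # Output files and the EMR cluster's logfiles will be written under
    # folders such as output/ and logs/ when you launch the job; these
    # do not need to exist beforehand.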

Perform the following steps to create an S3 bucket, upload the custom JAR we created, and upload the sample input file on which we executed the job locally in the previous chapter:

  1. Log in to your AWS Management Console and go to the S3 console by navigating to Services | All AWS Services | S3. S3 doesn't require region selection.

  2. Click on Create Bucket and provide a suitable name for the bucket. Let's say you named your bucket learning-bigdata. Keep in mind that S3 bucket names are globally unique, so your bucket name will be accepted only if no other bucket with the same name already exists.

    At this point, your browser screen will look as follows:

  3. Create an appropriate folder structure inside the bucket. Click on Create Folder and create a folder named...