Learning Big Data with Amazon Elastic MapReduce

By: Amarkant Singh, Vijay Rayapati

Overview of this book

Amazon Elastic MapReduce is a web service used to process and store vast amounts of data, and it is one of the largest Hadoop operators in the world. With the growth in the amount of data that businesses generate and collect, and the arrival of cost-effective cloud-based solutions for distributed computing, it has become far more feasible to crunch large amounts of data and gain deep insights within a short span of time.

This book will get you started with AWS so that you can quickly create your own account and explore the services provided, many of which you might be delighted to use. This book covers the architectural details of the MapReduce framework, Apache Hadoop, various job models on EMR, how to manage clusters on EMR, and the command-line tools available with EMR. Each chapter builds on the knowledge of the previous one, leading to the final chapter where you will learn about solving a real-world use case using Apache Hadoop and EMR. This book will, therefore, get you up and running with major Big Data technologies quickly and efficiently.

What is Amazon Web Services?


As the name suggests, Amazon Web Services (AWS) is a set of cloud computing services provided by Amazon that are accessible over the Internet. Since anybody can sign up and use it, AWS is classified as a public cloud computing provider.

Most businesses depend on applications that run on a set of compute and storage resources, which need to be reliable, secure, and able to scale as and when required. The last of these requirements, scaling, is one of the major problems with the traditional data center approach. If a business provisions too many resources in anticipation of heavy usage of its applications, it may need to invest a lot of upfront capital (CAPEX) in its IT. What if it then does not receive the expected traffic? Much of that investment would sit idle. Conversely, if the business provisions fewer resources expecting less traffic and ends up receiving more than expected, it will surely have disgruntled customers and a bad user experience.

AWS provides scalable compute services, highly durable storage services, and low-latency database services, among others, so that businesses can quickly provision the infrastructure required to launch and run their applications. Almost everything that you can do in a traditional data center can be achieved with AWS. AWS also brings the ability to add and remove compute resources elastically. You can start with the number of resources you expect to need and, as you go, scale up to meet increasing traffic or specific customer requirements. Alternatively, you may scale down at any time, saving money while keeping the flexibility to make changes quickly. Hence, you need not invest a large amount of capital upfront or worry about capacity planning.

With AWS, you also pay only for what you use. For example, if your business needs more resources during a specific time of day, say for a couple of hours, you can configure AWS to add resources for that period and then scale down automatically as specified. In this case, you pay for the extra resources only for those couple of hours of usage. Many businesses have leveraged AWS in this fashion to support their requirements and reduce costs.
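The pay-per-use arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: the hourly rate, the instance counts, and the length of the busy period are hypothetical values chosen for round numbers, not AWS prices, and the helper `daily_cost` is a name introduced here. Costs are kept in whole cents to avoid floating-point rounding.

```python
# Illustrative only: every number below is an assumption, not an AWS price.
HOURLY_RATE_CENTS = 10  # hypothetical cost of one instance for one hour

BASELINE = 4    # instances needed for most of the day (assumed)
PEAK = 10       # instances needed during the busy period (assumed)
PEAK_HOURS = 2  # length of the busy period in hours (assumed)


def daily_cost(instances_per_hour):
    """Return the cost in cents of a day, given the instance count for each hour."""
    return sum(count * HOURLY_RATE_CENTS for count in instances_per_hour)


# Fixed provisioning: run enough instances for the peak all day long.
fixed = [PEAK] * 24

# Elastic provisioning: run the baseline and add capacity only for the peak hours.
elastic = [PEAK] * PEAK_HOURS + [BASELINE] * (24 - PEAK_HOURS)

fixed_cost = daily_cost(fixed)
elastic_cost = daily_cost(elastic)

print(f"fixed:   ${fixed_cost / 100:.2f} per day")
print(f"elastic: ${elastic_cost / 100:.2f} per day")
print(f"saved:   ${(fixed_cost - elastic_cost) / 100:.2f} per day")
```

Under these assumed numbers, the elastic schedule pays for the six extra instances only during the two peak hours, which is where the saving comes from. Real savings depend on actual prices and traffic patterns.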

How does AWS provide infrastructure at such a low cost and on a pay-per-use basis? The answer lies in AWS having a huge number of customers spread across almost the entire world. This gives AWS economies of scale, which let it offer quality resources to us at a low operational cost.

Experiments and ideas that were once constrained by cost or resources are now very much feasible with AWS, which increases the capacity of businesses to innovate and deliver higher-quality products to their customers.

Hence, AWS enables businesses around the world to focus on delivering a quality experience to their customers, while AWS takes care of the heavy lifting required to launch those applications and keep them running at the expected scale, securely and reliably.