
Mastering Hadoop

By: Sandeep Karanth

Summary


The cloud is a cost-efficient and effective way to develop pre-operationalized analytics. Its self-serve, pay-as-you-go, and elastic deployment features account for these cost benefits. Many companies, such as Yelp and Netflix, run massive analytic workloads on cloud infrastructure. Apache Hadoop is available as a PaaS offering from all major cloud service providers.

Some key takeaways from this chapter are as follows:

  • Amazon's Hadoop offering is called Elastic MapReduce (EMR), and it has been around since 2009. Microsoft launched its Hadoop offering, known as HDInsight on Microsoft Azure, in 2012.

  • With an AWS account, you can launch a Hadoop cluster in a matter of minutes. The number of EC2 instances in a Hadoop cluster is currently limited to 20; for more instances, a special request must be submitted to Amazon. A word of caution: remember to terminate your Hadoop EMR cluster after use. If this is not done, charges will be incurred even...
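The launch-and-terminate advice above can be sketched in code. This is a minimal sketch, assuming Python and the AWS SDK `boto3` (neither is prescribed by this summary). It builds the keyword arguments for the EMR client's `run_job_flow` call and sets `KeepJobFlowAliveWhenNoSteps` to `False` so the cluster shuts itself down once its steps finish. The helper names (`build_emr_request`, `launch`, `terminate`), the release label, the instance type, and the default role names are illustrative assumptions. The 20-instance cap is the limit quoted above, and current AWS quotas may differ.

```python
from typing import Any, Dict, List, Optional

# Instance cap quoted in this chapter; current AWS quotas may differ (assumption).
MAX_INSTANCES = 20


def build_emr_request(name: str, instance_count: int,
                      steps: Optional[List[Dict[str, Any]]] = None,
                      release_label: str = "emr-6.15.0",   # assumed release
                      instance_type: str = "m5.xlarge",    # assumed type
                      ) -> Dict[str, Any]:
    """Return keyword arguments for boto3's EMR client run_job_flow call."""
    if not 1 <= instance_count <= MAX_INSTANCES:
        raise ValueError(
            f"instance_count must be between 1 and {MAX_INSTANCES}; "
            "larger clusters need a limit increase from Amazon")
    request: Dict[str, Any] = {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            # Total node count, including the master node.
            "InstanceCount": instance_count,
            # False: the cluster terminates once no steps remain,
            # so an idle cluster does not keep accruing charges.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        # Default role names created by `aws emr create-default-roles`.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
    if steps:
        request["Steps"] = steps
    return request


def _emr_client(client: Any = None) -> Any:
    """Use the given client, or create a real boto3 EMR client lazily."""
    if client is None:
        import boto3  # imported lazily so the request builder works without boto3
        client = boto3.client("emr")
    return client


def launch(request: Dict[str, Any], client: Any = None) -> str:
    """Start the cluster and return its ID (needs AWS credentials)."""
    return _emr_client(client).run_job_flow(**request)["JobFlowId"]


def terminate(cluster_id: str, client: Any = None) -> None:
    """Explicitly terminate the cluster when you are done with it."""
    _emr_client(client).terminate_job_flows(JobFlowIds=[cluster_id])
```

With auto-termination on and no steps, the cluster would shut down as soon as it finishes starting. For an interactive cluster, set `KeepJobFlowAliveWhenNoSteps` to `True` instead and always call `terminate(cluster_id)` when the work is done.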