Mastering Hadoop

By: Karanth

Overview of this book

Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you wish to deepen your knowledge of Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries not running as fast as you would like? Are you looking to understand the benefits of upgrading Hadoop? If the answer to any of these is yes, this book is for you. It assumes novice-level familiarity with Hadoop.
Table of Contents (15 chapters)

Summary


The cloud is a cost-efficient and effective platform for developing analytics before they are operationalized. Its cost benefits come from its self-serve, pay-as-you-go, and elastic deployment model. Many companies, such as Yelp and Netflix, run massive analytic workloads on cloud infrastructure. Apache Hadoop is available as a PaaS offering from all major cloud service providers.

Some key takeaways from this chapter are as follows:

  • Amazon's Hadoop offering is called Elastic MapReduce (EMR), and it has been available since 2009. Microsoft launched its Hadoop offering, HDInsight on Microsoft Azure, in 2012.

  • With an AWS account, you can launch a Hadoop cluster in a matter of minutes. At the time of writing, the number of EC2 instances in an EMR Hadoop cluster is limited to 20. To use more instances, you must send a special request to Amazon. A word of caution: remember to terminate your EMR cluster after use. If you do not, charges will be incurred even...
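Two practical points above, the 20-instance cap and the cost of leaving a cluster running, can be made concrete with a little arithmetic. The following is a minimal sketch, not an AWS API: `ClusterRequest`, `check_instance_limit`, and `idle_cost` are hypothetical names. The per-instance hourly rate is an assumed figure, because real EMR charges depend on instance type, region, and billing granularity. The sketch simply multiplies instances by rate by idle hours.

```python
from dataclasses import dataclass

# Cap quoted in the text (at the time of writing); beyond this, a request to Amazon is needed.
DEFAULT_INSTANCE_LIMIT = 20


@dataclass
class ClusterRequest:
    instance_count: int
    # Hypothetical combined EC2 + EMR price per instance-hour (USD); real prices vary.
    hourly_rate_per_instance: float


def check_instance_limit(req: ClusterRequest, limit: int = DEFAULT_INSTANCE_LIMIT) -> None:
    """Raise ValueError if the request exceeds the default instance cap."""
    if req.instance_count > limit:
        raise ValueError(
            f"{req.instance_count} instances requested; default limit is {limit}. "
            "Ask Amazon for a limit increase first."
        )


def idle_cost(req: ClusterRequest, idle_hours: float) -> float:
    """Charges accrued by a cluster left running after its work is done
    (simplified: continuous hourly billing, no rounding)."""
    return req.instance_count * req.hourly_rate_per_instance * idle_hours


if __name__ == "__main__":
    req = ClusterRequest(instance_count=10, hourly_rate_per_instance=0.25)
    check_instance_limit(req)              # passes: 10 <= 20
    print(idle_cost(req, idle_hours=48))   # forgotten over a weekend: 120.0
```

Even at a modest assumed rate, a 10-node cluster that is forgotten over a weekend accrues a noticeable bill. This is why the advice to terminate the cluster after use matters.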