Hadoop Beginner's Guide
Overview of this book

Data is arriving faster than you can process it, and the overall volumes keep growing at a rate that keeps you awake at night. Hadoop can help you tame the data beast. Effective use of Hadoop, however, requires a mixture of programming, design, and system administration skills.

"Hadoop Beginner's Guide" removes the mystery from Hadoop, presenting Hadoop and related technologies with a focus on building working systems and getting the job done, using cloud services to do so when it makes sense. From basic concepts and initial setup through developing applications and keeping the system running as the data grows, the book gives you the understanding needed to effectively use Hadoop to solve real-world problems.

Starting with the basics of installing and configuring Hadoop, the book explains how to develop applications, maintain the system, and use additional products to integrate with other systems. While exploring different ways to develop applications to run on Hadoop, the book also covers tools such as Hive, Sqoop, and Flume that show how Hadoop can be integrated with relational databases and log collection. In addition to examples on Hadoop clusters running on Ubuntu, the book covers the use of cloud services such as Amazon EC2 and Elastic MapReduce.

Time for action – running UFO analysis on EMR


Let us explore the use of EMR with Hive by doing some UFO analysis on the platform.

  1. Log in to the AWS management console at http://aws.amazon.com/console.

  2. Every Hive job flow on EMR runs from an S3 bucket, and we need to select the bucket we wish to use for this purpose. Select S3 to see the list of buckets associated with your account, then choose the bucket from which to run the example. In this example, we select the bucket called garryt1use.

  3. Use the web interface to create three directories called ufodata, ufoout, and ufologs within that bucket. The resulting list of the bucket's contents should look like the following screenshot:

  4. Double-click on the ufodata directory to open it and within it create two subdirectories called ufo and states.

  5. Create the following as s3test.hql, click on the Upload link within the ufodata directory, and follow the prompts to upload the file:

    CREATE EXTERNAL TABLE IF NOT EXISTS ufodata(sighted string, reported...
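
  The listing above is truncated. As a minimal sketch only, a complete s3test.hql might look like the following; the column list, the join query, and the use of ${INPUT} and ${OUTPUT} variables (supplied when the job flow is created) are assumptions based on the book's earlier local Hive UFO examples, not the exact script:

    -- Sketch of a complete s3test.hql; the column names and the
    -- final query are assumptions, not the book's exact script.
    -- ${INPUT} and ${OUTPUT} are expected to be supplied when the
    -- job flow is created, e.g. -d INPUT=s3://garryt1use/ufodata

    CREATE EXTERNAL TABLE IF NOT EXISTS ufodata(
        sighted string, reported string, sighting_location string,
        shape string, duration string, description string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '${INPUT}/ufo' ;

    CREATE EXTERNAL TABLE IF NOT EXISTS states(
        abbreviation string, full_name string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '${INPUT}/states' ;

    -- Count sightings per state, matching the two-letter state
    -- abbreviation at the end of each sighting location
    INSERT OVERWRITE DIRECTORY '${OUTPUT}/results'
    SELECT s.full_name, COUNT(*) AS sightings
    FROM ufodata u JOIN states s
      ON (s.abbreviation =
          upper(trim(substr(u.sighting_location,
                            length(u.sighting_location) - 1))))
    GROUP BY s.full_name ;

  In a script along these lines, the ${INPUT} and ${OUTPUT} locations would correspond to the ufodata and ufoout directories created in step 3, with the ufo and states subdirectories from step 4 holding the source files for the two external tables.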