Book Image

Hadoop Real-World Solutions Cookbook

By : Jonathan R. Owens, Jon Lentz, Brian Femiano
Book Image

Hadoop Real-World Solutions Cookbook

By: Jonathan R. Owens, Jon Lentz, Brian Femiano

Overview of this book

<p>Helping developers become more comfortable and proficient with solving problems in the Hadoop space. People will become more familiar with a wide variety of Hadoop related tools and best practices for implementation.</p> <p>Hadoop Real-World Solutions Cookbook will teach readers how to build solutions using tools such as Apache Hive, Pig, MapReduce, Mahout, Giraph, HDFS, Accumulo, Redis, and Ganglia.</p> <p>Hadoop Real-World Solutions Cookbook provides in depth explanations and code examples. Each chapter contains a set of recipes that pose, then solve, technical challenges, and can be completed in any order. A recipe breaks a single problem down into discrete steps that are easy to follow. The book covers (un)loading to and from HDFS, graph analytics with Giraph, batch data analysis using Hive, Pig, and MapReduce, machine learning approaches with Mahout, debugging and troubleshooting MapReduce, and columnar storage and retrieval of structured data using Apache Accumulo.<br /><br />Hadoop Real-World Solutions Cookbook will give readers the examples they need to apply Hadoop technology to their own problems.</p>
Table of Contents (17 chapters)
Hadoop Real-World Solutions Cookbook
Credits
About the Authors
About the Reviewers
www.packtpub.com
Preface
Index

Introduction


When working with Apache Hive, Pig, and MapReduce, you may find yourself having to perform certain tasks frequently. The recipes in this chapter provide solutions for executing several very common routines.

You will find that these tools let you solve the same problems in numerous different ways. Deciding on the right implementation can be a difficult task. The recipes presented here were designed for coding efficiency and clarity.

Hive and Pig provide a clean abstraction layer between your data flow and meaningful queries, and the complex MapReduce workflows they compile to. You can leverage the power of MapReduce for scalable queries without having to think about the underlying MapReduce semantics. Both tools handle the decomposition and building of your expressions into the proper MapReduce sequences. Hive lets you build analytics and manage data using a declarative, SQL-like dialect known as HiveQL. Pig operations are written in Pig Latin and take a more imperative form.