Book Image

Hadoop Real-World Solutions Cookbook

By : Jonathan R. Owens, Jon Lentz, Brian Femiano
Book Image

Hadoop Real-World Solutions Cookbook

By: Jonathan R. Owens, Jon Lentz, Brian Femiano

Overview of this book

<p>Helping developers become more comfortable and proficient with solving problems in the Hadoop space. People will become more familiar with a wide variety of Hadoop related tools and best practices for implementation.</p> <p>Hadoop Real-World Solutions Cookbook will teach readers how to build solutions using tools such as Apache Hive, Pig, MapReduce, Mahout, Giraph, HDFS, Accumulo, Redis, and Ganglia.</p> <p>Hadoop Real-World Solutions Cookbook provides in depth explanations and code examples. Each chapter contains a set of recipes that pose, then solve, technical challenges, and can be completed in any order. A recipe breaks a single problem down into discrete steps that are easy to follow. The book covers (un)loading to and from HDFS, graph analytics with Giraph, batch data analysis using Hive, Pig, and MapReduce, machine learning approaches with Mahout, debugging and troubleshooting MapReduce, and columnar storage and retrieval of structured data using Apache Accumulo.<br /><br />Hadoop Real-World Solutions Cookbook will give readers the examples they need to apply Hadoop technology to their own problems.</p>
Table of Contents (17 chapters)
Hadoop Real-World Solutions Cookbook
Credits
About the Authors
About the Reviewers
www.packtpub.com
Preface
Index

About the Authors

Jonathan R. Owens has a background in Java and C++, and has worked in both private and public sectors as a software engineer. Most recently, he has been working with Hadoop and related distributed processing technologies.

Currently, he works for comScore, Inc., a widely regarded digital measurement and analytics company. At comScore, he is a member of the core processing team, which uses Hadoop and other custom distributed systems to aggregate, analyze, and manage over 40 billion transactions per day.

Jon Lentz is a Software Engineer on the core processing team at comScore, Inc., an online audience measurement and analytics company. He prefers to do most of his coding in Pig. Before working at comScore, he developed software to optimize supply chains and allocate fixed-income securities.

Brian Femiano has a B.S. in Computer Science and has been programming professionally for over 6 years, the last two of which have been spent building advanced analytics and Big Data capabilities using Apache Hadoop. He has worked for the commercial sector in the past, but the majority of his experience comes from the government contracting space. He currently works for Potomac Fusion in the DC/Virginia area, where they develop scalable algorithms to study and enhance some of the most advanced and complex datasets in the government space. Within Potomac Fusion, he has taught courses and conducted training sessions to help teach Apache Hadoop and related cloud-scale technologies.