Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Table of Contents (19 chapters)
Hadoop MapReduce v2 Cookbook Second Edition
Credits
About the Author
Acknowledgments
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Introduction


The Hadoop ecosystem consists of a family of projects that are either built on top of Hadoop or work closely with it. Together, these projects form an ecosystem focused on large-scale data processing, and users can often combine several of them to solve their big data problems.

This chapter introduces several key projects in the Hadoop ecosystem and shows how to get started with each of these projects.

We will focus on the following four projects:

  • Pig: A dataflow-style language for large-scale processing of data stored in HDFS

  • HBase: A highly scalable NoSQL-style data store that provides low-latency, random-access data storage on top of HDFS

  • Mahout: A toolkit of machine-learning and data-mining algorithms

  • Sqoop: A data movement tool for efficient bulk data transfer between the Apache Hadoop ecosystem and relational databases

Note

Some of the HBase and Mahout recipes in this chapter are based on Chapter 5, Hadoop Ecosystem...