HDInsight Essentials - Second Edition

By: Rajesh Nadipalli

Tools and technology for Hadoop ecosystem


Next-generation architecture includes Hadoop-based projects that complement traditional RDBMSes. The following table highlights key projects, organized by Data Lake capability:

| Capability | Tool | Description |
|---|---|---|
| Ingest | Flume | A distributed, reliable service that collects large amounts of streaming data from different sources, such as logfiles, into Hadoop. |
| Ingest | Sqoop | A tool designed to transfer data between Hadoop and RDBMSes such as Oracle, Teradata, and SQL Server. |
| Organize | HCatalog | Stores metadata for Hadoop, including file structures and formats. It provides abstraction and interoperability across tools such as Pig, MapReduce, Hive, Impala, and others. |
| Transform | Oozie | A workflow scheduler system to manage Apache Hadoop jobs, which can be MapReduce, Hive, Pig, and others. It gives developers greater control over complex jobs and also makes it easier to repeat those... |