HDInsight Essentials - Second Edition

By: Rajesh Nadipalli

Ingest and Organize Data Lake

Summary

An Enterprise Data Lake journey starts with getting valuable data into the lake. There are four primary mechanisms for ingesting data into a Data Lake powered by HDInsight: HDFS commands, Azure PowerShell, Azure tools with a graphical user interface, and Sqoop. To make a Data Lake easy to consume, it is important to have a managed ingestion process, with governance and a well-defined directory structure.
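
As a rough illustration of what a structured directory layout can look like in practice, the following Python sketch builds a date-partitioned landing directory and prepares the HDFS shell commands for loading one file into it. The zone names, the `/data` root, the `sales` and `orders` names, and both helper functions are hypothetical and chosen only for this example; the commands are returned as lists rather than executed, so the sketch runs without a cluster.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical zone names, used here only to illustrate a governed layout.
ZONES = ("raw", "staging", "curated")


def landing_dir(zone, source, dataset, load_date):
    """Build a date-partitioned HDFS directory for one ingestion run."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    path = PurePosixPath(
        "/data",  # assumed root directory of the lake
        zone,
        source,
        dataset,
        f"{load_date.year:04d}",
        f"{load_date.month:02d}",
        f"{load_date.day:02d}",
    )
    return str(path)


def put_commands(local_file, target_dir):
    """Return (without running) the HDFS shell commands to load one file."""
    return [
        ["hadoop", "fs", "-mkdir", "-p", target_dir],
        ["hadoop", "fs", "-put", local_file, target_dir],
    ]


if __name__ == "__main__":
    target = landing_dir("raw", "sales", "orders", date(2014, 1, 31))
    for cmd in put_commands("orders_2014_01_31.csv", target):
        print(" ".join(cmd))
```

Routing every load through one path-building function is one way to keep the layout consistent across ingestion tools, because whichever mechanism moves the data, it can take its target directory from the same place.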

HCatalog provides a shared metastore that can be used by several Hadoop tools, including Hive, Pig, and MapReduce. As a result, structural information about the data is defined once and reused by all of these tools. In the next chapter, we will look at transforming the data that we just ingested.