HDInsight Essentials - Second Edition

By: Rajesh Nadipalli

Transformation overview

Once the data is in the cluster, the next step in a typical project is to prepare it for downstream consumption. This usually involves data cleaning, data quality checks, and aggregation; for example, validating phone number formats, checking that dates of birth are valid, and aggregating sales by region.
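
The three kinds of transformation named above can be illustrated with a small, self-contained Python sketch. It is not the tooling used in the mini project; the record layout, the field names (phone, dob, region, sales), the NNN-NNN-NNNN phone format, and the sample values are all assumptions made up for illustration.

```python
import re
from collections import defaultdict
from datetime import date, datetime

# Assumed phone format for this sketch: NNN-NNN-NNNN.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def is_valid_phone(phone):
    """Return True when the whole string matches the assumed phone format."""
    return PHONE_PATTERN.fullmatch(phone) is not None


def is_valid_birth_date(text, today):
    """Return True when text is a real calendar date (YYYY-MM-DD) not after today."""
    try:
        born = datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        # Raised for malformed strings and impossible dates such as February 30.
        return False
    return born <= today


def clean_and_aggregate(records, today):
    """Drop records that fail the quality checks, then total sales by region."""
    totals = defaultdict(float)
    rejected = []
    for record in records:
        if is_valid_phone(record["phone"]) and is_valid_birth_date(record["dob"], today):
            totals[record["region"]] += record["sales"]
        else:
            rejected.append(record)
    return dict(totals), rejected


# Hypothetical sample records, invented for this example.
sample = [
    {"phone": "919-555-0100", "dob": "1980-04-12", "region": "East", "sales": 120.0},
    {"phone": "555-0100", "dob": "1975-01-30", "region": "East", "sales": 80.0},
    {"phone": "415-555-0199", "dob": "1990-02-30", "region": "West", "sales": 200.0},
    {"phone": "206-555-0142", "dob": "1968-11-03", "region": "West", "sales": 50.0},
    {"phone": "212-555-0177", "dob": "1985-07-21", "region": "East", "sales": 30.0},
]

# A fixed reference date keeps the example reproducible.
totals, rejected = clean_and_aggregate(sample, today=date(2015, 1, 1))
print(totals)
print(len(rejected))
```

Here the second record is rejected for its phone number and the third for an impossible date of birth, so only three records contribute to the regional totals. On a cluster, the same checks would more likely be expressed in a tool built for large datasets; the sketch only shows the logic.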

In our mini project, this puts us at step two of the data pipeline into the Data Lake.