HDInsight Essentials - Second Edition

By: Rajesh Nadipalli


HDInsight overview


HDInsight is an enterprise-ready distribution of Hadoop that runs on Windows servers and on the Azure HDInsight cloud service (PaaS). It is a 100 percent Apache Hadoop-based service in the cloud, developed by Microsoft in partnership with Hortonworks. Enterprises can now harness the power of Hadoop on Windows servers and on the Windows Azure cloud service.

The following are the key differentiators of the HDInsight distribution:

  • Enterprise-ready Hadoop: HDInsight is backed by Microsoft support and runs on standard Windows servers. IT teams can also consume Hadoop as a Platform as a Service (PaaS), which reduces operations overhead.

  • Analytics using Excel: With Excel integration, business users can visualize and analyze Hadoop data with an easy-to-use, familiar tool. The Excel add-ons Power BI, Power Pivot, Power Query, and Power Map integrate with HDInsight.

  • Develop in your favorite language: HDInsight has powerful programming extensions for languages and platforms, including .NET, C#, Java, and more.

  • Scale using the cloud offering: The Azure HDInsight service enables customers to scale quickly as project needs change, and it provides a seamless interface between HDFS and Azure Blob storage.

  • Connect an on-premises Hadoop cluster with the cloud: With HDInsight, you can move Hadoop data from an on-site data center to the Azure cloud for backup, dev/test, and cloud bursting scenarios.

  • Includes NoSQL transactional capabilities: HDInsight also includes Apache HBase, a columnar NoSQL database that runs on top of Hadoop and supports large-scale online transaction processing (OLTP).

  • HDInsight Emulator: The HDInsight Emulator provides a local development environment for Azure HDInsight without the need for a cloud subscription. It can be installed using the Microsoft Web Platform Installer.
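
The interface between HDFS and Azure Blob storage mentioned in the list above is typically exposed to Hadoop through a URI scheme. The following is a minimal sketch, assuming the wasb:// scheme (wasbs:// over SSL) that Azure HDInsight uses to address a blob container. The function name and the container, account, and path values are hypothetical and serve only as an illustration.

```python
def wasb_uri(container: str, account: str, path: str = "", secure: bool = False) -> str:
    """Build a Hadoop-style URI for a location in Azure Blob storage.

    Illustrative helper only; it is not part of any HDInsight SDK.
    """
    # wasbs:// is the SSL-encrypted variant of the wasb:// scheme.
    scheme = "wasbs" if secure else "wasb"
    # Hadoop addresses a container as <container>@<account>.blob.core.windows.net
    authority = f"{container}@{account}.blob.core.windows.net"
    # Drop leading slashes so the path joins cleanly after the authority.
    return f"{scheme}://{authority}/{path.lstrip('/')}"


# Hypothetical names, for illustration only.
print(wasb_uri("mycontainer", "mystorageaccount", "example/data/sample.log"))
```

A URI built this way can be used wherever Hadoop expects a filesystem path, for example as the input location of a job, which is what lets a cluster read data that lives in Blob storage rather than in local HDFS.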

HDInsight and Hadoop relationship

HDInsight is an Apache Hadoop-based service. Let's review the stack in detail. The following figure shows the stacks that make up HDInsight:

The various components are as follows:

  • Apache Hadoop: This is open source software that provides reliable, scalable distributed storage and computation.

  • Hortonworks Data Platform (HDP): This is an open source Apache Hadoop data platform, architected for the enterprise on Linux and Windows servers. It has a comprehensive set of capabilities aligned to the following functional areas: data management, data access, data governance, security, and operations. The following are the key Apache Software Foundation (ASF) projects that Hortonworks has led and that are included in HDP:

    • Apache Falcon: Falcon is a framework used for simplifying data management and pipeline processing in Hadoop. It also enables disaster recovery and data retention use cases.

    • Apache Tez: Tez is an extensible framework used for building YARN-based, high-performance batch and interactive data processing applications in Hadoop. Projects such as Hive and Pig can leverage Tez to improve their performance.

    • Apache Knox: Knox is a system that provides a single point of authentication and access for Hadoop services in a cluster.

    • Apache Ambari: Ambari is an operational framework used for provisioning, managing, and monitoring Apache Hadoop clusters.

  • Azure HDInsight: This is built in partnership with Hortonworks on top of HDP for Windows servers and the Azure cloud service. Microsoft provides the following key value-added services:

    • Integration with Azure Blob storage, Excel, Power BI, SQL Server, .NET, C#, Java, and others

    • Azure PowerShell, which is a powerful scripting environment that can be used to control, automate, and develop workloads in HDInsight