Book Image

HDInsight Essentials - Second Edition

By : Rajesh Nadipalli
Book Image

HDInsight Essentials - Second Edition

By: Rajesh Nadipalli

Overview of this book

Table of Contents (16 chapters)
HDInsight Essentials Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Architectural considerations


A modern Data Lake based on Hadoop is now mainstream technology and is used in several public sectors and enterprises; however, the ecosystem is still evolving and new tools and projects are released every quarter. In the next few sections, I will highlight the key architectural considerations to ensure that your Data Lake is well-planned and extensible for years to come.

Extensible and modular

In Chapter 2, Enterprise Data Lake using HDInsight, we looked into the reference architecture for the next generation Data Lake, as shown in the following figure. Use this architecture to design and build reusable components and well-defined interfaces between each layer, which allows a pluggable model. Let's take an example for the processing layer if you start with Pig as your ELT and later decide to switch to a newer technology such as Spark; this change will be contained within the Processing layer of the stack and will not impact the other layers as long as the interface...