HDInsight Essentials - Second Edition

By: Rajesh Nadipalli


The success path for a production Data Lake


A successful transition to a production Data Lake involves three key steps:

  • Identify the big data problem

  • Conduct a successful proof of technology

  • Form a Data Lake Center of Excellence

Let's review each of these steps in detail.

Identifying the big data problem

A big data solution should not be a hammer looking for a nail; it should address a real business problem. The first step in a Data Lake journey is to evaluate your current-state architecture and business needs to determine whether you actually have a big data problem. Some of your existing systems may be better suited to your business requirements than a new Data Lake.

To give you some ideas, the following are the top use cases for a Hadoop-based Data Lake that might be relevant to your organization:

  • ETL offload: Hadoop MapReduce provides a low-cost alternative to traditional batch-oriented extract-transform-load (ETL) processing. Offloading this work to Hadoop will free up your data...