HDInsight Essentials - Second Edition

By : Rajesh Nadipalli

Use case powered by Microsoft HDInsight


Let's take a look at a practical use case powered by Microsoft HDInsight that demonstrates the value of a next-generation Data Lake architecture.

Problem statement

The Virginia Bioinformatics Institute collaborates with institutes across the globe to locate undetected genes in a massive genome database, work that leads to exciting medical breakthroughs such as cancer therapies. The database is growing exponentially, with 2,000 DNA sequencers generating 15 petabytes of genome data every year. Several universities lack the storage and compute resources to handle this kind of workload in a timely and cost-effective manner.

Solution

The institute built a solution on top of the Windows Azure HDInsight service to perform DNA sequencing analysis in the cloud. This enabled the team to analyze petabytes of data in a cost-effective and scalable manner. Let's take a look at how the Data Lake reference architecture applies to this use case. The following figure...