Let's take a look at a practical use case powered by Microsoft HDInsight that demonstrates the value of next generation Data Lake architecture.
The Virginia Bioinformatics Institute collaborates with institutes across the globe to locate undetected genes in a massive genome database that leads to exciting medical breakthroughs such as cancer therapies. This database size is growing exponentially across the 2,000 DNA sequencers and is generating 15 petabytes of genome data every year. Several universities lack storage and compute resources to handle this kind of workload in a timely and cost-effective manner.
The institute built a solution on top of Windows Azure HDInsight service to perform DNA sequencing analysis in the cloud. This enabled the team to analyze petabytes of data in a cost-effective and scalable manner. Let's take a look at how the Data Lake reference architecture applies to this use case. The following figure...