Data Lake for Enterprises

By: Vivek Mishra, Tomcy John, Pankaj Misra

Overview of this book

The term "data lake" has recently become prominent in the big data industry. Data scientists can use a data lake to derive meaningful insights that businesses can use to redefine or transform the way they operate. Lambda architecture is also emerging as one of the most prominent patterns in the big data landscape, as it not only helps derive useful information from historical data but also correlates real-time data to enable businesses to make critical decisions. This book brings these two important aspects, data lake and Lambda architecture, together.

This book is divided into three main sections. The first introduces the concept of data lakes, explains their importance in enterprises, and gets you up to speed with the Lambda architecture. The second delves into the principal components of building a data lake using the Lambda architecture and introduces popular big data technologies such as Apache Hadoop, Spark, Sqoop, Flume, and Elasticsearch. The third is a highly practical demonstration of putting it all together: it shows how an enterprise data lake can be implemented, along with several real-world use cases, and how other peripheral components can be added to the lake to make it more efficient.

By the end of this book, you will be able to choose the right big data technologies using the Lambda architectural patterns to build your enterprise data lake.
Table of Contents (23 chapters)
Title Page
About the Authors
About the Reviewers
Customer Feedback
Part 1 - Overview
Part 2 - Technical Building blocks of Data Lake
Part 3 - Bringing It All Together

About the Authors

Tomcy John lives in Dubai (United Arab Emirates) and hails from Kerala (India). He is an enterprise Java specialist with a degree in engineering (B Tech) and over 14 years of experience in several industries. He is currently working as principal architect at Emirates Group IT, in their core architecture team. Prior to this, he worked with Oracle Corporation and Ernst & Young. His main specialization is in building enterprise-grade applications, and he acts as chief mentor and evangelist to facilitate incorporating new technologies as corporate standards in the organization. Outside of his work, Tomcy works very closely with young developers and engineers as a mentor and speaks at various forums as a technical evangelist on many topics ranging from web and middleware all the way to various persistence stores. He writes on various topics in his blog and

First and foremost, I would like to thank my savior and lord, Jesus Christ, for giving me the strength and courage to pursue this project. It was a dream come true. I would like to dedicate this book to my father (Appachan), the late C. O. John, and my dearest mom (Ammachi), Leela John, for helping me reach where I am today. I would also like to take this opportunity to thank my dearest wife, Serene, and our two lovely children, Neil (son) and Anaya (daughter), for all their support throughout this project, for allowing me to pursue my dream, and for tolerating my not being with them after my busy day job.

It was my privilege to work with my co-author, Pankaj. I take this opportunity to thank him for supporting me when I first shared my dream of writing this book, and for staying with me through every stage of completing it. It would not have been possible to reach this stage in my career without the mentors I have had along the way. I would like to thank Thomas Benjamin (CTO, GE Aviation Digital), Rajesh R.V. (chief architect, Emirates Group IT), and Martin Campbell (chief architect) for supporting me at various stages with words of encouragement and a wealth of knowledge.


Pankaj Misra has been a technology evangelist, holding a bachelor’s degree in engineering, with over 16 years of experience across multiple business domains and technologies. He has been working with Emirates Group IT since 2015, and has worked with various other organizations in the past. He specializes in architecting and building multi-stack solutions and implementations. He has also been a speaker at technology forums in India and has built products with scale-out architecture that support high-volume, near-real-time data processing and near-real-time analytics.

This book has been a great opportunity for me and will always be an exceptional example of collaboration and knowledge sharing with my co-author, Tomcy. I am extremely thankful to him for entrusting me with this responsibility and standing by me at all times. I would like to dedicate this book to my father, B. Misra, and my mother, Geeta Misra, who have always been among the most special people to me. I am extremely grateful to my wife, Priti, and my kids, daughter Eva and son Siddhant, for their understanding and support, and for helping me in every possible way to complete the book.

This book is a medium to give back the knowledge that I have gained by working with many amazing people throughout the years. I would like to thank Rajesh R.V. (chief architect, Emirates Group IT) and Thomas Benjamin (CTO, GE Aviation) for always motivating, helping, and supporting us.