Data Lake Development with Big Data

By: Pradeep Pasupuleti, Beulah Salome Purra

Overview of this book

A Data Lake is a highly scalable platform for storing huge volumes of multistructured data from disparate sources with centralized data management services. This book explores the potential of Data Lakes and examines architectural approaches to building Data Lakes that ingest, index, manage, and analyze massive amounts of data using batch and real-time processing frameworks. It shows you how to build a Data Lake that is managed by Hadoop and accessed as required by other Big Data applications. Using best practices, it guides you in developing a Data Lake's capabilities, focusing on architecting data governance, security, data quality, data lineage tracking, metadata management, and semantic data tagging. By the end of this book, you will have a good understanding of building a Data Lake for Big Data.
Table of Contents (13 chapters)


This book is dedicated to the loving memory of my mother, Smt. Sumathy; without her never-failing encouragement and everlasting love I would have never been half as good.

First and foremost, I have to thank my father, Sri. Prabhakar Pasupuleti, who never ceases to be a constant source of inspiration, a ray of hope, humility and strength, and whose support and guidance have given me the courage to chase my dreams.

I should also express my deep sense of gratitude to each of my family members, Sushma, Sresht, and Samvruth, who stood by me at every moment through very tough times and enabled me to complete this book.

I would like to sincerely thank all my teachers who were instrumental in shaping me. Among them, I would like to thank Usha Madam, Vittal Rao Sir, Gopal Krishna Sir, and Brindavan Sir for their stellar role in improving me.

I would also like to thank all my friends for their understanding in many ways. Their friendship makes my life a wonderful experience. I cannot list all the names here, but you are always on my mind.

Special thanks to the team at Packt for their contribution to this book.

Finally, I would like to thank my team, Salome, that has placed immense faith in the power of Big Data analytics and built cutting-edge data products.

Thank you, Lord, for always being there for me.

Beulah Salome Purra has over 11 years of experience and specializes in building highly scalable distributed systems. She has worked extensively on architecting multiple large-scale Big Data solutions for Fortune 100 companies. Her core expertise lies in Big Data analytics. In her current role at ATMECS, she focuses on building robust and scalable data products that extract value from huge data assets.

She can be reached at