Hadoop Essentials

By: Shiva Achari

Overview of this book

This book jumps into the world of Hadoop and its tools to help you learn how to use them effectively to optimize and improve the way you handle Big Data. Starting with the fundamentals of Hadoop (YARN, MapReduce, and HDFS) and other vital elements in the Hadoop ecosystem, you will soon learn many exciting topics such as MapReduce patterns, data management, and real-time data analysis using Hadoop. You will also explore a number of the leading data processing tools, including Hive and Pig, and learn how to use Sqoop and Flume, two of the most powerful technologies used for data ingestion. With further guidance on data streaming and real-time analytics with Storm and Spark, Hadoop Essentials is a reliable and relevant resource for anyone who understands the difficulties - and opportunities - presented by Big Data today. With this guide, you'll develop your confidence with Hadoop and be able to use the knowledge and skills you learn to successfully harness its capabilities.
Table of Contents (15 chapters)
Hadoop Essentials
About the Author
About the Reviewers
Pillars of Hadoop – HDFS, MapReduce, and YARN

About the Reviewers

Anindita Basak works as a big data cloud consultant and trainer and is highly enthusiastic about core Apache Hadoop, vendor-specific Hadoop distributions, and the Hadoop open source ecosystem. She works as a specialist in a big data start-up in the Bay Area and with Fortune brand clients across the U.S. She has been playing with Hadoop on Azure from the days of its incubation. Previously, she worked as a module lead for Alten Group Company and in the Azure Pro Direct Delivery group for Microsoft. She has also worked as a senior software engineer on the implementation and migration of various enterprise applications on Azure Cloud in the healthcare, retail, and financial domains. She started her journey with Microsoft Azure in the Microsoft Cloud Integration Engineering (CIE) team and worked as a support engineer for Microsoft India (R&D) Pvt. Ltd.

With more than 7 years of experience with Microsoft .NET, Java, and the Hadoop technology stack, she is solely focused on the big data cloud and data science. She is a technical speaker and active blogger, and she conducts various training programs for the Hortonworks and Cloudera developer/administrative certification programs. As an MVB, she loves to share her technical experience and expertise through her blogs. You can get a deeper insight into her professional life on her LinkedIn page, and you can follow her on Twitter. Her Twitter handle is @imcuteani.

She recently worked as a technical reviewer for HDInsight Essentials (volume I and II) and Microsoft Tabular Modeling Cookbook, both by Packt Publishing.

Ralf Becher has worked as an IT system architect and data management consultant for more than 15 years in the areas of banking, insurance, logistics, automotive, and retail.

He specializes in modern, quality-assured data management. He has been helping customers process, evaluate, and maintain the quality of their company data by helping them introduce, implement, and improve complex solutions in the fields of data architecture, data integration, data migration, master data management, metadata management, data warehousing, and business intelligence.

He started working with big data on Hadoop in 2012. He also runs a blog on BI and data integration.

Marius Danciu has over 15 years of experience in developing and architecting Java platform server-side applications in the data synchronization and big data analytics fields. He's very fond of the Scala programming language and functional programming concepts, and he enjoys finding their applicability in everyday work. He is the coauthor of The Definitive Guide to Lift, Apress.

Dmitry Spikhalskiy is currently a software engineer at the Russian social network Odnoklassniki, working on a search engine, video recommendation system, and movie content analysis.

Previously, he took part in developing the Mind Labs' platform and its infrastructure, and benchmarks for high-load video conference and streaming services, which earned the Guinness World Record for "The biggest online-training in the world". More than 12,000 people participated in this event. He also worked at a mobile social banking start-up called Instabank as its technical lead and architect. He has also reviewed Learning Google Guice, PostgreSQL 9 Admin Cookbook, and Hadoop MapReduce v2 Cookbook, all by Packt Publishing.

He graduated with an MSc degree in computer science from Moscow State University, where he first got interested in parallel data processing, high-load systems, and databases.