Data Lake for Enterprises

By: Vivek Mishra, Tomcy John, Pankaj Misra

Overview of this book

"Data lake" has recently emerged as a prominent term in the big data industry. Data scientists can use a data lake to derive meaningful insights that businesses can use to redefine or transform the way they operate. Lambda architecture is also emerging as one of the prominent patterns in the big data landscape, as it not only helps to derive useful information from historical data but also correlates real-time data to enable businesses to make critical decisions. This book brings these two important aspects, data lake and Lambda architecture, together. The book is divided into three main sections. The first introduces you to the concept of data lakes and their importance in enterprises, and gets you up to speed with the Lambda architecture. The second section delves into the principal components of building a data lake using the Lambda architecture. It introduces you to popular big data technologies such as Apache Hadoop, Spark, Sqoop, Flume, and Elasticsearch. The third section is a highly practical demonstration of putting it all together, and shows you how an enterprise data lake can be implemented, along with several real-world use cases. It also shows you how other peripheral components can be added to the lake to make it more efficient. By the end of this book, you will be able to choose the right big data technologies using the Lambda architecture patterns to build your enterprise data lake.
Table of Contents (23 chapters)
Title Page
Credits
Foreword
About the Authors
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
Part 1 - Overview
Part 2 - Technical Building blocks of Data Lake
Part 3 - Bringing It All Together

Nodes in Elasticsearch


As detailed earlier, a node in Elasticsearch is one of the servers forming the cluster. A node can be configured to act as any of the following node types:

  • Master node
  • Data node
  • Client node
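
As a rough illustration of how configuration maps to these node types, the following Python sketch derives a node's roles from two boolean settings. It is a simplification under stated assumptions: node.master is the property discussed in this chapter, while node.data and the default values of true are assumed here to follow the boolean role settings of older Elasticsearch releases. The node_roles function is a hypothetical helper, not part of any Elasticsearch API.

```python
from typing import Dict, List


def node_roles(settings: Dict[str, bool]) -> List[str]:
    """Return the roles implied by a node's settings (simplified sketch).

    Assumption: only node.master and node.data are considered, and both
    default to True when absent.
    """
    master_eligible = settings.get("node.master", True)
    holds_data = settings.get("node.data", True)

    roles = []
    if master_eligible:
        roles.append("master-eligible")
    if holds_data:
        roles.append("data")
    if not roles:
        # Neither master-eligible nor holding data: the node only
        # routes requests, which is what the text calls a client node.
        roles.append("client")
    return roles


dedicated_master = node_roles({"node.master": True, "node.data": False})
dedicated_data = node_roles({"node.master": False, "node.data": True})
client_only = node_roles({"node.master": False, "node.data": False})
```

In this sketch a node with no explicit settings is both master-eligible and a data node, which is why dedicated roles require switching one of the settings off.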

Elasticsearch - master node

Any node in a cluster is eligible to become the master node if its node.master property is set to true in the elasticsearch.yml file. The cluster elects the master automatically from the eligible nodes, and the elected node is entrusted with the following key responsibilities:

  • Allocate shards across the various nodes within the cluster.
  • Create and delete indexes.
  • Broadcast the cluster state to all the nodes in the cluster and receive a confirmation back from each of them.
  • Take necessary actions when a node joins or leaves the cluster.
  • Ping all the nodes periodically; all nodes, in turn, ping the master periodically. If the master fails for any reason, the cluster elects one of the other master-eligible nodes as the new master.
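
The failover behaviour in the last point can be sketched as a small Python simulation. This is a toy model, not Elasticsearch's actual election algorithm, which this section does not describe. The Node class, the elect_master and check_master functions, and the rule of picking the lowest node name among live master-eligible nodes are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    master_eligible: bool = True  # stands in for node.master: true
    alive: bool = True            # whether the node answers pings


def elect_master(nodes: List[Node]) -> Optional[Node]:
    """Pick a master among live, master-eligible nodes.

    Assumed stand-in rule: the lowest node name wins.
    """
    candidates = [n for n in nodes if n.master_eligible and n.alive]
    return min(candidates, key=lambda n: n.name) if candidates else None


def check_master(nodes: List[Node], master: Optional[Node]) -> Optional[Node]:
    """Simulate one ping round: keep the master if it responds, else re-elect."""
    if master is not None and master.alive:
        return master
    return elect_master(nodes)


nodes = [
    Node("node-1"),
    Node("node-2"),
    Node("node-3", master_eligible=False),  # never a candidate
]

first_master = elect_master(nodes)

# The current master stops responding to pings.
first_master.alive = False
second_master = check_master(nodes, first_master)
```

Here node-3 is never considered because it is not master-eligible, so when the first master fails the only remaining candidate takes over.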

Elasticsearch - data node...