Hands-on DevOps

By: Sricharan Vadapalli

Overview of this book

DevOps strategies have really become an important factor for big data environments.

This book initially provides an introduction to big data, DevOps, and Cloud computing along with the need for DevOps strategies in big data environments. We move on to explore the adoption of DevOps frameworks and business scenarios. We then build a big data cluster, deploy it on the cloud, and explore DevOps activities such as CI/CD and containerization. Next, we cover big data concepts such as ETL for data sources, Hadoop clusters, and their applications. Towards the end of the book, we explore ERP applications useful for migrating to DevOps frameworks and examine a few case studies for migrating big data and prediction models.

By the end of this book, you will have mastered implementing DevOps tools and strategies for your big data clusters.

Hadoop big data cluster nodes


This section describes the different types of nodes in the Hadoop ecosystem, along with their roles and usage:

  • NameNode: The NameNode is the central part of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the data for each file is stored. It does not store the data of these files itself. Client applications communicate with the NameNode whenever they need to locate a file or want to modify one. The NameNode records modifications as a log that is appended to a native file system file, edits. When a NameNode starts up, it reads the HDFS state from an image file, fsimage, and then applies the changes recorded in the edits log file.
  • Secondary NameNode: The sole purpose of the Secondary NameNode is to create checkpoints in HDFS. It is only a helper node for the NameNode: it periodically merges the fsimage and edits log files, which keeps the size of the edits log within a limit.
  • DataNode: A DataNode stores data in HDFS....
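
The relationship between fsimage, the edits log, and the Secondary NameNode checkpoint described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not Hadoop code: the class `ToyNameNode`, its method names, the `create`/`delete` operation names, and the sample paths and block IDs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy model of NameNode metadata handling (hypothetical, not a Hadoop API)."""

    # Last checkpointed namespace: file path -> list of block IDs.
    fsimage: dict = field(default_factory=dict)
    # Append-only log of modifications made since the last checkpoint.
    edits: list = field(default_factory=list)

    def log_edit(self, op, path, blocks=None):
        # Modifications are appended to the edits log; fsimage is not touched.
        self.edits.append((op, path, list(blocks or [])))

    def load_namespace(self):
        # Startup: read the state from fsimage, then replay the edits log on top.
        namespace = {path: list(blocks) for path, blocks in self.fsimage.items()}
        for op, path, blocks in self.edits:
            if op == "create":
                namespace[path] = list(blocks)
            elif op == "delete":
                namespace.pop(path, None)
            else:
                raise ValueError(f"unknown operation: {op}")
        return namespace

    def checkpoint(self):
        # Periodic merge (the Secondary NameNode's job in this sketch):
        # fold the edits into a new fsimage and truncate the log.
        self.fsimage = self.load_namespace()
        self.edits.clear()


nn = ToyNameNode(fsimage={"/logs/a.txt": ["blk_1"]})
nn.log_edit("create", "/logs/b.txt", ["blk_2", "blk_3"])
nn.log_edit("delete", "/logs/a.txt")

before = nn.load_namespace()   # state seen after a restart, before any checkpoint
nn.checkpoint()
after = nn.load_namespace()    # same state, now served from fsimage alone
print(before == after, len(nn.edits))
```

Note that the model holds only paths and block IDs, never file contents, which mirrors the point that the NameNode keeps metadata while the data itself lives on DataNodes. The checkpoint leaves the visible namespace unchanged and only shortens the log that must be replayed at startup.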