Hands-on DevOps

By: Sricharan Vadapalli

Overview of this book

DevOps strategies have really become an important factor for big data environments.

This book initially provides an introduction to big data, DevOps, and Cloud computing along with the need for DevOps strategies in big data environments. We move on to explore the adoption of DevOps frameworks and business scenarios. We then build a big data cluster, deploy it on the cloud, and explore DevOps activities such as CI/CD and containerization. Next, we cover big data concepts such as ETL for data sources, Hadoop clusters, and their applications. Towards the end of the book, we explore ERP applications useful for migrating to DevOps frameworks and examine a few case studies for migrating big data and prediction models.

By the end of this book, you will have mastered implementing DevOps tools and strategies for your big data clusters.
Table of Contents (22 chapters)
Title Page
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
11. DevOps Adoption by ERP Systems
12. DevOps Periodic Table
13. Business Intelligence Trends
14. Testing Types and Levels
15. Java Platform SE 8

Big data clusters


A Hadoop cluster is a system of two or more computers (called nodes) that appears to its users as a single unified system. The nodes work together to execute applications or perform other tasks, much as a single virtual machine would. Variants of Hadoop clusters cater to different data needs. The key considerations in building these platforms are reliability, load balancing, and performance.

A single-node, or pseudo-distributed, cluster runs all the essential daemons (NameNode, DataNode, JobTracker, and TaskTracker) on the same machine. It is a simple configuration used to test Hadoop applications by simulating a full cluster environment with a replication factor of 1.
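The replication factor mentioned above is normally set through the `dfs.replication` property in `hdfs-site.xml`. As a minimal sketch, the following Python snippet generates such a file's contents; the helper name `build_hdfs_site` is hypothetical and chosen only for illustration, and a real installation would usually carry additional properties.

```python
import xml.etree.ElementTree as ET


def build_hdfs_site(replication: int = 1) -> str:
    """Return the text of a minimal hdfs-site.xml that sets dfs.replication.

    Hypothetical helper for illustration; it is not part of Hadoop.
    """
    if replication < 1:
        raise ValueError("replication factor must be at least 1")

    # Hadoop configuration files use <configuration> as the root element,
    # with one <property> (name/value pair) per setting.
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "dfs.replication"
    ET.SubElement(prop, "value").text = str(replication)

    return ET.tostring(configuration, encoding="unicode")


# A single-node cluster has only one DataNode, so each block is stored once.
hdfs_site_xml = build_hdfs_site(1)
print(hdfs_site_xml)
```

With a single DataNode there is nowhere to place a second copy of a block, which is why a value of 1 suits this test setup.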

A small Hadoop cluster comprises a single master and multiple worker nodes. The master node runs a JobTracker, TaskTracker, NameNode, and DataNode. A slave or worker node performs the roles of both a DataNode and TaskTracker if required...
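To make the small-cluster layout concrete, here is a rough sketch that models which daemons run on which node, following the description above. The host names (`master`, `worker1`, and so on) and both function names are assumptions made for illustration; the snippet only models the layout and does not talk to a real cluster.

```python
from typing import Dict, List

# Daemon placement as described in the text for a small cluster.
MASTER_DAEMONS = ["NameNode", "JobTracker", "DataNode", "TaskTracker"]
WORKER_DAEMONS = ["DataNode", "TaskTracker"]


def small_cluster_layout(worker_count: int) -> Dict[str, List[str]]:
    """Map hypothetical host names to the daemons each one runs."""
    if worker_count < 1:
        raise ValueError("a small cluster needs at least one worker node")

    layout = {"master": list(MASTER_DAEMONS)}
    for index in range(1, worker_count + 1):
        layout[f"worker{index}"] = list(WORKER_DAEMONS)
    return layout


def nodes_running(layout: Dict[str, List[str]], daemon: str) -> List[str]:
    """List the nodes on which the given daemon runs."""
    return [node for node, daemons in layout.items() if daemon in daemons]


layout = small_cluster_layout(3)
for node, daemons in layout.items():
    print(f"{node}: {', '.join(daemons)}")
```

In this model the NameNode and JobTracker appear only on the master, while DataNode and TaskTracker appear on every node.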