Mastering Hadoop 3

By: Chanchal Singh, Manish Kumar

Overview of this book

Apache Hadoop is one of the most popular big data solutions for distributed storage and for processing large chunks of data. With Hadoop 3, Apache promises to provide a high-performance, more fault-tolerant, and highly efficient big data processing platform, with a focus on improved scalability and increased efficiency. With this guide, you’ll understand advanced concepts of the Hadoop ecosystem tools. You’ll learn how Hadoop works internally, study advanced concepts of different ecosystem tools, discover solutions to real-world use cases, and understand how to secure your cluster. It will then walk you through HDFS, YARN, MapReduce, and Hadoop 3 concepts. You’ll be able to address common challenges like using Kafka efficiently, designing low-latency, reliable message-delivery Kafka systems, and handling high data volumes. As you advance, you’ll discover how to address major challenges when building an enterprise-grade messaging system, and how to use different stream processing systems along with Kafka to fulfil your enterprise goals. By the end of this book, you’ll have a complete understanding of how components in the Hadoop ecosystem are effectively integrated to implement a fast and reliable data pipeline, and you’ll be equipped to tackle a range of real-world problems in data pipelines.

YARN command reference


Like HDFS, YARN has its own commands for managing the YARN cluster. YARN provides two command-line interfaces: one for users who want to run services on a YARN cluster, and one for administrators who manage the cluster as a whole.

User command

The user in a Hadoop cluster is the one who submits applications to the cluster. An application may fail, or it may not perform well. In such scenarios, logs are the first place to look when debugging your application. YARN stores logs for applications and containers, and these can be accessed via the command-line interface.
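As a minimal sketch of fetching these logs, the `yarn logs` command takes an `-applicationId` option and, optionally, a `-containerId` option to narrow the output to one container. The application and container IDs below are hypothetical placeholders, and the `dry_run` helper is an illustrative wrapper (not part of YARN) that only prints each command, so the script can be read and run without a cluster. Fetching logs of finished applications this way generally assumes that log aggregation is enabled on the cluster.

```shell
#!/bin/sh
# dry_run is a hypothetical helper: it prints the command it is given.
# Set YARN_EXEC=1 to actually execute the command on a machine with yarn.
dry_run() {
  if [ "${YARN_EXEC:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Hypothetical IDs; substitute the IDs of your own application and container.
APP_ID="application_1234567890123_0001"
CONTAINER_ID="container_1234567890123_0001_01_000001"

# Fetch all logs for one application.
dry_run yarn logs -applicationId "$APP_ID"

# Fetch the logs of a single container of that application.
dry_run yarn logs -applicationId "$APP_ID" -containerId "$CONTAINER_ID"
```

Redirecting the output to a file (for example, `> app.log`) is often convenient, since application logs can be large.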

 Application commands

The application command is used to perform operations on applications submitted to the YARN cluster. These operations include listing all applications in a specific state, killing an application, debugging with application logs, and so on. Let's look into a few commands and how to use them:

  • -appStates: This command is used along with -list...
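The operations described above can be sketched with the `yarn application` command. This is an illustration rather than a complete reference: the application ID is a hypothetical placeholder, `RUNNING` is just one of the states that can be passed to `-appStates`, and the `dry_run` helper is an illustrative wrapper (not part of YARN) that prints each command instead of running it.

```shell
#!/bin/sh
# dry_run is a hypothetical helper: it prints the command it is given.
# Set YARN_EXEC=1 to actually execute the command on a machine with yarn.
dry_run() {
  if [ "${YARN_EXEC:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Hypothetical ID; substitute the ID reported by the -list command.
APP_ID="application_1234567890123_0001"

# List only the applications that are currently in the RUNNING state.
dry_run yarn application -list -appStates RUNNING

# Several states can be given as a comma-separated list.
dry_run yarn application -list -appStates ACCEPTED,RUNNING

# Kill a specific application by its ID.
dry_run yarn application -kill "$APP_ID"
```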