YARN Essentials

Overview of this book

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Preface

Need for YARN

The redesign idea

YARN Architecture

Core components of YARN architecture

YARN scheduler policies

Recent developments in YARN architecture

YARN Installation

Single-node installation

The fully-distributed mode

Operating Hadoop and YARN clusters

Web interfaces of the Ecosystem

YARN and Hadoop Ecosystems

The Hadoop 2 release

A short introduction to Hadoop 1.x and MRv1

MRv1 versus MRv2

Understanding where YARN fits into Hadoop

Old and new MapReduce APIs

Backward compatibility of MRv2 APIs

Practical examples of MRv1 and MRv2

YARN Administration

Container allocation

Container configurations

YARN scheduling policies

YARN multitenancy application support

Administration of YARN

Developing and Running a Simple YARN Application

Running sample examples on YARN

Monitoring YARN applications with web GUI

YARN's MapReduce support

The YARN application workflow

YARN Frameworks

HOYA (HBase on YARN)

KOYA (Kafka on YARN)

Failures in YARN

ResourceManager failures

ApplicationMaster failures

NodeManager failures

Container failures

Hardware Failures

YARN – Alternative Solutions

YARN – Future and Support

What YARN means to the big data industry

Journey – present and future

YARN-supported frameworks

Index

Apache Spark

Apache Spark is a fast and general engine for large-scale data processing. It was originally developed in 2009 in UC Berkeley's AMPLab and open sourced in 2010.

The main features of Spark are as follows:

Speed: Spark can run applications in Hadoop clusters up to 100x faster than Hadoop MapReduce when the data fits in memory, and up to 10x faster even when running on disk.
Ease of use: Spark lets you quickly write applications in Java, Scala, or Python. You can use it interactively to query big datasets from the Scala and Python shells.
Runs everywhere: Spark runs in its standalone cluster mode, on EC2, on Hadoop YARN, or on Apache Mesos. It can read from HDFS, HBase, Cassandra, S3, and any other Hadoop data source.
Generality: Spark powers a stack of high-level tools, including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming. You can combine...
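
To make the "Runs everywhere" point concrete, the following is a minimal sketch of how the same application might be submitted to different cluster managers by changing only the --master value passed to spark-submit. The helper name build_spark_submit, the script name wordcount.py, and the input path are hypothetical and used purely for illustration; the flags themselves (--master, --deploy-mode, --num-executors, --executor-memory) are standard spark-submit options.

```python
import shlex


def build_spark_submit(app_path, master="yarn", deploy_mode="cluster",
                       num_executors=2, executor_memory="2g", app_args=()):
    """Build a spark-submit argument list (hypothetical helper, not part of Spark)."""
    cmd = ["spark-submit", "--master", master]
    # This sketch only sets the deploy mode and executor count when targeting YARN.
    if master == "yarn":
        cmd += ["--deploy-mode", deploy_mode,
                "--num-executors", str(num_executors)]
    cmd += ["--executor-memory", executor_memory, app_path, *app_args]
    return cmd


if __name__ == "__main__":
    # wordcount.py and the HDFS path are placeholder names.
    on_yarn = build_spark_submit("wordcount.py",
                                 app_args=["hdfs:///data/input.txt"])
    on_laptop = build_spark_submit("wordcount.py", master="local[*]",
                                   app_args=["input.txt"])
    print(shlex.join(on_yarn))
    print(shlex.join(on_laptop))
```

The application itself is unchanged between the two commands; only the master URL and the resource flags differ, which is what allows a job developed locally to be moved onto a YARN cluster.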