YARN Essentials

Apache Spark


Apache Spark is a fast and general engine for large-scale data processing. It was originally developed in 2009 at UC Berkeley's AMPLab and open sourced in 2010.

The main features of Spark are as follows:

  • Speed: Spark enables applications in Hadoop clusters to run up to 100x faster than Hadoop MapReduce in memory, and up to 10x faster even when running on disk.

  • Ease of use: Spark lets you quickly write applications in Java, Scala, or Python. You can use it interactively to query big datasets from the Scala and Python shells.

  • Runs everywhere: Spark runs on Hadoop YARN, on Apache Mesos, in its own standalone cluster mode, or in the cloud (for example, on EC2). It can access diverse data sources, including HDFS, HBase, Cassandra, S3, and any Hadoop data source.

  • Generality: Spark powers a stack of high-level tools, including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming. You can combine...
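The "Runs everywhere" point can be illustrated with `spark-submit`. The same application can be sent to a different cluster manager largely by changing the `--master` URL: `local[*]`, `spark://HOST:PORT`, `mesos://HOST:PORT`, or `yarn`. The following is a minimal sketch under a few assumptions. The application `wordcount.py`, its `hdfs:///input` argument, and the `HOST` placeholders are hypothetical. The helper `build_submit_command` is illustrative only and is not part of Spark. It only builds the argument list and does not launch anything, so it runs without a Spark installation.

```python
import shlex

# Master URL passed to spark-submit's --master flag for each cluster manager.
# "HOST" is a placeholder (assumption); 7077 and 5050 are the default ports
# of the Spark standalone master and the Mesos master respectively.
MASTER_URLS = {
    "local": "local[*]",                # single JVM, one worker thread per local core
    "standalone": "spark://HOST:7077",  # Spark's own standalone cluster manager
    "mesos": "mesos://HOST:5050",       # Apache Mesos
    "yarn": "yarn",                     # Hadoop YARN; cluster located via HADOOP_CONF_DIR / YARN_CONF_DIR
}


def build_submit_command(app_path, manager, deploy_mode="client", app_args=()):
    """Return a spark-submit argument list (hypothetical helper, not a Spark API).

    Only --master (and, on a real cluster, --deploy-mode) differs between
    cluster managers; the application itself stays the same.
    """
    if manager not in MASTER_URLS:
        raise ValueError(f"unknown cluster manager: {manager!r}")
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")

    cmd = ["spark-submit", "--master", MASTER_URLS[manager]]
    # Local mode has no cluster to place the driver on, so --deploy-mode is omitted.
    if manager != "local":
        cmd += ["--deploy-mode", deploy_mode]
    cmd.append(app_path)
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    # Print the same hypothetical job targeted at three different cluster managers.
    for manager in ("local", "standalone", "yarn"):
        argv = build_submit_command("wordcount.py", manager, app_args=["hdfs:///input"])
        print(shlex.join(argv))
```

With `--master yarn`, spark-submit finds the ResourceManager through the Hadoop configuration in `HADOOP_CONF_DIR` or `YARN_CONF_DIR`. Adding `--deploy-mode cluster` runs the Spark driver inside the YARN ApplicationMaster container instead of on the machine that submitted the job.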