Mastering Mesos

By: Dipa Dubhashi, Akhil Das

Overview of this book

Apache Mesos is open source cluster management software that provides efficient resource isolation and resource sharing across distributed applications or frameworks. This book will take you on a journey to enhance your knowledge from amateur to master level, showing you how to improve the efficiency, management, and development of Mesos clusters. The architecture is quite complex, and this book will explore the difficulties and complexities of working with Mesos. We begin by introducing Mesos, explaining its architecture and functionality. Next, we provide a comprehensive overview of Mesos features and advanced topics such as high availability, fault tolerance, scaling, and efficiency. Furthermore, you will learn to set up multi-node Mesos clusters on private and public clouds. We will also introduce several Mesos-based scheduling and management frameworks or applications that enable the easy deployment, discovery, load balancing, and failure handling of long-running services. Next, you will find out how a Mesos cluster can be easily set up and monitored using standard deployment and configuration management tools. This advanced guide will show you how to deploy important big data processing frameworks such as Hadoop, Spark, and Storm on Mesos, as well as big data storage frameworks such as Cassandra, Elasticsearch, and Kafka.

Hadoop on Mesos


This section will introduce Hadoop, explain how to set up the Hadoop stack on Mesos, and discuss the problems commonly encountered while setting up the stack.

Introduction to Hadoop

Hadoop was developed by Mike Cafarella and Doug Cutting in 2006 to manage the distribution for the Nutch project. The project was named after a toy elephant that belonged to Doug Cutting's son.

The following modules make up the Apache Hadoop framework:

  • Hadoop Common: This contains the common libraries and utilities required by the other modules

  • Hadoop Distributed File System (HDFS): This is a distributed, scalable filesystem capable of storing petabytes of data on commodity hardware

  • Hadoop YARN: This is a resource manager for cluster resources (similar to Mesos)

  • Hadoop MapReduce: This is a processing model for parallel data processing at scale

MapReduce

MapReduce is a processing model with which large amounts of data can be processed in parallel on distributed, commodity hardware-based infrastructure reliably and in a fault...
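
As a rough illustration of the model, the following single-process Python sketch mimics the map, shuffle, and reduce steps for a word count. It does not use Hadoop's API or run on a cluster; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only to exercise the sketch.
    sample = ["Mesos runs Hadoop", "Hadoop runs MapReduce"]
    print(word_count(sample))
```

In a real deployment, the map and reduce functions would run as separate tasks on different nodes, and the framework would handle the grouping step and the rerunning of failed tasks. Here everything runs in one process so that the data flow is easy to follow.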