Cloudera Administration Handbook

By: Menon

Overview of this book

An easy-to-follow Apache Hadoop administrator’s guide filled with practical screenshots and explanations for each step and configuration. This book is great for administrators interested in setting up and managing a large Hadoop cluster. If you are an administrator, or want to be an administrator, and you are ready to build and maintain a production-level cluster running CDH5, then this book is for you.

Understanding the CDH components

As mentioned earlier, there are several top-level Apache open source projects that are part of CDH. Let's discuss these components in detail.

Apache Hadoop

CDH includes Apache Hadoop, the system for high-volume storage and computing introduced earlier. Its subcomponents are HDFS, Fuse-DFS, MapReduce, and MapReduce 2 (YARN). Fuse-DFS is a module that mounts HDFS in user space. Once mounted, HDFS can be accessed like any other traditional filesystem.
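
To illustrate what "accessible like any other traditional filesystem" means in practice, here is a minimal Python sketch. It assumes HDFS has already been mounted through Fuse-DFS at a hypothetical mount point, /mnt/hdfs (overridable through a hypothetical HDFS_MOUNT environment variable); neither name comes from the book. The function summarize_tree is an illustrative helper, not part of CDH. It relies only on ordinary file APIs, which is the point: no Hadoop client library is involved.

```python
import os
from pathlib import Path


def summarize_tree(mount_point):
    """Return (file_count, total_bytes) for all regular files under mount_point.

    Uses only standard filesystem calls, so it works the same way on a
    local directory and on an HDFS tree exposed through a FUSE mount.
    """
    root = Path(mount_point)
    file_count = 0
    total_bytes = 0
    for path in root.rglob("*"):
        if path.is_file():
            file_count += 1
            total_bytes += path.stat().st_size
    return file_count, total_bytes


if __name__ == "__main__":
    # Assumption: /mnt/hdfs is a hypothetical Fuse-DFS mount point.
    mount_point = os.environ.get("HDFS_MOUNT", "/mnt/hdfs")
    if os.path.isdir(mount_point):
        count, size = summarize_tree(mount_point)
        print(f"{count} files, {size} bytes under {mount_point}")
    else:
        print(f"{mount_point} is not mounted or does not exist")
```

Because the helper takes the mount point as a parameter, the same code can be tried against any local directory before pointing it at a mounted cluster. Walking a large HDFS namespace this way may be slow, so a sketch like this suits small directory trees.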

Apache Flume NG

Apache Flume NG Version 1.x is a distributed framework for collecting and aggregating large amounts of log data. The project was built primarily to handle streaming data. Flume is robust, reliable, and fault tolerant. Although Flume was designed for streaming log data, its flexibility with multiple data sources makes it easy to configure for event data as well. Flume can handle almost any kind of data...