Cloudera Administration Handbook

By : Menon
Overview of this book

An easy-to-follow Apache Hadoop administrator’s guide filled with practical screenshots and explanations for each step and configuration. This book is great for administrators interested in setting up and managing a large Hadoop cluster. If you are an administrator, or want to be an administrator, and you are ready to build and maintain a production-level cluster running CDH5, then this book is for you.

Responsibilities of a Hadoop administrator

As organizations grow more interested in deriving insight from their big data, they are planning and building big data teams aggressively. To start working on their data, they need a solid infrastructure. Once it is set up, they need controls and system policies in place to maintain, manage, and troubleshoot their cluster.

The demand for Hadoop administrators keeps growing, because their function (setting up and maintaining Hadoop clusters) is what makes analysis possible.

The Hadoop administrator needs to be very good at system operations, networking, operating systems, and storage, and needs a strong knowledge of computer hardware and how it operates in a complex network.

Apache Hadoop runs mainly on Linux, so good Linux skills in monitoring, troubleshooting, configuration, and security are a must.

Setting up nodes for clusters involves many repetitive tasks, so the Hadoop administrator should bring up these servers quickly and efficiently using configuration management tools such as Puppet, Chef, and CFEngine. Apart from these tools, the administrator should also have good capacity planning skills to design and plan clusters.

Some of the data held on cluster nodes needs to be duplicated. For example, the fsimage file of the namenode daemon can be configured to be written to two different disks on the same node, or to a disk on a different node. An understanding of NFS mount points and how to set them up within a cluster is required. The administrator may also be asked to set up RAID for disks on specific nodes.
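As a rough illustration of the fsimage example, the sketch below renders the `hdfs-site.xml` property that lists more than one metadata directory for the namenode. In Hadoop 2 (and therefore CDH5) this property is `dfs.namenode.name.dir`, and it takes a comma-separated list of directories. The two paths and the helper name `name_dir_property` are assumptions made up for this example; substitute the directories used in your own cluster.

```python
import xml.etree.ElementTree as ET


def name_dir_property(directories):
    """Build a <property> element listing every namenode metadata directory."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.namenode.name.dir"
    # The value is a comma-separated list; the namenode metadata is
    # written to each directory in the list.
    ET.SubElement(prop, "value").text = ",".join(directories)
    return prop


# Hypothetical paths: one local disk and one NFS mount point.
dirs = ["/data/1/dfs/nn", "/mnt/nfs/dfs/nn"]
snippet = ET.tostring(name_dir_property(dirs), encoding="unicode")
print(snippet)
```

The printed element can be pasted inside the `<configuration>` block of `hdfs-site.xml`. When Cloudera Manager manages the cluster, the same list would normally be entered through its configuration pages instead of by editing the file.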

As all Hadoop services/daemons are built on Java, a basic knowledge of the JVM along with the ability to understand Java exceptions would be very useful. This helps administrators identify issues quickly.
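To make this concrete, here is a minimal sketch that counts the Java exception classes mentioned in a daemon log. It assumes the common log4j layout of date, time, level, logger name, and message; the function name `summarize_exceptions` is hypothetical, and the pattern may need adjusting if your cluster uses a different log layout.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time> <LEVEL> <logger>: <message>".
LOG_LINE = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+) (?P<logger>\S+): (?P<message>.*)$")
# Fully qualified Java class names ending in Exception or Error.
EXCEPTION = re.compile(r"\b(?:[a-z_][\w$]*\.)+[A-Z]\w*(?:Exception|Error)\b")


def summarize_exceptions(lines):
    """Count exception class names, ignoring routine INFO/DEBUG/TRACE entries."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip low-severity entries; lines that do not match the layout
        # (for example stack trace continuations) are still scanned.
        if match and match.group("level") in ("INFO", "DEBUG", "TRACE"):
            continue
        counts.update(EXCEPTION.findall(line))
    return counts
```

Calling `summarize_exceptions(open(path))` on a log file returns a `Counter`, so `most_common(5)` gives a quick view of which exceptions dominate before you read the full stack traces.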

The Hadoop administrator should possess the skills to benchmark the cluster to test performance under high traffic scenarios.

Clusters are prone to failures as they are up all the time and regularly process large amounts of data. To monitor the health of the cluster, the administrator should deploy monitoring tools such as Nagios and Ganglia, and should configure alerts and monitors for critical nodes of the cluster so that issues are caught early.
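As one possible shape for such a monitor, the sketch below follows the Nagios plugin convention of reporting status through an exit code (0 for OK, 1 for WARNING, 2 for CRITICAL) plus a one-line message. The thresholds and the function names `classify` and `check_disk` are assumptions chosen for illustration, not values taken from the book.

```python
import shutil

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a Nagios status code and label."""
    if used_percent >= crit:
        return CRITICAL, "CRITICAL"
    if used_percent >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path, warn=80.0, crit=90.0):
    """Check how full the filesystem holding `path` is."""
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total
    code, label = classify(used_percent, warn, crit)
    return code, f"DISK {label} - {path} is {used_percent:.1f}% full"
```

A wrapper script would print the message and pass the code to `sys.exit`, which is how Nagios reads the result. Pointing the check at the disks that hold namenode metadata or datanode blocks is one way to cover the critical nodes mentioned above.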

Knowledge of a good scripting language such as Python, Ruby, or shell greatly helps an administrator. Administrators are often asked to set up scheduled file staging from an external source to HDFS. Scripting skills let them fulfill such requests by writing scripts and automating them.
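A staging script of this kind might look like the following sketch. It builds one `hdfs dfs -put -f` command per local file and either prints or runs it. The function names, the `*.csv` pattern, and the directory arguments are assumptions for illustration, and the script expects the `hdfs` client to be on the `PATH` of the machine where it runs.

```python
import subprocess
from pathlib import Path


def build_put_commands(local_dir, hdfs_dir, pattern="*.csv"):
    """Return one `hdfs dfs -put` command per matching local file."""
    commands = []
    for path in sorted(Path(local_dir).glob(pattern)):
        if not path.is_file():
            continue
        target = f"{hdfs_dir.rstrip('/')}/{path.name}"
        # -f overwrites the destination if it already exists.
        commands.append(["hdfs", "dfs", "-put", "-f", str(path), target])
    return commands


def stage(local_dir, hdfs_dir, pattern="*.csv", dry_run=True):
    """Print the commands (dry run) or execute them against the cluster."""
    for command in build_put_commands(local_dir, hdfs_dir, pattern):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if a copy fails.
            subprocess.run(command, check=True)
```

Running the script from a scheduler such as cron turns it into the scheduled staging job described above. Keeping command construction separate from execution makes it easy to do a dry run before touching the cluster.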

Above all, the Hadoop administrator should have a very good understanding of the Apache Hadoop architecture and its inner workings.

The following are some of the key Hadoop-related operations that the Hadoop administrator should know:

  • Planning the cluster, deciding on the number of nodes based on the estimated amount of data the cluster is going to serve.
  • Installing and upgrading Apache Hadoop on a cluster.
  • Configuring and tuning Hadoop using the various configuration files available within Hadoop.
  • Understanding all the Hadoop daemons along with their roles and responsibilities in the cluster.
  • Reading and interpreting Hadoop logs.
  • Adding and removing nodes in the cluster.
  • Rebalancing nodes in the cluster.
  • Employing security using an authentication and authorization system such as Kerberos.
  • Backing up and recovering data. Almost all organizations have a policy of backing up their data, and performing backups is the administrator's responsibility, so an administrator should be well versed in the backup and recovery operations of servers.
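
For the first item in the list, planning the number of nodes, a back-of-the-envelope estimate can be sketched as follows. The sketch multiplies the raw data volume by the HDFS replication factor (3 by default) and divides by the disk space each node can actually devote to HDFS. The `headroom` fraction and the function name `estimate_datanodes` are assumptions for illustration; real sizing also has to account for growth, compression, and intermediate job data.

```python
import math


def estimate_datanodes(raw_data_tb, usable_tb_per_node, replication=3, headroom=0.25):
    """Estimate how many datanodes are needed to hold a given data volume.

    headroom is the fraction of each node's disk kept free for temporary
    and intermediate data (an assumed planning figure, not a Hadoop setting).
    """
    stored_tb = raw_data_tb * replication
    effective_tb_per_node = usable_tb_per_node * (1.0 - headroom)
    # Round up: a fractional node still means one more server.
    return math.ceil(stored_tb / effective_tb_per_node)


# 100 TB of data, 24 TB of disk per node, default replication of 3.
print(estimate_datanodes(100, 24))  # 300 TB stored / 18 TB per node -> 17
```

Re-running the estimate with the expected data volume a year or two ahead gives a first idea of how many nodes will have to be added later.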