Securing Hadoop

By: Sudheesh Narayan

Overview of this book

Security of Big Data is one of the biggest concerns for enterprises today. How do we protect sensitive information in a Hadoop ecosystem? How can we integrate Hadoop security with existing enterprise security systems? What are the challenges in securing Hadoop and its ecosystem? These questions need to be answered to ensure effective management of Big Data. Hadoop, along with Kerberos, provides security features that enable Big Data management and keep data secure.

This book is a practitioner's guide to securing a Hadoop-based Big Data platform. It provides a step-by-step approach to implementing end-to-end security, along with a solid foundation in the Hadoop and Kerberos security models. This practical, hands-on guide looks at the challenges involved in securing sensitive data in a Hadoop-based Big Data platform and covers the Security Reference Architecture for securing Big Data. It takes you through the internals of the Hadoop and Kerberos security models and provides detailed implementation steps for securing Hadoop. You will learn how the Hadoop security model is implemented, how to integrate enterprise security systems with Hadoop security, and how to manage and control user access to a Hadoop ecosystem seamlessly. You will also get acquainted with implementing audit logging and security incident monitoring within a Big Data platform.

Automation of a secured Hadoop cluster deployment

Let us have a look at some of the most important tools.

Cloudera Manager

Cloudera Manager is one of the most popular Hadoop management and deployment tools. Some of the key features of Cloudera Manager with respect to securing a Hadoop cluster are:

  • Cloudera Manager automates the entire Hadoop cluster setup and enables an automated setup of a secure Hadoop cluster with Kerberos. Cloudera Manager automatically sets up the Keytab file in all the slave nodes, and updates the Hadoop configuration with the required Keytab locations and service principal details. Cloudera Manager updates the configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, oozie-site.xml, hue.ini, and taskcontroller.cfg) without any manual intervention.

  • It supports the deployment of a role-based administration, where there are read-only administrators who monitor the cluster while others can change the deployments.

  • It enables administrators to configure alerts specific to user activity and access. These alerts can be leveraged for security incident and event monitoring.

  • Cloudera Manager can use SNMP to send events about security incidents in Hadoop to enterprise SIEM tools.

  • It can integrate user credentials with Active Directory using LDAP.
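To make the configuration updates in the first point more concrete, the following is a minimal sketch of the kind of Kerberos entries that end up in core-site.xml and hdfs-site.xml on a secured cluster. It is not Cloudera Manager's own code. The property names are standard Hadoop security settings, while the function names, the realm EXAMPLE.COM, and the keytab directory are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def kerberos_properties(realm, keytab_dir="/etc/hadoop/conf"):
    """Return illustrative Kerberos settings grouped by configuration file.

    The realm and keytab_dir values are assumptions; a real deployment
    uses whatever the cluster's KDC and file layout dictate.
    """
    core = {
        # Switch Hadoop from "simple" authentication to Kerberos.
        "hadoop.security.authentication": "kerberos",
        # Turn on service-level authorization checks.
        "hadoop.security.authorization": "true",
    }
    hdfs = {
        # _HOST is replaced by Hadoop with each node's hostname at runtime.
        "dfs.namenode.kerberos.principal": f"hdfs/_HOST@{realm}",
        "dfs.namenode.keytab.file": f"{keytab_dir}/hdfs.keytab",
        "dfs.datanode.kerberos.principal": f"hdfs/_HOST@{realm}",
        "dfs.datanode.keytab.file": f"{keytab_dir}/hdfs.keytab",
    }
    return {"core-site.xml": core, "hdfs-site.xml": hdfs}


def render_site_xml(properties):
    """Render a dict of properties in Hadoop's <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for filename, props in kerberos_properties("EXAMPLE.COM").items():
        print(filename)
        print(render_site_xml(props))
```

A management tool has to produce entries of this kind for every service and push them to every node, which is the work the automation described above takes off the administrator.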


    More details on Cloudera Manager are available at the following URL:


Zettaset

Zettaset provides a product, Zettaset Orchestrator, that offers seamless secured Hadoop deployment and management. Zettaset doesn't provide a Hadoop distribution of its own, but works with distributions such as Cloudera, Hortonworks, and Apache Hadoop. Some of the key features of Zettaset Orchestrator are:

  • It provides an automated deployment of a secured Hadoop cluster

  • It hardens the entire Hadoop deployment from an enterprise perspective to address policy, compliance, access control, and risk management within the Hadoop cluster environment

  • It integrates seamlessly with an existing enterprise security policy framework using LDAP and Active Directory (AD)

  • It provides centralized configuration management, logging, and auditing

  • It provides role-based access controls (RBACs) and enables Kerberos to be seamlessly integrated with the rest of the ecosystem

All other platform management tools, such as Ambari and Greenplum Hadoop Deployment Manager, need manual setup to establish a secured Hadoop cluster. The Keytab files, service principals, and configuration files have to be manually deployed on all nodes.
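To give a sense of the manual effort involved, here is a minimal sketch that generates the kadmin commands an administrator would typically run to create one service principal per host and export its Keytab file. It assumes an MIT Kerberos KDC, where addprinc and xst are the relevant kadmin commands. The function name, host names, realm, and Keytab directory are hypothetical and chosen only for illustration.

```python
def kadmin_commands(services, hosts, realm, keytab_dir="/etc/security/keytabs"):
    """Build the kadmin commands for every service/host combination.

    Nothing is executed here; the function only returns the command
    strings. The Keytab file naming scheme is an assumption.
    """
    commands = []
    for host in hosts:
        for service in services:
            principal = f"{service}/{host}@{realm}"
            keytab = f"{keytab_dir}/{service}.{host}.keytab"
            # Create the service principal with a random key.
            commands.append(f"addprinc -randkey {principal}")
            # Export the principal's key into a Keytab file.
            commands.append(f"xst -k {keytab} {principal}")
    return commands


if __name__ == "__main__":
    hosts = ["node1.example.com", "node2.example.com"]  # hypothetical hosts
    for command in kadmin_commands(["hdfs", "mapred"], hosts, "EXAMPLE.COM"):
        print(command)
```

Each generated Keytab file then has to be copied to its node and referenced from that node's configuration files. The number of commands grows with the number of services multiplied by the number of hosts, which is why automated deployment matters on larger clusters.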