Securing Hadoop

By: Sudheesh Narayan

Overview of this book

Security of Big Data is one of the biggest concerns for enterprises today. How do we protect the sensitive information in a Hadoop ecosystem? How can we integrate Hadoop security with existing enterprise security systems? What are the challenges in securing Hadoop and its ecosystem? These questions need to be answered to ensure effective management of Big Data. Hadoop, along with Kerberos, provides security features that enable Big Data management and keep data secure. This book is a practitioner's guide to securing a Hadoop-based Big Data platform. It provides a step-by-step approach to implementing end-to-end security, along with a solid foundation of knowledge of the Hadoop and Kerberos security models. This practical, hands-on guide looks at the challenges involved in securing sensitive data in a Hadoop-based Big Data platform and covers the Security Reference Architecture for securing Big Data. It takes you through the internals of the Hadoop and Kerberos security models and provides detailed implementation steps for securing Hadoop. You will also learn how to integrate Enterprise Security Systems with Hadoop security and how to manage and control user access to a Hadoop ecosystem seamlessly. Finally, you will get acquainted with implementing audit logging and security incident monitoring within a Big Data platform.

Key security considerations

As discussed previously, meeting the enterprise data security needs of a Big Data ecosystem requires a holistic approach that secures the entire ecosystem. Some of the key security considerations when securing a Hadoop-based Big Data ecosystem are:

  • Authentication: There is a need to provide a single point for authentication that is aligned and integrated with the existing enterprise identity and access management system.

  • Authorization: We need to enforce role-based authorization with fine-grained access control for providing access to sensitive data.

  • Access control: There is a need to control who can do what on a dataset, and who can use how much of the processing capacity available in the cluster.

  • Data masking and encryption: We need to deploy proper encryption and masking techniques on data to ensure secure access to sensitive data for authorized personnel.

  • Network perimeter security: We need to deploy perimeter security for the overall Hadoop ecosystem that controls how data can move into and out of the ecosystem to other infrastructures. Design and implement the network topology to provide proper isolation of the Big Data ecosystem from the rest of the enterprise. Provide proper network-level security by configuring the appropriate firewall rules to prevent unauthorized traffic.

  • System security: There is a need to provide system-level security by hardening the OS and the applications that are installed as part of the ecosystem. Address all the known vulnerabilities of the OS and the applications.

  • Infrastructure security: We need to enforce strict infrastructure and physical access security in the data center.

  • Audits and event monitoring: A proper audit trail is required for any changes to the data ecosystem, and audit reports need to be provided for the various activities (data access and data processing) that occur within the ecosystem.
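
To make three of these considerations more concrete (role-based authorization, data masking, and the audit trail), the following is a minimal Python sketch. It is illustrative only and is not how Hadoop itself implements these controls: the role names, the dataset name, the AccessController class, and the mask_value helper are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-permission mapping. The roles and the dataset name
# are assumptions made for illustration; they are not Hadoop constructs.
ROLE_PERMISSIONS = {
    "analyst": {("sales", "read")},
    "data_engineer": {("sales", "read"), ("sales", "write")},
}


@dataclass
class AccessController:
    """Role-based authorization check that records every decision."""

    user_roles: dict  # user name -> set of role names
    audit_log: list = field(default_factory=list)

    def is_allowed(self, user, dataset, action):
        # A request is allowed if any of the user's roles grants the
        # (dataset, action) pair. Unknown users have no roles.
        roles = self.user_roles.get(user, set())
        allowed = any(
            (dataset, action) in ROLE_PERMISSIONS.get(role, set())
            for role in roles
        )
        # Audit trail: log both granted and denied requests.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "dataset": dataset,
            "action": action,
            "allowed": allowed,
        })
        return allowed


def mask_value(value, visible=4, mask_char="*"):
    """Mask all but the last `visible` characters of a sensitive string."""
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


if __name__ == "__main__":
    controller = AccessController(
        {"alice": {"analyst"}, "bob": {"data_engineer"}}
    )
    print(controller.is_allowed("alice", "sales", "read"))
    print(controller.is_allowed("alice", "sales", "write"))
    print(mask_value("4111111111111111"))
    print(len(controller.audit_log))
```

The sketch logs denied requests as well as granted ones, since an audit report usually has to answer who attempted an action, not only who succeeded. In a real deployment, we would expect these checks to be delegated to the platform's own security services rather than to application code.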

Reference architecture for Big Data security

Implementing all the preceding security considerations is vital to building a trusted Big Data ecosystem within the enterprise. The following figure shows a typical Big Data ecosystem and how the various ecosystem components and stakeholders interact with each other. Implementing the security controls in each of these interactions requires elaborate planning and careful execution.

The reference architecture depicted in the following diagram summarizes the key security pillars that need to be considered for securing a Big Data ecosystem. In the next chapters, we will explore how to leverage the Hadoop security model and the various existing enterprise tools to secure the Big Data ecosystem.

In Chapter 4, Securing the Hadoop Ecosystem, we will look at the implementation details for securing the OS and the applications that are deployed along with Hadoop in the ecosystem. In Chapter 5, Integrating Hadoop with Enterprise Security Systems, we will look at the corporate network perimeter security requirements, how to secure the cluster, and how authorization defined within the enterprise identity management system can be integrated with the Hadoop ecosystem. In Chapter 6, Securing Sensitive Data in Hadoop, we will look at the encryption implementation for securing sensitive data in Hadoop. In Chapter 7, Security Event and Audit Logging in Hadoop, we will look at security incident and event monitoring, along with the security policies required to address the audit and reporting requirements.