Securing Hadoop

By: Sudheesh Narayan

Overview of this book

Security of Big Data is one of the biggest concerns for enterprises today. How do we protect sensitive information in a Hadoop ecosystem? How can we integrate Hadoop security with existing enterprise security systems? What are the challenges in securing Hadoop and its ecosystem? These questions must be answered to manage Big Data effectively. Hadoop, along with Kerberos, provides security features that enable Big Data management and keep data secure.

This book is a practitioner's guide to securing a Hadoop-based Big Data platform. It provides a step-by-step approach to implementing end-to-end security, along with a solid foundation in the Hadoop and Kerberos security models. This practical, hands-on guide examines the challenges of securing sensitive data in a Hadoop-based Big Data platform and covers the Security Reference Architecture for securing Big Data. It takes you through the internals of the Hadoop and Kerberos security models and provides detailed implementation steps for securing Hadoop. You will learn how the Hadoop security model is implemented internally, how to integrate enterprise security systems with Hadoop security, and how to manage and control user access to a Hadoop ecosystem seamlessly. You will also become acquainted with implementing audit logging and security incident monitoring within a Big Data platform.
Table of Contents (15 chapters)
Securing Hadoop
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Why do we need to secure Hadoop?


Enterprise data consists of crucial information related to sales, customer interactions, human resources, and so on, and is locked securely within systems such as ERP, CRM, and general ledger systems. In the last decade, enterprise data security has matured significantly as organizations learned their lessons from various data security incidents that caused them losses running into billions. As the services industry has grown and matured, most of these systems have been outsourced to vendors who routinely handle crucial client information. As a result, security and privacy standards such as HIPAA, HITECH, PCI, SOX, ISO, and COBIT have evolved, and service providers are required to comply with these regulatory standards to fully safeguard their clients' data assets. This has resulted in very protective data security enforcement within enterprises, among service providers and clients alike. There is absolutely no tolerance for data security violations.

Over the last eight years of its development, Hadoop has reached a mature state where enterprises have started adopting it for their Big Data processing needs. The prime use case is to gain strategic and operational advantages from their enormous datasets. However, to do any analysis on top of these datasets, we need to bring them into the Hadoop ecosystem for processing. So the immediate question that arises with respect to data security is: how secure is the data stored inside the Hadoop ecosystem?

The question is not just about securing the source data that is moved from the enterprise systems to the Hadoop ecosystem. Once these datasets land in the Hadoop ecosystem, analysts and data scientists perform large-scale analytics and machine-learning-based processing to derive business insights. These insights are of great importance to the enterprise, and in the hands of a competitor or any unauthorized person they could be disastrous to the business. They are highly sensitive and must be fully secured.

Any data security incident will cause business users to lose their trust in the ecosystem. Unless the business teams have confidence in the Hadoop ecosystem, they won't take the risk of investing in Big Data. Hence, the success or failure of Big Data-related projects depends on how secure our data ecosystem is.