Security becomes a primary feature in a multitenant and distributed environment. Shared networks and resources can lead to information leaks through unauthorized access, to malicious modifications, or even to denial of service. Attacks on Hadoop clusters can be preempted by enabling security features such as authentication, authorization, data protection, and data auditing.
The key takeaways from this chapter are as follows:
Post 0.20, Yahoo! introduced Hadoop security-related features for compliance, confidentiality, and fair usage in shared enterprise clusters.
Hadoop can now be configured for either Kerberos-based authentication or simple authentication, depending on the topology and compliance requirements. User information can be retrieved from enterprise user stores such as LDAP or Active Directory.
Hadoop has both service-level and resource-level authorization built in. HDFS authorization is very similar to the UNIX-based file authorization model.
Hadoop provides data confidentiality...
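The takeaways on authentication and authorization can be checked against a cluster's configuration. The following audit script is an added example, not code from the chapter or from Hadoop itself. It assumes the Hadoop 2.x property names and defaults from core-site.xml and hdfs-site.xml, and runs on a constructed sample of the merged configuration.

```python
import xml.etree.ElementTree as ET

# Constructed sample configuration (not from a real cluster): Hadoop's
# *-site.xml layout with the insecure values this audit looks for.
SAMPLE_SITE_XML = """<configuration>
  <property><name>hadoop.security.authentication</name><value>simple</value></property>
  <property><name>hadoop.security.authorization</name><value>false</value></property>
  <property><name>dfs.permissions.enabled</name><value>true</value></property>
</configuration>"""

# property -> (expected value, default when the property is absent, meaning)
# Names and defaults are assumed from Hadoop 2.x.
CHECKS = {
    "hadoop.security.authentication": ("kerberos", "simple", "Kerberos authentication"),
    "hadoop.security.authorization": ("true", "false", "service-level authorization"),
    "dfs.permissions.enabled": ("true", "true", "HDFS permission checking"),
}

def load_properties(xml_text):
    """Parse a Hadoop site file into a {name: value} dict."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props

def audit(xml_text):
    """Return (property, effective value, finding) for every failed check."""
    props = load_properties(xml_text)
    findings = []
    for name, (expected, default, meaning) in CHECKS.items():
        # An absent property falls back to Hadoop's default, which may be insecure.
        effective = props.get(name, default).lower()
        if effective != expected:
            source = "set" if name in props else "default"
            findings.append((name, effective, f"{meaning} is off ({source} value)"))
    return findings

if __name__ == "__main__":
    for name, value, finding in audit(SAMPLE_SITE_XML):
        print(f"{name}={value}: {finding}")
```

An empty configuration fails the same two checks as the sample, because the defaults for authentication and service-level authorization are the insecure values.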