What is security? We could work on data security forever; it is a part of IT infrastructure that is never truly finished. Companies are usually interested in the following aspects:
User-, group-, and access control list (ACL)-based access to data: Administrators should have something similar to the read/write/execute permissions in Linux, so that they can set who owns the data and who can read or write it.
Auditing and logging of data access: We need to know who accessed the data, when, and how.
Isolation: We don't want our data to be publicly accessible. For example, we would like to allow access to our clusters only from specific subnets.
We are not going to try to set up all the functionality required for production-ready security. Our aim is to set up simple security that prevents unauthorized users from accessing the data.
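To make the three aspects above concrete, here is a minimal toy sketch of how they fit together. It is not how Hadoop implements security; it only assumes Linux-style owner/group/other mode bits (HDFS uses a permission model that shares much of the POSIX model), a hypothetical `AccessPolicy` class holding a list of allowed subnets and a user-to-group mapping, and an in-memory list standing in for a real audit log.

```python
import ipaddress
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Permission bits with the same layout as Linux mode bits (e.g. 0o750).
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass
class Resource:
    path: str
    owner: str
    group: str
    mode: int  # e.g. 0o750: owner rwx, group r-x, others ---


@dataclass
class AccessPolicy:
    # Hypothetical policy object (not a Hadoop API): allowed client subnets,
    # group membership per user, and an in-memory audit trail.
    allowed_subnets: list[str]
    user_groups: dict[str, set[str]]
    audit_log: list[dict] = field(default_factory=list)

    def _permission_bits(self, user: str, res: Resource) -> int:
        # Select the owner, group, or "other" triplet, as Linux does:
        # an owner is judged only by the owner bits.
        if user == res.owner:
            return (res.mode >> 6) & 0o7
        if res.group in self.user_groups.get(user, set()):
            return (res.mode >> 3) & 0o7
        return res.mode & 0o7

    def check(self, user: str, client_ip: str, res: Resource, wanted: int) -> bool:
        # Isolation: the client must connect from an allowed subnet.
        ip = ipaddress.ip_address(client_ip)
        in_subnet = any(ip in ipaddress.ip_network(net) for net in self.allowed_subnets)
        # Access control: every requested bit must be granted.
        allowed = in_subnet and (self._permission_bits(user, res) & wanted) == wanted
        # Audit: record who, when, from where, what, and the outcome.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "client_ip": client_ip,
            "path": res.path,
            "wanted": wanted,
            "allowed": allowed,
        })
        return allowed


policy = AccessPolicy(allowed_subnets=["10.0.0.0/24"], user_groups={"bob": {"analysts"}})
sales = Resource("/data/sales", owner="alice", group="analysts", mode=0o750)
print(policy.check("alice", "10.0.0.5", sales, READ | WRITE))  # True: owner bits are rwx
print(policy.check("bob", "10.0.0.7", sales, WRITE))           # False: group bits are r-x
print(policy.check("bob", "10.0.0.7", sales, READ))            # True
print(policy.check("alice", "192.168.1.10", sales, READ))      # False: outside allowed subnet
```

Every call is written to the audit trail, including denied ones, since a record of refused attempts matters for auditing as much as a record of granted ones.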
What is data security? It consists of three major parts:
Authentication
Authorization
Audit
Together, these three parts tell us who did what, where, and with which privileges. Hadoop security setup is still...