Throughout this book, we have visited many areas of data science, often straying into those that are not traditionally associated with a data scientist's core working knowledge. In particular, we dedicated an entire chapter, Chapter 2, Data Acquisition, to data ingestion, explaining how to solve an issue that is always present but rarely acknowledged or addressed adequately. In this chapter, we will visit another of those often overlooked fields: secure data. More specifically, we will look at how to protect your data and analytic results at all stages of the data life cycle, from ingestion right through to presentation, while at all times considering the important architectural and scalability requirements that naturally form part of the Spark paradigm.
In this chapter, we will cover the following topics:
How to implement coarse-grained data access controls using HDFS ACLs
A guide to fine-grained security, with explanations using the Hadoop ecosystem
How to ensure data is always...