Summary
In this chapter, we learned about security practices in big data architectures for both batch and streaming data. We examined the components involved, along with the messages they exchange, to set up authentication and authorization in a Hadoop ecosystem. We then extended the scope to see how model training pipelines can fit into a scalable architecture by analyzing design strategies for adversarial model training. We explored retraining from scratch, continued training, and two-stage continued training, which led us into a deeper look at concepts such as privacy-enabled retraining. Our examination of the design of secure ML-based microservices showed us how to embed layers of security within individual microservices, including in situations where one microservice depends on sensitive data from another.
When we talked about privacy-enabled training, we investigated how to run scalable DP-based...