Adversarial attacks in ML are attempts to fool a model by supplying deliberately crafted input. Examples of such attacks include adding perturbations to an image by changing a few pixels, thereby causing the classifier to misclassify the sample, or wearing t-shirts with certain patterns to evade person detectors (adversarial t-shirts). One particular kind of adversarial attack is a privacy attack, in which an attacker gains knowledge about the model's training dataset, potentially exposing personal or sensitive information through membership inference attacks and model inversion attacks.
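To make the membership inference idea concrete, here is a minimal sketch of a confidence-thresholding attack: an overfit model tends to be more confident on points it was trained on, so an attacker can guess membership from prediction confidence alone. All the model, dataset, and threshold choices below are illustrative assumptions, not the method used later in this recipe.

```python
# Sketch of a confidence-thresholding membership inference attack.
# Assumptions: synthetic data, a deliberately overfit random forest,
# and an arbitrary confidence threshold of 0.9.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained random forest memorizes its training set,
# so it leaks membership through high confidence on training points.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

def guess_membership(model, X, threshold=0.9):
    """Guess 'member' whenever the top-class confidence exceeds the threshold."""
    return model.predict_proba(X).max(axis=1) > threshold

member_rate = guess_membership(model, X_train).mean()   # hits on real members
nonmember_rate = guess_membership(model, X_test).mean() # false alarms on unseen data
print(f"flagged as members: train={member_rate:.2f}, test={nonmember_rate:.2f}")
```

The gap between the two rates is the attacker's signal: the larger the model's train/test confidence gap, the more training-set membership it reveals.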
Privacy attacks are dangerous, particularly in domains such as medicine or finance, where the training data can involve sensitive information (for example, a health status) that is possibly traceable to an individual's identity. In this recipe, we'll build a model that is resistant to privacy attacks, making it much harder for an attacker to extract information about its training data.