Book Image

Machine Learning Security Principles

By : John Paul Mueller
Book Image

Machine Learning Security Principles

By: John Paul Mueller

Overview of this book

Businesses are leveraging the power of AI to make undertakings that used to be complicated and pricy much easier, faster, and cheaper. The first part of this book will explore these processes in more depth, which will help you in understanding the role security plays in machine learning. As you progress to the second part, you’ll learn more about the environments where ML is commonly used and dive into the security threats that plague them using code, graphics, and real-world references. The next part of the book will guide you through the process of detecting hacker behaviors in the modern computing environment, where fraud takes many forms in ML, from gaining sales through fake reviews to destroying an adversary’s reputation. Once you’ve understood hacker goals and detection techniques, you’ll learn about the ramifications of deep fakes, followed by mitigation strategies. This book also takes you through best practices for embracing ethical data sourcing, which reduces the security risk associated with data. You’ll see how the simple act of removing personally identifiable information (PII) from a dataset lowers the risk of social engineering attacks. By the end of this machine learning book, you'll have an increased awareness of the various attacks and the techniques to secure your ML systems effectively.
Table of Contents (19 chapters)
1
Part 1 – Securing a Machine Learning System
5
Part 2 – Creating a Secure System Using ML
12
Part 3 – Protecting against ML-Driven Attacks
15
Part 4 – Performing ML Tasks in an Ethical Manner

Setting up for the book

I want to ensure that you have the best possible experience when working through the examples in this book. To accomplish that task, this book relies on the literate programming technique originally explored by Donald Knuth and detailed in his paper at http://www.literateprogramming.com/knuthweb.pdf. The crux of this approach is that it provides you with a notebook-like environment in which to work where it’s possible to freely mix code and non-code elements, including graphics. Because of its reliance on multiple methods of conveying information, this approach is exceptionally clear and easy to understand. Plus, it promotes experimentation at a level that many people don’t experience using other approaches.

No matter how inviting a programming environment might be, however, you still have to have a specific level of knowledge to enjoy it. The first section that follows describes what you need to know to use the book successfully. Because of the programming environment I’ve chosen to use, those requirements may be fewer than expected.

It’s also critical that you use the same tools that I used in creating the examples. This requirement isn’t meant to hinder you in any way, but to ensure that you don’t spend a lot of time overcoming environmental issues while attempting to run the code. The second section that follows describes the programming setup I used so that you can replicate it on your system.

To ensure that you don’t have to battle typos and other problems with hand-typed code, I also provide a downloadable source that makes it incredibly easy to work with the programming examples. Most people do benefit from eventually typing their own code and creating their own examples, but to make the learning process easier, you really do want to use the downloadable source if at all possible. The blog post at http://blog.johnmuellerbooks.com/2014/01/10/verifying-your-hand-typed-code/ provides you with some additional details in this regard. You can obtain the downloadable source code for this book from the publisher’s GitHub site at https://github.com/PacktPublishing/Machine-Learning-Security-Principles or my website at http://www.johnmuellerbooks.com/source-code/.

What do you need to know?

The main audience for this book is data scientists and, to a lesser extent, researchers, so I’m assuming that you already know something about data sources, data management techniques, and the algorithms used to perform analysis on data. I don’t expect you to have an advanced degree in these topics, but you should know that a .csv file contains data that is separated in fields using commas. In addition, it would be helpful to have at least a passing knowledge of common algorithms such as Bayes’ theorem. The notes and references we provide in the book will help you locate the additional information you need, but this book doesn’t provide a tutorial on essential data science topics.

To provide the best possible programming environment, this book also relies on the Python programming language. Again, you won’t find a tutorial on this language here, but the use of the literate programming technique should aid in your understanding if you have worked with programming languages in the past. Obviously, the more you know about Python, the less effort you’ll need to expend on understanding the code. People who are in management and don’t really want to get into the coding details will still find this book useful for the theory it provides, so you could possibly work with the book without knowing anything about Python to obtain theoretical knowledge.

It’s also essential that you know how to work with whatever platform you’re using. You need to know how to install software, work with the filesystem, and perform other general user tasks with whatever platform you choose to use. Fortunately, you have lots of options for using Jupyter Notebook, the recommended IDE for this book, or Google Colab, a great alternative that will work with your mobile device. However, this extensive list of platforms also means that we can’t provide you with much in the way of platform support.

Considering the programming setup

To get the best results from a book’s source code, you need to use the same development products as the book’s author. Otherwise, you can’t be sure whether an error you find is a bug in the development product or from the source code. The example code in this book is tested using both Jupyter Notebook (for desktop systems) (https://jupyter.org/) and Google Colab (for tablet users) (https://colab.research.google.com/notebooks/welcome.ipynb). Desktop system users will benefit greatly from using Jupyter Notebook, especially if they have limited access to a broadband connection. Whichever product you use, the code is tested using Python version 3.8.3, although any Python 3.7 or 3.8 version will work fine. Newer versions of Python tend to create problems with libraries used with the example code because the vendors who create the libraries don’t necessarily update them at the same speed as Python is updated. You can read about these changes at https://docs.python.org/3/whatsnew/3.8.html. You can check your Python version using the following code:

import sys
print('Python Version:\n', sys.version)

I highly recommend using a multi-product toolkit called Anaconda (https://www.anaconda.com/products/individual), which includes Jupyter Notebook and a number of tools, such as conda, for installing libraries with fewer headaches. Figure 1.5 shows some of the tools you get with Anaconda. I wrote the examples using the 2020.07 version of Anaconda, which you can obtain at https://repo.anaconda.com/archive/. Make sure you get the right file for your programming platform:

  • Anaconda3-2020.07-Linux-ppc64le.sh (PowerPC) or Anaconda3-2020.07-Linux-x86_64.sh for Linux
  • Anaconda3-2020.07-MacOSX-x86_64.pkg or Anaconda3-2020.07-MacOSX-x86_64.sh for macOS
  • Anaconda3-2020.07-Windows-x86.exe (32-bit) or Anaconda3-2020.07-Windows-x86_64.exe (64-bit) for Windows
Figure 1.5 – Anaconda provides you with access to a wide variety of tools

Figure 1.5 – Anaconda provides you with access to a wide variety of tools

It’s possible to test your Anaconda version using the following code (which won’t work on Google Colab since it doesn’t have Anaconda installed):

import os
result = os.popen('conda list anaconda$').read()
print('\nAnaconda Version:\n', result)

The examples rely on a number of libraries, but three libraries are especially critical. If you don’t have the right version installed, the examples won’t work:

  • NumPy: Version 1.18.5 or greater
  • scikit-learn: Version 0.23.1 or greater
  • pandas: Version 1.1.3 or greater

Use this code to check your library versions:

!pip show numpy
!pip show scikit-learn
!pip show pandas

Now that you have a workable development environment, it’s time to begin working through some example code in the chapters that follow.