10 Machine Learning Blueprints You Should Know for Cybersecurity

By: Rajvardhan Oak

Overview of this book

Machine learning in security is harder than in other domains because of the changing nature and abilities of adversaries, the high stakes, and a lack of ground-truth data. This book will prepare machine learning practitioners to effectively handle tasks in the challenging yet exciting cybersecurity space. The book begins by helping you understand how advanced ML algorithms work. It then shows you practical examples of how they can be applied to security-specific problems with Python, using either open source datasets or datasets you create yourself. In one exercise, you'll also use GPT-3.5, the secret sauce behind ChatGPT, to generate an artificial dataset of fabricated news. Later, you'll find out how to apply the expert knowledge and human-in-the-loop decision-making that the cybersecurity space requires. This book is designed to address the lack of proper resources for individuals interested in transitioning into a data scientist role in cybersecurity. It concludes with case studies, interview questions, and blueprints for four projects that you can use to enhance your portfolio. By the end of this book, you'll be able to apply machine learning algorithms to detect malware, fake news, deepfakes, and more. You'll also be able to implement privacy-preserving machine learning techniques such as differentially private ML.

What this book covers

Chapter 1, On Cybersecurity and Machine Learning, introduces you to the fundamental principles of cybersecurity and how the field has evolved, as well as basic concepts in machine learning. It also discusses the challenges and importance of applying machine learning to the security space.

Chapter 2, Detecting Suspicious Activity, describes a fundamental cybersecurity problem: detecting intrusions and suspicious behavior that indicates an attack. It covers statistical and machine learning techniques for anomaly detection.

Chapter 3, Malware Detection Using Transformers and BERT, discusses malware and its variants. It frames malware detection as an NLP task and uses BERT, a state-of-the-art model, to build a high-performance classifier from a small amount of malware data. The chapter also covers the theory behind attention and the transformer model.

Chapter 4, Detecting Fake Reviews, covers techniques for building models that detect fraudulent reviews. It explains statistical analysis methods such as t-tests for determining which features differ significantly between real and fake reviews. It then describes how regression can be used to model this data and how the regression results should be interpreted.

Chapter 5, Detecting Deepfakes, discusses deepfake images and videos, which have recently taken the internet by storm. The chapter covers how deepfakes are generated, the social implications they can have, and how machine learning can be used to detect deepfake images and videos.

Chapter 6, Detecting Machine-Generated Text, extends the discussion of deepfakes to the text domain and covers the detection of bot-generated text. It first outlines a methodology for generating a custom fake news dataset using GPT. It then covers techniques for generating features, and finally shows how to use machine learning to detect bot-generated text.

Chapter 7, Attributing Authorship and How to Evade It, discusses the task of authorship attribution, which is important in the social media and intellectual property domains. The chapter also explores the counter-task, evading authorship attribution, and how this can be achieved to maintain privacy when needed.

Chapter 8, Detecting Fake News with Graph Neural Networks, tackles an important issue in today’s world – that of misinformation and fake news. It uses the advanced modeling technique of graph neural networks, explains the theory behind it, and applies it to fake news detection on Twitter.

Chapter 9, Attacking Models with Adversarial Machine Learning, covers security issues related to machine learning models. For example, it shows how a model's performance can be degraded through data poisoning, or how a model can be fooled into making an incorrect prediction. You will learn attack techniques for fooling image and text classification models.

Chapter 10, Protecting User Privacy with Differential Privacy, introduces you to differential privacy, a paradigm widely adopted in the technology industry. It also covers the fundamental concepts of privacy, both technical and legal. You will learn how to train fraud detection models in a differentially private manner, and about the costs and benefits this brings.

Chapter 11, Protecting User Privacy with Federated Machine Learning, covers a collaborative machine learning technique where multiple entities can co-train a model without having to share any training data. The chapter presents an example of how a deep neural network can be trained in a federated fashion.

Chapter 12, Breaking into the Sec-ML Industry, provides a wealth of resources to help you apply all that you have learned so far and to prepare for interviews in the Sec-ML space. It contains resources for further reading, a question bank for interviews, and blueprints for projects you can build on your own.