Machine Learning for Cybersecurity Cookbook

By: Emmanuel Tsukerman

Overview of this book

Organizations today face a major threat in terms of cybersecurity, from malicious URLs to credential reuse, and having robust security systems can make all the difference. With this book, you'll learn how to use Python libraries such as TensorFlow and scikit-learn to implement the latest artificial intelligence (AI) techniques and handle challenges faced by cybersecurity researchers. You'll begin by exploring various machine learning (ML) techniques and tips for setting up a secure lab environment. Next, you'll implement key ML algorithms such as clustering, gradient boosting, random forest, and XGBoost. The book will guide you through constructing classifiers and features for malware, which you'll train and test on real samples. As you progress, you'll build self-learning, reliant systems to handle cybersecurity tasks such as identifying malicious URLs, spam email detection, intrusion detection, network protection, and tracking user and process behavior. Later, you'll apply generative adversarial networks (GANs) and autoencoders to advanced security tasks. Finally, you'll delve into secure and private AI to protect the privacy rights of consumers using your ML models. By the end of this book, you'll have the skills you need to tackle real-world problems faced in the cybersecurity domain using a recipe-based approach.

Featurizing PDF files

In this section, we will see how to featurize PDF files so that they can be used for machine learning. The tool we will use is the PDFiD Python script designed by Didier Stevens (https://blog.didierstevens.com/). Stevens selected a list of 20 features that are commonly found in malicious files, including whether the PDF file contains JavaScript or launches an automatic action. Because these features are suspicious in a file, their presence can indicate malicious behavior.
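To make the idea of keyword-based features concrete, the following is a minimal sketch of counting a few indicator names in the raw bytes of a PDF. It is not PDFiD's implementation: the `KEYWORDS` list is an illustrative subset chosen for this example, `count_keywords` is a hypothetical helper, and the naive byte matching here makes no attempt to handle obfuscated or encoded names.

```python
import re

# Illustrative subset of indicator names (an assumption for this sketch,
# not PDFiD's full feature list).
KEYWORDS = [b"/JS", b"/JavaScript", b"/AA", b"/OpenAction", b"/Launch"]


def count_keywords(data: bytes) -> dict:
    """Count how often each indicator name appears in the raw PDF bytes."""
    counts = {}
    for keyword in KEYWORDS:
        # Require that the name is not followed by a letter or digit,
        # so that a longer name sharing the same prefix is not counted.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    # A tiny hand-written fragment standing in for a real file.
    sample = (
        b"%PDF-1.6\n1 0 obj\n"
        b"<< /OpenAction 2 0 R /JS (app.alert(1)) /JavaScript 3 0 R >>\n"
        b"endobj\n"
    )
    print(count_keywords(sample))
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function. The resulting dictionary of counts is the kind of fixed-length numeric description a classifier can consume.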

Essentially, the tool scans through a PDF file and counts the number of occurrences of each of the ~20 features. A run of the tool appears as follows:

 PDFiD 0.2.5 PythonBrochure.pdf

 PDF Header: %PDF-1.6
 obj                 1096
 endobj              1095
 stream              1061
 endstream           1061
 xref                   0
 trailer                0
 startxref...
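
A report like the one above can be turned into numeric features by reading each "name count" row. The sketch below is one possible way to do that, assuming the report has been captured as plain text. `parse_report` is a hypothetical helper written for this example, not part of PDFiD, and it simply skips any row whose last field is not a plain integer (such as the header lines).

```python
def parse_report(report: str) -> dict:
    """Turn 'name  count' rows of a text report into a {name: count} dict."""
    features = {}
    for line in report.splitlines():
        # Split off the last whitespace-separated field as the count.
        parts = line.strip().rsplit(None, 1)
        if len(parts) == 2 and parts[1].isdigit():
            features[parts[0]] = int(parts[1])
    return features


if __name__ == "__main__":
    # The complete rows shown in the run above.
    report = """\
 PDFiD 0.2.5 PythonBrochure.pdf

 PDF Header: %PDF-1.6
 obj                 1096
 endobj              1095
 stream              1061
 endstream           1061
 xref                   0
 trailer                0
"""
    features = parse_report(report)
    print(features)
    # A fixed ordering of names gives a feature vector for a classifier.
    names = ["obj", "endobj", "stream", "endstream", "xref", "trailer"]
    print([features.get(name, 0) for name in names])
```

Using `features.get(name, 0)` keeps the vector the same length for every file, even when a row is missing from a particular report.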