Book Image

Mastering Machine Learning for Penetration Testing

By : Chiheb Chebbi
Book Image

Mastering Machine Learning for Penetration Testing

By: Chiheb Chebbi

Overview of this book

Cyber security is crucial for both businesses and individuals. As systems are getting smarter, we now see machine learning interrupting computer security. With the adoption of machine learning in upcoming security products, it’s important for pentesters and security researchers to understand how these systems work, and to breach them for testing purposes. This book begins with the basics of machine learning and the algorithms used to build robust systems. Once you’ve gained a fair understanding of how security products leverage machine learning, you'll dive into the core concepts of breaching such systems. Through practical use cases, you’ll see how to find loopholes and surpass a self-learning security system. As you make your way through the chapters, you’ll focus on topics such as network intrusion detection and AV and IDS evasion. We’ll also cover the best practices when identifying ambiguities, and extensive techniques to breach an intelligent system. By the end of this book, you will be well-versed with identifying loopholes in a self-learning security system and will be able to efficiently breach a machine learning system.
Table of Contents (13 chapters)

How to build a Twitter bot detector

In the previous sections, we saw how to build a machine learning-based botnet detector. In this new project, we are going to deal with a different problem instead of defending against botnet malware. We are going to detect Twitter bots because they are also dangerous and can perform malicious actions. For the model, we are going to use the NYU Tandon Spring 2017 Machine Learning Competition: Twitter Bot classification dataset. You can download it from this link: https://www.kaggle.com/c/twitter-bot-classification/data. Import the required Python packages:

>>> import pandas as pd
>>> import numpy as np
>>> import seaborn

Let's load the data using pandas and highlight the bot and non-bot data:

>>> data = pd.read_csv('training_data_2_csv_UTF.csv')
>>> Bots = data[data.bot==1]
>> NonBots...