Book Image

Hands-On Penetration Testing with Python

By : Furqan Khan
Book Image

Hands-On Penetration Testing with Python

By: Furqan Khan

Overview of this book

With the current technological and infrastructural shift, penetration testing is no longer a process-oriented activity. Modern-day penetration testing demands lots of automation and innovation; the only language that dominates all its peers is Python. Given the huge number of tools written in Python, and its popularity in the penetration testing space, this language has always been the first choice for penetration testers. Hands-On Penetration Testing with Python walks you through advanced Python programming constructs. Once you are familiar with the core concepts, you’ll explore the advanced uses of Python in the domain of penetration testing and optimization. You’ll then move on to understanding how Python, data science, and the cybersecurity ecosystem communicate with one another. In the concluding chapters, you’ll study exploit development, reverse engineering, and cybersecurity use cases that can be automated with Python. By the end of this book, you’ll have acquired adequate skills to leverage Python as a helpful tool to pentest and secure infrastructure, while also creating your own custom exploits.
Table of Contents (18 chapters)

Why Python?

When we think about exploring a new programming language or technology, we often wonder about the scope of the new technology and how it might benefit us. Let's start this chapter by thinking about why we might want to use Python and what advantages it might give us.

To answer this question, we are going to think about current technology trends and not get into more language-specific features, such as the fact that it is object-oriented, functional, portable, and interpreted. We have heard these terms before. Let's try to think about why we might use Python from a strictly industrial standpoint, what the present and future landscapes of this language might look like, and how the language can serve us. We'll start by mentioning a few career options that someone involved in computer science might opt for:

  • Programmer or software developer
  • Web developer
  • Database engineer
  • Cyber security professional (penetration tester, incident responder, SOC analyst, malware analyst, security researcher, and so on)
  • Data scientist
  • Network engineer

There are many other roles as well, but we'll just focus on the most generic options for the time being to see how Python fits into them. Let's start off with the role of programmer or software developer. As of 2018, Python was recorded as the second most popular language listed in job adverts (https://www.codingdojo.com/blog/7-most-in-demand-programming-languages-of-2018/). The role of programmer might vary from company to company, but as a Python programmer, you might be making a software product written in Python, developing a cyber security tool written in Python (there are tons of these already in existence that can be found on GitHub and elsewhere in the cyber security community), prototyping a robot that can mimic humans, engineering a smart home automation product or utility, and so on. The scope of Python covers every dimension of software development, from typical software applications to robust hardware products. The reason for this is the ease of the language to understand, the power of the language in terms of its excellent library support, which is backed by a huge community, and, of course, the beauty of it being open source.

Let's move on to the web. In recent years, Python has done remarkably well in terms of its maturity as a web development language. The most popular full stack web-based frameworks such as Django, Flask, and CherryPy have made web development with Python a seamless and clean experience, with lots of learning, customization, and flexibility on the way. My personal favorite is Django, as it provides a very clean MVC architecture, where business, logic, and presentation layers are completely isolated, making the development code much cleaner and easier to manage. With all batteries loaded and support for ORM and out-the-box support for background task processing with celery, Django does everything that any other web framework would be capable of doing, while keeping the native code in Python. Flask and CherryPy are also excellent choices for web development and come with lots of control over the data flow and customization.

Cyber security is a field that would be incomplete without Python. Every industry within the cyber security domain is related to Python in one way or another and the majority of cyber security tools are written in Python. From penetration testing to monitoring security operations centers, Python is widely used and needed. Python aids penetration testers by providing them with excellent tools and automation support with which they can write quick and powerful scripts for a variety of penetration testing activities, from reconnaissance to exploitation. We will learn about this in great detail throughout the course of this book.

Machine learning (ML) and artificial intelligence (AI) are buzz words in the tech industry that we come across frequently nowadays. Python has excellent support for all ML and AI models. Python, by default in most cases, is the first choice for anyone who wants to learn ML and AI. The other famous language in this domain is R, but because of Python's excellent coverage across all the other technology and software development stacks, it is easier to combine machine learning solutions written in Python with existing or new products than it is to combine solutions written in R. Python has got amazing machine learning libraries and APIs such as scikit-learn, NumPy, Pandas, matplotlib, NLTK, and TensorFlow. Pandas and NumPy have made scientific computations a very easy task, giving users the flexibility to process huge datasets in memory with an excellent layer of abstraction, which allows developers and programmers to forget about the background details and get the job done neatly and efficiently.

A few years ago, a typical database engineer would have been expected to know relational databases such as MySQL, SQL Server, Oracle, PostgreSQL, and so on. Over the past few years, however, the technology landscape has completely changed. While a typical database engineer is still supposed to know and be proficient with this database technology stack, this is no longer enough. With the increasing volume of data, as we enter the era of big data, traditional databases have to work in conjunction with big data solutions such as Hadoop or Spark. Having said that, the role of the database engineer has evolved to be one that includes the skill set of a data analyst. Now, data is not to be fetched and processed from local database servers—it is to be collected from heterogeneous sources, pre-processed, processed across a distributed cluster or parallel cores, and then stored back across the distributed cluster of nodes. What we are talking about here is big data analytics and distributed computing. We mentioned the word Hadoop previously. If you are not familiar with it, Hadoop is an engine that is capable of processing huge files by spawning chunks of files across a cluster of computers and then performing an aggregation on the processed result set, something which is popularly known as a map-reduce operation. Apache Spark is a new buzzword in the domain of analytics and it claims to be 100 times faster than the Hadoop ecosystem. Apache Spark has got a Python API for Python developers called pyspark, using which we can run Apache Spark with native Python code. It is extremely powerful and having familiarity with Python makes the setup easy and seamless.

The objective of mentioning the preceding points was to highlight the significance of Python in the current technological landscape and in the coming future. ML and AI are likely to be the dominating industries, both of which are primarily powered by Python. For this reason, there will not be a better time to start reading about and exploring Python and cyber security with machine learning than now. Let's start our journey into Python by looking at a few basics.

About Python – compiled or interpreted

Compilers work by converting human-readable code written in high-level programming languages into machine code, which is then run by the underlying architecture or machine. If you don't wish to run the code, the compiled version can be saved and executed later on. It should be noted that the compiler first checks for syntax errors and only creates the compiled version of the program if none are found. If you have used C, you might have come across .out files, which are examples of compiled files.

In the case of interpreters, however, each line of the program is taken and interpreted from the source code at runtime and then converted into machine code for execution. Python falls into the category of interpreted byte code. This means that the Python code is first translated to an intermediate byte code (a .pyc file). Then, this byte code is interpreted line by line by the interpreter and executed on the underlying architecture.

Installing Python

Over the course of this book, all of the exercises will be shown on a Linux OS. In my case, I am using Ubuntu 16.04. You can choose any variant you prefer. We will be using python3 for our exercises, which can be installed as follows:

sudo apt-get install python3
sudo apt-get install python3-pip

The second command installs pip, which is Python's package manager. All open source Python libraries that do not come as part of the standard installation can be installed with the help of pip. We will be exploring how to use pip in the upcoming sections.