Book Image

Effective Python Penetration Testing

By : Rejah Rehim
Book Image

Effective Python Penetration Testing

By: Rejah Rehim

Overview of this book

Penetration testing is a practice of testing a computer system, network, or web application to find weaknesses in security that an attacker can exploit. Effective Python Penetration Testing will help you utilize your Python scripting skills to safeguard your networks from cyberattacks. We will begin by providing you with an overview of Python scripting and penetration testing. You will learn to analyze network traffic by writing Scapy scripts and will see how to fingerprint web applications with Python libraries such as ProxMon and Spynner. Moving on, you will find out how to write basic attack scripts, and will develop debugging and reverse engineering skills with Python libraries. Toward the end of the book, you will discover how to utilize cryptography toolkits in Python and how to automate Python tools and libraries.
Table of Contents (16 chapters)
Effective Python Penetration Testing
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface

Parsing HTML with lxml


Another powerful, fast, and flexible parser is the HTML Parser that comes with lxml. As lxml is an extensive library written for parsing both XML and HTML documents, it can handle messed up tags in the process.

Let's start with an example.

Here, we will use the requests module to retrieve the web page and parse it with lxml:

#Importing modules 
from lxml import html 
import requests 
 
response = requests.get('http://packtpub.com/') 
tree = html.fromstring(response.content) 

Now the whole HTML is saved to tree in a nice tree structure that we can inspect in two different ways: XPath or CSS Select. XPath is used to navigate through elements and attributes to find information in structured documents such as HTML or XML.

We can use any of the page inspect tools, such as Firebug or Chrome developer tools, to get the XPath of an element:

If we want to get the book names and prices from the  list, find the following section in the source.

<div...