Book Image

Python Penetration Testing Essentials

By : Mohit Raj
Book Image

Python Penetration Testing Essentials

By: Mohit Raj

Overview of this book

Table of Contents (14 chapters)
Python Penetration Testing Essentials
About the Author
About the Reviewers

Information gathering of a website from SmartWhois by the parser BeautifulSoup

Consider a situation where you want to glean all the hyperlinks from the webpage. In this section, we will do this by programming. On the other hand, this can also be done manually by viewing the view source of the web page. However this will take some time.

So let's get acquainted with a very beautiful parser called BeautifulSoup. This parser is from a third-party source and is very easy to work with. In our code, we will use version 4 of BeautifulSoup.

The requirement is the title of the HTML page and hyperlinks.

The code is as follows:

import urllib
from bs4 import BeautifulSoup
url = raw_input("Enter the URL ")
ht= urllib.urlopen(url)
html_page =
b_object = BeautifulSoup(html_page)
print b_object.title
print b_object.title.text
for link in b_object.find_all('a'):

The from bs4 import BeautifulSoup statement is used to import the BeautifulSoup library. The url variable stores the...