Python Penetration Testing Essentials

By: Mohit

Information gathering of a website from SmartWhois by the parser BeautifulSoup


Consider a situation where you want to collect all the hyperlinks from a web page. You could do this manually by viewing the page source, but that takes time. In this section, we will do it programmatically.

So let's get acquainted with a very beautiful parser called BeautifulSoup. It is a third-party library and is very easy to work with. In our code, we will use version 4 of BeautifulSoup.

The goal is to extract the title of the HTML page and all of its hyperlinks.

The code is as follows:

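# Python 2 listing
import urllib
from bs4 import BeautifulSoup

url = raw_input("Enter the URL ")
# Fetch the page and read its raw HTML
ht = urllib.urlopen(url)
html_page = ht.read()
# Name the parser explicitly so bs4 does not have to guess one
b_object = BeautifulSoup(html_page, "html.parser")
# Print the title with its tags, then the title text only
print b_object.title
print b_object.title.text
# Print the href attribute of every <a> tag
for link in b_object.find_all('a'):
    print link.get('href')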

The from bs4 import BeautifulSoup statement is used to import the BeautifulSoup library. The url variable stores the...
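The listing above targets Python 2. As a rough sketch of how the same extraction might look in Python 3, the block below wraps the parsing step in a function and runs it on a small hand-written page, so it works without network access. The names extract_title_and_links, fetch, and sample_html are illustrative assumptions and do not come from the book; the sketch assumes the bs4 package is installed.

```python
# Python 3 sketch; the book's own listing is written for Python 2.
from urllib.request import urlopen

from bs4 import BeautifulSoup


def extract_title_and_links(html_page):
    """Return (title_text, hrefs) for an HTML document given as str or bytes."""
    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    b_object = BeautifulSoup(html_page, "html.parser")
    # b_object.title is None when the page has no <title> tag.
    title = b_object.title.text if b_object.title else None
    # link.get("href") returns None for anchors without an href attribute.
    links = [link.get("href") for link in b_object.find_all("a")]
    return title, links


def fetch(url):
    """Download the raw bytes of a page (hypothetical helper, needs network access)."""
    with urlopen(url) as response:
        return response.read()


# Offline demonstration on sample data (assumed, not taken from the book).
sample_html = """
<html><head><title>Sample page</title></head>
<body>
  <a href="http://example.com/a">A</a>
  <a href="/relative/b">B</a>
  <a name="no-href">C</a>
</body></html>
"""

title, links = extract_title_and_links(sample_html)
print(title)
for href in links:
    print(href)
```

To run it against a live site, one could pass fetch(url) to extract_title_and_links instead of sample_html. Note that the third anchor in the sample has no href attribute, so its entry in links is None; real pages often contain such anchors, and a script that processes the links further would need to skip them.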