
Python for Secret Agents - Volume II - Second Edition

By: Steven F. Lott

Overview of this book

Python is an easy-to-learn, extensible programming language that allows any manner of secret agent to work with a variety of data. Agents from beginners to seasoned veterans will benefit from Python's simplicity and sophistication. The standard library provides numerous packages that move beyond simple beginner missions. The Python ecosystem of related packages and libraries supports deep information processing. This book will guide you through the process of upgrading your Python-based toolset for intelligence gathering, analysis, and communication. You'll explore the ways Python is used to analyze web logs to discover the trails of activity that can be found in web and database servers. You'll also look at how to use Python to discover details of a social network by examining the data available from social networking websites. Finally, you'll see how to extract history from PDF files, which opens up new sources of data, and you'll learn about the ways you can gather data using an Arduino-based sensor device.

Trails of activity


We can leverage the referrer (famously misspelled referer) information to track access around a web site. As with other interesting fields, we need to decompose this into host name and path information. The most reliable way to do this is to use the urllib.parse module.

This means that we'll need to make a change to our log_event_2() function to add yet another parsing step. When we parse the referrer URL, we'll get at least six pieces of information:

  • scheme: This is usually http.

  • netloc: This is the server which made the referral. This will be the name of the server, not the IP address.

  • path: This is the path to the page which had the link.

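  • params: This can be anything after a ; symbol in the last segment of the path. Usually, this is empty.

  • query: This can be anything after the ? symbol in a URL. Usually, this is empty for simple static content sites.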

  • fragment: This can be anything after the # in a URL.
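The decomposition above can be sketched with urllib.parse.urlparse(). This is a minimal illustration, not the book's log_event_2() change: the referrer value is a hypothetical URL invented for the example, and it deliberately includes a ; parameter, a ? query, and a # fragment so that every field is populated.

```python
from urllib.parse import urlparse

# Hypothetical referrer value, made up for this illustration only.
referrer = "http://www.example.com/books/index.html;v=1?q=python#reviews"

# urlparse() returns a ParseResult, which is a named tuple of six fields.
parts = urlparse(referrer)

# Fields can be read by name...
print("scheme:  ", parts.scheme)
print("netloc:  ", parts.netloc)
print("path:    ", parts.path)
print("params:  ", parts.params)
print("query:   ", parts.query)
print("fragment:", parts.fragment)

# ...or by position, because the result is also an ordinary tuple.
scheme, netloc, path, params, query, fragment = parts
print(parts[1] == netloc)
```

Reading by name tends to be clearer in log-analysis code, since the position of netloc or path is easy to forget.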

These details are items within a namedtuple object, so we can refer to them by name or by position within the tuple. We have three ways to handle the parsing of URLs:

  • We can...