Learning Python Networking - Second Edition

By: José Manuel Ortega, Dr. M. O. Faruque Sarker, Sam Washington

Overview of this book

Network programming has always been a demanding task. With full-featured and well-documented libraries all the way up the stack, Python makes network programming the enjoyable experience it should be. Starting with a walk-through of today's major networking protocols, this book shows you how to employ Python for network programming, how to request and retrieve web resources, and how to extract data in major formats over the web. You will use Python for emailing with different protocols, and you'll interact with remote systems and with IP and DNS networking. You will cover the connection and configuration of networking devices using Python 3.7, along with cloud-based network management tasks. As the book progresses, socket programming will be covered, followed by how to design servers and the pros and cons of multithreaded and event-driven architectures. You'll develop practical client-side applications, including web API clients, email clients, SSH, and FTP. These applications will also be implemented through existing web application frameworks.

Interacting with Wireshark with pyshark

This section will help you refresh the basics of using Wireshark to capture packets, filter them, and inspect them. You can use Wireshark to analyze the network traffic of a suspicious program, analyze the traffic flow in your network, or solve network problems. We will also review the pyshark module for capturing packets in Python.

Introduction to Wireshark

Wireshark is a network packet analysis tool that captures packets in real time and displays them in a graphic interface. Wireshark includes filters, color coding, and other features that allow you to analyze network traffic and inspect packets individually.

Wireshark implements a wide range of filters that facilitate the definition of search criteria for the more than 1,000 protocols it currently supports. All of this happens through a simple and intuitive interface that allows each of the captured packets to be broken down into layers.

Because Wireshark understands the structure of these protocols, we can visualize the fields of each of the headers and layers that make up the packets, which gives the network administrator a wide range of possibilities when analyzing traffic.

One of the advantages of Wireshark is that we can leave it capturing data on a network for as long as we want and then store the capture so that we can analyze it later. It works on several platforms, such as Windows, macOS, Linux, and other Unix systems.

Wireshark is also considered a protocol analyzer or packet sniffer, thus allowing us to observe the messages that are exchanged between applications. For example, if we capture an HTTP message, the packet analyzer must know that this message is encapsulated in a TCP segment, which, in turn, is encapsulated in an IP packet, and which, in turn, is encapsulated in an Ethernet frame.
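The nesting described above can be sketched with Python's standard struct module. This is a minimal illustration, not how Wireshark itself dissects packets: build_frame and dissect are hypothetical helper names, the MAC and IP addresses are placeholder values, and checksums are left at zero. It wraps an HTTP request in TCP, IPv4, and Ethernet headers and then peels each layer off again:

```python
import struct

ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def build_frame(payload: bytes) -> bytes:
    """Hypothetical helper: wrap payload in TCP, IPv4, and Ethernet headers."""
    tcp = struct.pack(
        "!HHIIBBHHH",
        49152, 80,    # source and destination ports
        0, 0,         # sequence and acknowledgement numbers
        5 << 4,       # data offset: 5 words = 20 bytes, no options
        0x18,         # flags: PSH + ACK
        65535, 0, 0,  # window, checksum (left at 0), urgent pointer
    )
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,                   # version 4, header length 5 words
        0,                              # type of service
        20 + len(tcp) + len(payload),   # total length of the IP packet
        0, 0,                           # identification, flags/fragment offset
        64, IPPROTO_TCP, 0,             # TTL, protocol, checksum (left at 0)
        bytes([192, 0, 2, 10]),         # placeholder source address
        bytes([198, 51, 100, 20]),      # placeholder destination address
    )
    # Destination MAC, source MAC, EtherType (placeholder MAC addresses)
    eth = struct.pack("!6s6sH", b"\xbb" * 6, b"\xaa" * 6, ETHERTYPE_IPV4)
    return eth + ip + tcp + payload


def dissect(frame: bytes) -> dict:
    """Hypothetical helper: peel off the Ethernet, IPv4, and TCP headers in turn."""
    _dst_mac, _src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")
    ip = frame[14:]
    header_len = (ip[0] & 0x0F) * 4
    total_len = struct.unpack("!H", ip[2:4])[0]
    if ip[9] != IPPROTO_TCP:
        raise ValueError("not a TCP segment")
    tcp = ip[header_len:total_len]
    src_port, dst_port = struct.unpack("!HH", tcp[:4])
    data_offset = (tcp[12] >> 4) * 4
    return {
        "src_ip": ".".join(map(str, ip[12:16])),
        "dst_ip": ".".join(map(str, ip[16:20])),
        "src_port": src_port,
        "dst_port": dst_port,
        "payload": tcp[data_offset:],
    }


request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
layers = dissect(build_frame(request))
print(layers["src_ip"], "->", layers["dst_ip"], layers["payload"][:14])
```

A real analyzer handles many more cases (IP options, fragmentation, IPv6, VLAN tags), which is why Wireshark's built-in dissectors are preferable for anything beyond illustration.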

A protocol analyzer is a passive element, since it only observes the messages that are transmitted to and received from an element of the network, but never sends messages itself. Instead, a protocol analyzer receives a copy of the messages that are being sent or received by the host where it is running.

Wireshark is composed mainly of two elements: a packet capture library, which receives a copy of each data link frame that is either sent or received, and a packet analyzer, which shows the fields corresponding to each of the captured packets. To do this, the packet analyzer must know about the protocols that it is analyzing so that the information that's shown is consistent.

Wireshark installation

You can download the Wireshark tool from the official page: http://www.wireshark.org/download.html.

On Windows systems, we can install it by following the wizard in the Windows installer. On a Debian-based Linux distribution, such as Ubuntu, this is as easy as executing the apt-get command:

sudo apt-get install wireshark

One of the advantages of Wireshark is the filtering that we can apply to the captured data. We can filter by protocol, by source or destination IP address, by a range of IP addresses, by port, or by unicast traffic, among a long list of options. We can manually enter the filters in a box or select them from a default list.

Capturing packets with Wireshark

To start capturing packets, you can click on the name of an interface from the list of interfaces. For example, if you want to capture traffic on your Ethernet network, double-click on the Ethernet connection interface:

As soon as you click on the name of the interface, you will see packets start to appear in real time. Wireshark captures every packet that's sent to or from your system, so the packet list fills up quickly. There are many ways to filter this traffic:

  • To filter traffic to or from a specific IP address, type ip.addr == xxx.xx.xx.xx (without quotes) in the Apply a display filter field
  • To filter traffic for a specific protocol, such as TCP, UDP, SMTP, ARP, or DNS, type the protocol name in lowercase (for example, tcp or dns) into the Apply a display filter field

We can use the Apply a display filter box to filter traffic from any IP address or protocol:
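Display filter expressions can also be combined with and, or, and parentheses. As a small sketch, the hypothetical helper below assembles such an expression from an optional IP address and a list of protocol names; the address is a placeholder:

```python
def display_filter(ip=None, protocols=()):
    """Hypothetical helper: build a Wireshark display filter string."""
    parts = []
    if ip:
        parts.append(f"ip.addr == {ip}")
    names = [name.lower() for name in protocols]  # protocol names in lowercase
    if len(names) == 1:
        parts.append(names[0])
    elif names:
        parts.append("(" + " or ".join(names) + ")")
    return " and ".join(parts)


# Placeholder address from a documentation range
print(display_filter("192.0.2.10", ["TCP"]))
print(display_filter(protocols=["dns", "arp"]))
```

The resulting strings can be pasted into the Apply a display filter field.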

The graphical interface of Wireshark is mainly divided into the following sections:

  • The menu bar, where you have all the options that you can use before and after a capture
  • The main toolbar, where you have the most frequently used options in Wireshark
  • The filter bar, where you can apply filters to the current capture quickly
  • The packet list, which shows a summary of each packet that is captured by Wireshark
  • The packet details panel, which shows detailed information about the packet that is selected in the packet list
  • The packet byte panel, which shows the bytes of the selected packet, and highlights the bytes corresponding to the field that's selected in the packet details panel
  • The status bar, which shows some information about the current state of Wireshark and the capture

Network traffic in Wireshark

Network traffic, or network data, is the number of packets that are moving across a network at any given point in time. The following is a classical formula for obtaining the traffic volume of a network: Traffic volume = Traffic intensity (or rate) * Time
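As a worked example of the formula, the following sketch computes the volume for two made-up rates. The function name traffic_volume and the sample figures are illustrative assumptions, not measurements:

```python
def traffic_volume(rate_per_second: float, seconds: float) -> float:
    """Traffic volume = traffic intensity (rate) * time."""
    return rate_per_second * seconds


# Illustrative figures only
packets = traffic_volume(1500, 60)   # 1,500 packets/s sustained for one minute
megabits = traffic_volume(8, 10)     # 8 Mbit/s sustained for ten seconds
megabytes = megabits / 8             # 8 bits per byte

print(packets, "packets;", megabytes, "MB")
```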

In the following screenshot, we can see what the network traffic looks like in Wireshark:

In the previous screenshot, we can see the information that accompanies each data packet sent over the network. It includes several pieces of information, including the following:

  • Time: The time at which packets are captured
  • Source: The address from which the packet originated
  • Destination: The address to which the packet is being sent
  • Protocol: The protocol (set of rules) that the packet follows, such as TCP, UDP, SMTP, or ARP
  • Info: The information that the packet contains

The Wireshark website contains samples for capture files that you can import into Wireshark. You can also inspect the packets that they contain: https://wiki.wireshark.org/SampleCaptures.

For example, we can find an HTTP section for downloading files that contains examples of HTTP requests and responses:

Color coding in Wireshark

When you start capturing packets, Wireshark uses colors to identify the types of traffic. In the default coloring rules, light purple marks TCP traffic, light blue marks UDP traffic (which includes most DNS queries), and black marks packets that have errors.

To see exactly what the color codes mean, click View | Coloring rules. You can also customize and modify the coloring rules in this screen.

If you need to change the color of one of the options, just double-click it and choose the color you want:

Working with filters in Wireshark

When a capture contains a large amount of data, filters allow us to show only those packets that fit our search criteria. We can distinguish between capture filters and display filters, and each type has its own syntax.

Capture filters rely on the libpcap library, as do tools such as tcpdump and Snort, so they use the filter syntax that libpcap defines. For the same reason, we can use Wireshark to open capture files that are generated by tcpdump or by other applications built on libpcap.
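Because tools built on libpcap share a capture file format, a saved capture can also be read with a few lines of Python. The following is a minimal sketch, assuming the classic pcap layout (a 24-byte global header followed by a 16-byte header before each record) with microsecond timestamps; it does not handle the newer pcapng format. read_pcap is a hypothetical helper name, and the sample bytes are built in memory rather than taken from a real capture:

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4          # classic pcap, microsecond timestamps
PCAP_MAGIC_SWAPPED = 0xD4C3B2A1  # same magic written in the other byte order


def read_pcap(stream):
    """Hypothetical helper: return (linktype, [(timestamp, raw_bytes), ...])."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap header")
    magic = struct.unpack("<I", header[:4])[0]
    if magic == PCAP_MAGIC:
        endian = "<"
    elif magic == PCAP_MAGIC_SWAPPED:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    # Version major/minor, timezone offset, accuracy, snapshot length, link type
    _major, _minor, _zone, _sigfigs, _snaplen, linktype = struct.unpack(
        endian + "HHiIII", header[4:24])
    records = []
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            break
        ts_sec, ts_usec, captured_len, _original_len = struct.unpack(
            endian + "IIII", record_header)
        records.append((ts_sec + ts_usec / 1_000_000, stream.read(captured_len)))
    return linktype, records


# In-memory sample: one 4-byte record, link type 1 (Ethernet)
sample = (
    struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    + struct.pack("<IIII", 1, 500000, 4, 4)
    + b"\x01\x02\x03\x04"
)
linktype, records = read_pcap(io.BytesIO(sample))
print(linktype, records)
```

To try it on a file from disk, the stream would be opened in binary mode, for example with open('http.cap', 'rb').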

The most basic way to apply a filter is by typing its name into the filter box at the top of the window. For example, type dns and you will see only DNS packets.

The following is a screenshot of the dns filter:

You can also click on the Analyze menu and select Display Filters to see the filters that are created by default.

In the following screenshot, we can see the display filters that we can apply when capturing packets with Wireshark:

Filtering by protocol name

The filter box shows its full potential when we filter by protocol name. Some of the available protocol filters are tcp, http, pop, dns, arp, and ssl (named tls in recent Wireshark releases).

We can find HTTP requests by applying the http filter. In this way, we can see all of the GET and POST requests that have been made during the capture. Wireshark displays the HTTP message that was encapsulated in a TCP segment, which was encapsulated in an IP packet and encapsulated in an Ethernet frame:

In the preceding screenshot, we can see how a GET request has been sent to the URL that was requested from the browser. The web server hosting the page has then answered successfully (200 OK) with an HTTP message whose body contains the HTML code of the requested path. It is the browser (the application) that de-encapsulates this code and interprets it.

HTTP objects filter

As we can see, filters provide us with great traceability of communications and also serve as an ideal complement for analyzing a multitude of attacks. An example of this is the http.content_type filter, which lets us extract the different data flows that take place in an HTTP connection (text/html, application/zip, audio/mpeg, image/gif). This is very useful for locating malware, exploits, or other types of attacks that are embedded in such a protocol:

Wireshark distinguishes two types of filters, that is, capture filters and display filters:

  • Capture filters are set before capturing so that only packets that meet the requirements indicated in the filter are captured
  • Display filters establish a filter criterion on the packets that have already been captured and that we are visualizing in the main screen of Wireshark

Capture filters

Capture filters are set before a capture starts so that Wireshark records only the packets that meet the requirements indicated in the filter. If we do not establish any, Wireshark will capture all of the traffic and present it on the main screen. Even so, we can still set display filters to show only the desired traffic:

Display filters

Display filters establish a filter criterion on the packets that have been captured and that we are visualizing in the main screen of Wireshark. When you apply a display filter, only the packets that match it appear on the main screen. We can also use display filters on the content of a capture that has been loaded from a pcap file:

Analyzing networking traffic using the pyshark library

We can use the pyshark library to analyze the network traffic in Python, since everything Wireshark decodes in each packet is made available as a variable. We can find the source code of the tool in GitHub's repository: https://github.com/KimiNewt/pyshark.

In the PyPI repository, we can find the latest version of the library, that is, https://pypi.org/project/pyshark, and we can install it with the pip install pyshark command.

In the documentation of the module, we can see that the main package for opening and analyzing a pcap file is capture.file_capture:

Here's an example that was taken from pyshark's GitHub page. It shows how, from the Python 3 command-line interpreter, we can read packets stored in a pcap file. This gives us access to attributes such as the packet number and the complete information for each layer, such as its protocol, IP address, MAC address, and flags, where you can see whether the packet is a fragment of another:

>>> import pyshark
>>> cap = pyshark.FileCapture('http.cap')
>>> cap
>>> print(cap[0])

In the following screenshot, we can see the execution of the previous commands, and also see where we passed the pcap file path in the FileCapture method as a parameter:

We can apply a filter for DNS traffic only with the display_filter argument in the FileCapture method:

import pyshark
cap = pyshark.FileCapture('http.cap', display_filter="dns")
for pkt in cap:
    print(pkt.highest_layer)

In the following screenshot, we can see the execution of the previous commands:

FileCapture and LiveCapture in pyshark

As we saw previously, you can use the FileCapture method to open a previously saved trace file. You can also use pyshark to sniff from an interface in real time with the LiveCapture method, like so:

>>> import pyshark
>>> # Sniff from interface in real time
>>> capture = pyshark.LiveCapture(interface='eth0')
>>> capture.sniff(timeout=10)
>>> capture
<LiveCapture (5 packets)>

Once a capture object has been created, either with LiveCapture or with FileCapture, several methods and attributes are available at both the capture and packet level. The power of pyshark is that it has access to all of the packet decoders that are built into TShark.

Now, let's see what methods the returned capture object provides.

To check this, we can use the dir function with the capture object:

The display_filter, encryption, and input_filename attributes are used for displaying parameters that are passed into FileCapture or LiveCapture.
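In pyshark, the two filter types described earlier map onto two keyword arguments of LiveCapture: bpf_filter takes a capture filter in libpcap (BPF) syntax, and display_filter takes a Wireshark display filter. The sketch below keeps the argument handling separate from the capture itself so that it can be inspected without a network interface; live_capture_options and open_live_capture are hypothetical helper names, and eth0 is a placeholder interface:

```python
def live_capture_options(interface, capture_filter=None, display_filter=None):
    """Hypothetical helper: build keyword arguments for pyshark.LiveCapture."""
    options = {"interface": interface}
    if capture_filter:
        # Capture filter: decides which packets are captured at all (BPF syntax)
        options["bpf_filter"] = capture_filter
    if display_filter:
        # Display filter: Wireshark display filter syntax
        options["display_filter"] = display_filter
    return options


def open_live_capture(interface, capture_filter=None, display_filter=None):
    """Hypothetical helper: needs pyshark and TShark installed to run."""
    import pyshark  # imported here so the options helper works without it
    return pyshark.LiveCapture(
        **live_capture_options(interface, capture_filter, display_filter))


options = live_capture_options("eth0", capture_filter="tcp port 80",
                               display_filter="http")
print(options)
```

Calling open_live_capture("eth0", capture_filter="tcp port 80") and then sniff(timeout=10) on the result would capture only TCP traffic on port 80, typically with administrator privileges.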

Both classes accept similar parameters that affect the packets returned in the capture object. For example, we can iterate through the packets and apply a function to each one. The most useful method here is apply_on_packets(), which is the main way to iterate through the packets, passing in a function to apply to each packet:

>>> cap = pyshark.FileCapture('http.cap', keep_packets=False)
>>> def print_info_layer(packet):
...     print("[Protocol:] "+packet.highest_layer+" [Source IP:] "+packet.ip.src+" [Destination IP:] "+packet.ip.dst)
...
>>> cap.apply_on_packets(print_info_layer)

In the following screenshot, we can see the information that's returned when we are obtaining information for each packet pertaining to Protocol, Source IP, and Destination IP:
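One caveat with a callback like print_info_layer is that packet.ip only exists when the packet has an IPv4 layer; for other packets, such as ARP, the attribute lookup raises AttributeError. A more defensive variant, sketched here with a hypothetical describe function and stand-in packet objects built from SimpleNamespace so that it runs without a capture file, could look like this:

```python
from types import SimpleNamespace


def describe(packet):
    """Hypothetical callback: summarize a packet, tolerating a missing IP layer."""
    ip = getattr(packet, "ip", None)  # None when the packet has no IPv4 layer
    if ip is None:
        return f"[Protocol:] {packet.highest_layer} [no IPv4 layer]"
    return (f"[Protocol:] {packet.highest_layer} "
            f"[Source IP:] {ip.src} [Destination IP:] {ip.dst}")


# Stand-in objects that mimic the attributes used above (not real pyshark packets)
http_packet = SimpleNamespace(
    highest_layer="HTTP",
    ip=SimpleNamespace(src="192.0.2.10", dst="198.51.100.20"),
)
arp_packet = SimpleNamespace(highest_layer="ARP")

for pkt in (http_packet, arp_packet):
    print(describe(pkt))
```

With a real capture, the same idea could be used as cap.apply_on_packets(lambda packet: print(describe(packet))).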

We can also use the apply_on_packets() method to add the packets to a list for counting or other processing. Here's a script that appends all of the packets to a list and prints the count. For this, create a file called count_packets.py:

import pyshark

packets_array = []

def counter(*args):
    # apply_on_packets() passes each packet as the first positional argument
    packets_array.append(args[0])

def count_packets():
    cap = pyshark.FileCapture('http.cap', keep_packets=False)
    cap.apply_on_packets(counter, timeout=10000)
    return len(packets_array)

print("Packets number:"+str(count_packets()))

for packet in packets_array:
    print(packet)
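Appending every packet to a list keeps them all in memory, which works against keep_packets=False. If only totals are needed, the callback can update a collections.Counter instead. This is a sketch with hypothetical names (make_protocol_counter, tally) and stand-in packets rather than a real capture:

```python
from collections import Counter
from types import SimpleNamespace


def make_protocol_counter():
    """Hypothetical helper: return a Counter and a callback that updates it."""
    counts = Counter()

    def tally(packet):
        # Only the protocol name is kept; the packet itself is not stored
        counts[packet.highest_layer] += 1

    return counts, tally


counts, tally = make_protocol_counter()

# Stand-in packets; a real capture would call cap.apply_on_packets(tally)
for name in ("HTTP", "TCP", "HTTP", "DNS"):
    tally(SimpleNamespace(highest_layer=name))

print("Packets number:", sum(counts.values()))
print(counts.most_common(1))
```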

We can use only_summaries, which will return packets in the capture object with just the summary information of each packet:

>>> cap = pyshark.FileCapture('http.cap', only_summaries=True)
>>> print(cap[0])

This option makes capture file reading much faster, and with the dir function, we can check the attributes that are available in the object to obtain information about a specific packet.

In the following screenshot, we can see information about a specific packet and get all of the attributes that return non-null information:

The information you can see in the form of attributes is as follows:

  • destination: The IP destination address
  • source: The IP source address
  • info: A summary of the application layer
  • length: Length of the packet in bytes
  • no: Index number of the packet
  • protocol: The highest layer protocol that's recognized in the packet
  • summary_line: All of the summary attributes in one string
  • time: Time between the current packet and the first packet