
Learning Python Networking - Second Edition

By: José Manuel Ortega, Dr. M. O. Faruque Sarker, Sam Washington

Overview of this book

Network programming has always been a demanding task. With full-featured and well-documented libraries all the way up the stack, Python makes network programming the enjoyable experience it should be. Starting with a walkthrough of today's major networking protocols, this book shows you how to employ Python for network programming, how to request and retrieve web resources, and how to extract data in major formats over the web. You will use Python for emailing with different protocols, and you'll interact with remote systems and with IP and DNS networking. You will cover connecting and configuring networking devices using Python 3.7, along with cloud-based network management tasks using Python. As the book progresses, it covers socket programming, followed by how to design servers and the pros and cons of multithreaded and event-driven architectures. You'll develop practical client-side applications, including web API clients, email clients, SSH, and FTP. These applications will also be implemented through existing web application frameworks.
Table of Contents (19 chapters)
Section 1: Introduction to Network and HTTP Programming
Section 2: Interacting with APIs, Web Scraping, and Server Scripting
Section 3: IP Address Manipulation and Network Automation
Section 4: Sockets and Server Programming

Python network programming through libraries

In this section, we're going to look at a general approach to network programming in Python. We'll introduce the main standard library modules and look at some examples to see how they relate to the TCP/IP stack.

An introduction to the PyPI Python repository

The Python Package Index, or PyPI, which can be found at https://pypi.python.org, is the official repository of third-party software for the Python programming language. Its aim is to be a comprehensive catalog of open source Python packages.

To download packages from the PyPI repository, you can use several tools, but in this section, we will explain how to use the pip command. pip is the official package installer, and it is bundled with the official Python installers from Python 3.4 onward.

You can find the most widely used third-party Python networking libraries in PyPI, such as requests (https://pypi.org/project/requests) and urllib3 (https://pypi.org/project/urllib3). Note that urllib3 is a separate package from urllib, which is part of the standard library.

Installing a package using pip is very simple: just execute pip install <package_name>; for example, pip install requests. If pip is not available, we can also install it using the package manager of a Linux distribution. For example, in a Debian or Ubuntu distribution, we can use the apt-get command:

$ sudo apt-get install python3-pip
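pip can also be driven from inside a script. The following is a minimal sketch with hypothetical helper names, is_installed() and pip_install_command(): it uses the standard library's importlib.util.find_spec() to check whether a module can be imported, and builds the python -m pip install command for the interpreter that is running the script (sys.executable).

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if module_name can be imported by this interpreter."""
    # find_spec() locates a top-level module without importing it and returns
    # None if it cannot be found. The import name may differ from the PyPI
    # package name (for example, beautifulsoup4 is imported as bs4).
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package_name):
    """Build the argument list that runs pip for the current interpreter."""
    # Running pip as "python -m pip" with sys.executable installs the package
    # for this interpreter rather than for whichever pip is first on the PATH.
    return [sys.executable, '-m', 'pip', 'install', package_name]
```

For example, if is_installed('requests') returns False, passing pip_install_command('requests') to subprocess.check_call() would run the installation, raising CalledProcessError if pip fails.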

Alternatives to pip for installing packages

We can use alternatives such as conda and Pipenv to install Python packages. Related tools, such as virtualenv, complement them by creating isolated environments to install packages into.

Conda

Conda is another way to install Python packages, though it is developed and maintained by a different company, Anaconda, Inc. An advantage of the Anaconda distribution, which includes conda, is that it comes with over 100 popular Python packages, so you can start working with Python straight away. You can download it from the following link: https://www.anaconda.com/download/.

Installing packages with conda is as easy as with pip: just run conda install <package_name>; for example, conda install requests.

The conda repository is independent of PyPI and does not contain every package that is available there, but you will find the main Python networking libraries, such as requests (https://anaconda.org/anaconda/requests). Modules such as urllib and socket are part of the standard library, so they are available in any Python installation.

Virtualenv

virtualenv is a Python tool for creating isolated virtual environments. To install it, you just have to run pip install virtualenv. You can then create virtual environments, for example, with virtualenv ENV. Here, ENV is the directory where the virtual environment, which includes a separate Python installation, will be created. For more information, including how to activate the environments, see the complete guide at https://virtualenv.pypa.io.
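Since Python 3.3, the standard library has also included the venv module, which provides a subset of virtualenv's features. As a sketch that swaps in venv rather than virtualenv itself, the following creates an environment from Python code; the helper name create_env() is hypothetical. The command-line equivalent is python3 -m venv ENV.

```python
import os
import venv


def create_env(env_dir):
    """Create a virtual environment in env_dir and return its config file path."""
    # with_pip=False skips bootstrapping pip into the environment, which keeps
    # creation fast; pass with_pip=True if you want pip available inside it
    builder = venv.EnvBuilder(with_pip=False)
    builder.create(env_dir)
    # Every environment records its base interpreter in a pyvenv.cfg file
    return os.path.join(env_dir, 'pyvenv.cfg')
```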

Pipenv

Pipenv is a relatively new tool that modernizes the way Python manages dependencies. It includes a complete dependency resolver, in the same way conda does, and handles virtual environments, lock files, and more. Pipenv is maintained by the Python Packaging Authority (PyPA) and is distributed through PyPI, so you just have to run pip install pipenv to install it. You can find an excellent guide to Pipenv here: https://realpython.com/pipenv-guide.

An introduction to libraries for network programming with Python

Python provides modules for interfacing with protocols at different levels of the network stack, and the modules that support higher-layer protocols are built on the interfaces supplied by the lower-level ones. For example, urllib.request uses http.client to speak HTTP, and http.client in turn uses the socket module for its TCP connections.

Introduction to sockets

The socket module is Python's standard interface for the transport layer, and it provides functions for interacting with TCP and UDP, as well as for looking up hostnames through DNS. In this section, we will introduce you to this module. We'll learn much more about this in Chapter 10, Programming with Sockets.
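As a small illustration of the DNS lookups mentioned above, the following sketch, with a hypothetical helper name resolve(), uses socket.getaddrinfo() to turn a hostname into its IPv4 addresses.

```python
import socket


def resolve(hostname, port=None, family=socket.AF_INET):
    """Return the sorted, de-duplicated addresses that hostname resolves to."""
    # getaddrinfo() returns (family, type, proto, canonname, sockaddr) tuples;
    # for AF_INET, sockaddr is an (address, port) pair. It raises
    # socket.gaierror if the name cannot be resolved.
    results = socket.getaddrinfo(hostname, port, family, socket.SOCK_STREAM)
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in results})
```

For example, resolve('www.python.org') performs a real DNS query. When a single IPv4 address is enough, socket.gethostbyname() is a simpler alternative.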

A socket is defined by the IP address of the machine, the port on which it listens, and the protocol it uses. In Python, the types and functions needed to work with sockets are in the socket module.

Sockets are classified as stream sockets (socket.SOCK_STREAM) or datagram sockets (socket.SOCK_DGRAM), depending on whether the service uses TCP, which is connection-oriented and reliable, or UDP, which is connectionless and does not guarantee delivery.
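To make the difference concrete, the following minimal sketch, with a hypothetical function name udp_roundtrip(), sends a single UDP datagram between two sockets on the loopback interface. Binding to port 0, which asks the operating system to choose a free port, is an assumption made so the example does not clash with other services.

```python
import socket


def udp_roundtrip(message):
    """Send one datagram over the loopback interface and return what arrives."""
    # SOCK_DGRAM selects UDP: there is no connection and no delivery guarantee
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(('127.0.0.1', 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # don't block forever if it is lost
        # each sendto() call sends exactly one datagram to the given address
        sender.sendto(message, receiver.getsockname())
        # recvfrom() returns one datagram together with the sender's address
        data, _address = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()
```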

Sockets can also be classified according to their family. We have Unix domain sockets (socket.AF_UNIX), which predate network connections and are addressed through files in the filesystem; socket.AF_INET sockets, which are used for IPv4 network connections; and socket.AF_INET6 sockets, which are used for IPv6 connections.
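The AF_UNIX family can be tried without any network configuration through socket.socketpair(), which returns two already connected sockets. According to the Python documentation, its default family is AF_UNIX where the platform supports it and AF_INET otherwise. A minimal sketch, with a hypothetical function name:

```python
import socket


def local_pair_roundtrip(message):
    """Exchange bytes over a connected pair of local sockets."""
    # On Unix, socketpair() creates two connected AF_UNIX stream sockets;
    # on platforms without AF_UNIX, Python falls back to AF_INET on loopback
    left, right = socket.socketpair()
    with left, right:
        left.sendall(message)
        # a small message sent in one call is read back here with one recv()
        return right.recv(1024), left.family
```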

Socket module in Python

To create a socket, the socket.socket() constructor is used, which can take the family, type, and protocol as optional parameters. By default, the AF_INET family and the SOCK_STREAM type are used.

The general syntax is socket.socket(socket_family, socket_type, protocol=0), where the parameters are as follows:

  • socket_family: This is usually AF_INET (IPv4), AF_INET6 (IPv6), or AF_UNIX
  • socket_type: This is either SOCK_STREAM or SOCK_DGRAM
  • protocol: This is usually left out, defaulting to 0
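A quick way to confirm the defaults described above is to inspect the family and type attributes of new socket objects. A minimal sketch, where describe() is a hypothetical helper:

```python
import socket


def describe(sock):
    """Return the (family, type) pair of a socket object."""
    return sock.family, sock.type


# No arguments: the defaults are AF_INET (IPv4) and SOCK_STREAM (TCP)
with socket.socket() as tcp_sock:
    default_info = describe(tcp_sock)

# Explicit arguments: an IPv4 datagram (UDP) socket
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp_sock:
    udp_info = describe(udp_sock)

print(default_info)
print(udp_info)
```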

Client socket methods

On the client side, we connect to a remote socket with the connect() method, which takes the address as a single (host, port) tuple:

```python
import socket

# a socket object is created for communication
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# now connect to the web server on port 80
client_socket.connect(("www.packtpub.com", 80))
# ... exchange data with send()/recv() here ...
client_socket.close()
```
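connect() raises an exception, such as ConnectionRefusedError, when nothing is listening at the remote address. A minimal sketch of a hypothetical helper, is_port_open(), that instead uses the related connect_ex() method, which returns an error code rather than raising, to check whether a TCP port accepts connections:

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex() returns 0 on success and an error number on failure;
        # problems such as an unresolvable hostname can still raise exceptions
        return sock.connect_ex((host, port)) == 0
```

For example, is_port_open('www.packtpub.com', 80) would attempt a real connection to the same web server as the client above.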

Server socket methods

The following are some server socket methods, which are also shown in the following code:

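  • bind(address): This method binds the socket to an address, that is, the host and port on which our server will listen for connections
  • listen(backlog): This method makes the socket start listening for incoming connections; backlog is the number of connections that the system will queue, before they are accepted, before refusing new ones
  • accept(): This method accepts the next incoming connection and returns a (conn, address) pair, where conn is a new socket for communicating with that client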
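
```python
import socket

serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# bind the socket to localhost on port 80; ports below 1024 usually need
# administrator privileges, so use a port such as 8080 when testing
serversocket.bind(('localhost', 80))
# become a server socket, queuing up to 10 connections not yet accepted
serversocket.listen(10)
# block until a client connects; accept() returns a new socket and its address
client_socket, address = serversocket.accept()
print('Connection from', address)
client_socket.close()
serversocket.close()
```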
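To see bind(), listen(), and accept() working together with a client, the following self-contained sketch, with hypothetical function names serve_once() and echo_roundtrip(), runs a one-shot echo server in a background thread. It assumes 127.0.0.1 with an OS-assigned port, which avoids needing privileges for port 80.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept a single client, echo back everything it sends, then close."""
    conn, _address = server_sock.accept()   # blocks until a client connects
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:          # empty bytes: the client closed its side
                break
            conn.sendall(data)


def echo_roundtrip(message):
    """Start a local echo server in a thread and talk to it as a client."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(('127.0.0.1', 0))   # port 0: the OS picks a free port
    server_sock.listen(1)
    server_sock.settimeout(5.0)          # accept() gives up if no client comes
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()
    try:
        with socket.create_connection(('127.0.0.1', port)) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)   # signal that we are done sending
            reply = bytearray()
            while True:
                chunk = client.recv(1024)
                if not chunk:                 # server closed the connection
                    break
                reply += chunk
        return bytes(reply)
    finally:
        worker.join()
        server_sock.close()
```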

Working with RFCs

Requests for Comments, better known by the acronym RFC, are a series of publications from the internet engineering community that describe various aspects of the operation of the internet and other computer networks, such as protocols and procedures.

Each RFC is a memorandum that engineers or experts in the field have submitted to the Internet Engineering Task Force (IETF), one of the most important technical collaboration bodies on the internet, so that it can be reviewed by the rest of the community.

RFCs cover a wide range of standards, and TCP/IP is just one of these. They are freely available on the IETF's website, which can be found at www.ietf.org/rfc.html. Each RFC has a number; IPv4 is documented by RFC 791, and other relevant RFCs will be mentioned as we progress throughout this book.

The most important internet protocols are defined in RFCs, such as IP in RFC 791, FTP in RFC 959, and HTTP/1.1 in RFC 2616 (since obsoleted by RFCs 7230 to 7235).

You can search for RFCs by number or keyword with the RFC Editor search service, which can be found here: https://www.rfc-editor.org/search/rfc_search.php.

In the following screenshot, we can see the result of searching for RFC number 2616 for the HTTP protocol:

Extracting RFC information

The landing page for RFCs is http://www.rfc-editor.org/rfc/, maintained by the RFC Editor, and reading through it tells us what we need to know: we can access a text version of an RFC using a URL of the form http://www.rfc-editor.org/rfc/rfc741.txt, where the RFC number in this case is 741. Therefore, we can get RFCs in text format over HTTP.
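The URL pattern is simple enough to capture in a small function that validates the number before building the URL. A minimal sketch; the name rfc_text_url() is hypothetical:

```python
def rfc_text_url(rfc_number):
    """Build the plain-text URL of an RFC from its number."""
    number = int(rfc_number)   # raises ValueError for input such as 'abc'
    if number <= 0:
        raise ValueError('RFC numbers are positive integers')
    return 'http://www.rfc-editor.org/rfc/rfc{}.txt'.format(number)
```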

At this point, we can build a Python script that downloads an RFC document and displays the information returned by the service. The script accepts an RFC number, downloads the RFC in text format, and then prints it to stdout.

The main Python modules for making HTTP requests are urllib, from the standard library, and the third-party requests package, both of which work at a high level. We can also use the socket module if we want to work at a lower level.

Downloading an RFC with urllib

Now, we are going to write our Python script using the urllib module. For this, create a text file called RFC_download_urllib.py:

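```python
#!/usr/bin/env python3

import sys, urllib.request

# read the RFC number from the command line
try:
    rfc_number = int(sys.argv[1])
except (IndexError, ValueError):
    print('Must supply an RFC number as first argument')
    sys.exit(2)

template = 'http://www.rfc-editor.org/rfc/rfc{}.txt'
url = template.format(rfc_number)
# urlopen() sends the HTTP request; read() returns the body as bytes
rfc_raw = urllib.request.urlopen(url).read()
# decode the bytes (UTF-8 by default) into a str
rfc = rfc_raw.decode()
print(rfc)
```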

We can run the preceding code by using the following command:

$ python RFC_download_urllib.py 2324

This is the output of the previous script, where we can see the RFC description document:

First, we import our modules and check whether an RFC number has been supplied on the command line. Then, we construct our URL by substituting the supplied RFC number. Next comes the main activity: the urlopen() call constructs an HTTP request for our URL, connects to the RFC Editor's web server, and downloads the RFC text. Finally, we decode the downloaded bytes into a Unicode string and print it to the screen.
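The script above has no timeout, and an HTTP error, such as a 404 for an RFC number that doesn't exist, ends it with a traceback. A hedged sketch of a more defensive download that still uses only urllib; the function name fetch_text() and the 10-second default timeout are assumptions:

```python
import urllib.error
import urllib.request


def fetch_text(url, timeout=10):
    """Download url and return its body as text, or None on an HTTP error."""
    try:
        # the response works as a context manager, so the connection is
        # closed when the with block exits
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
            # use the charset from the Content-Type header if the server sent one
            charset = response.headers.get_content_charset() or 'utf-8'
            return raw.decode(charset)
    except urllib.error.HTTPError as err:
        # raised for 4xx/5xx responses, for example a nonexistent RFC number
        print('HTTP error {}: {}'.format(err.code, err.reason))
        return None
```

Connection problems, such as an unreachable server, still raise urllib.error.URLError in this sketch, so a caller may want to catch that as well.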

Downloading an RFC with requests

Now, we are going to create the same script but, instead of using urllib, we are going to use the requests module. For this, create a text file called RFC_download_requests.py:

```python
#!/usr/bin/env python3

import sys, requests

try:
    rfc_number = int(sys.argv[1])
except (IndexError, ValueError):
    print('Must supply an RFC number as first argument')
    sys.exit(2)

template = 'http://www.rfc-editor.org/rfc/rfc{}.txt'
url = template.format(rfc_number)
# requests.get() returns a Response object; .text is the body decoded to str
rfc = requests.get(url).text
print(rfc)
```

The requests module lets us simplify the previous script. The main difference is that we use the get() function to make the request and then read the text property of the response, which holds the contents of the RFC already decoded to a string.

Downloading an RFC with the socket module

Now, we are going to create the same script but, instead of using urllib or requests, we are going to use the socket module for working at a low level. For this, create a text file called RFC_download_socket.py:

```python
#!/usr/bin/env python3

import sys, socket

try:
    rfc_number = int(sys.argv[1])
except (IndexError, ValueError):
    print('Must supply an RFC number as first argument')
    sys.exit(2)

host = 'www.rfc-editor.org'
port = 80
# open a TCP connection to the web server
sock = socket.create_connection((host, port))

# a raw HTTP/1.1 GET request: each line ends with CRLF,
# and an empty line marks the end of the headers
req = ('GET /rfc/rfc{rfcnum}.txt HTTP/1.1\r\n'
       'Host: {host}:{port}\r\n'
       'User-Agent: Python {version}\r\n'
       'Connection: close\r\n'
       '\r\n'
      )

req = req.format(rfcnum=rfc_number, host=host, port=port, version=sys.version_info[0])
sock.sendall(req.encode('ascii'))
rfc_bytes = bytearray()

# read until the server closes the connection (recv() then returns b'')
while True:
    buf = sock.recv(4096)
    if not buf:
        break
    rfc_bytes += buf
sock.close()
rfc = rfc_bytes.decode('utf-8')
print(rfc)
```

The main difference here is that we are using the socket module instead of urllib or requests. socket is Python's interface to the operating system's TCP and UDP implementation, so we have to tell it which transport layer protocol we want to use. We do this with the socket.create_connection() convenience function, which always creates a TCP connection. To establish the connection, we use port 80, the standard port for web services over HTTP.

Next, we deal with the network communication over the TCP connection. We send the entire request string to the server by using the sendall() call. The data that's sent through TCP must be in raw bytes, so we have to encode the request text as ASCII before sending it.

Then, we piece together the server's response as it arrives in the while loop. Bytes that are sent to us through a TCP socket are presented to our application as a continuous stream, so, like any stream of unknown length, we have to read it iteratively. The recv() call returns an empty bytes object (b'') once the server has sent all of its data and closed the connection, which it does because our request included the Connection: close header. We use this as the condition for breaking out of the loop, and finally we decode and print the response.
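Because rfc_bytes holds the complete HTTP response, the printed output begins with the status line and the response headers before the RFC text itself. A minimal sketch of a hypothetical helper, split_http_response(), that separates them:

```python
def split_http_response(raw):
    """Split a raw HTTP/1.x response into (status_code, headers, body)."""
    # the header block ends at the first empty line, i.e. the first CRLF CRLF
    head, _, body = bytes(raw).partition(b'\r\n\r\n')
    # header bytes are decoded as ISO-8859-1, which accepts any byte value
    lines = head.decode('iso-8859-1').split('\r\n')
    # status line, for example 'HTTP/1.1 200 OK'; the reason phrase may be absent
    status_code = int(lines[0].split(' ', 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        # header names are case-insensitive, so store them in lower case;
        # in this simplified version a repeated header keeps only its last value
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body
```

In the socket script, status, headers, body = split_http_response(rfc_bytes) would leave just the RFC text in body. This sketch does not handle Transfer-Encoding: chunked, which an HTTP/1.1 server is allowed to use; decoding chunked bodies, following redirects, and similar details are exactly what urllib and requests take care of for us.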