Summary
In this chapter, we discussed a large number of elements of data analysis. We've looked at how we have to disentangle physical format from logical layout and conceptual content. We covered the gzip module as an example of how we can handle one particularly complex physical format issue.
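As a reminder of the physical format step, here is a minimal sketch of reading compressed text with gzip. The file name and sample lines are hypothetical, created in a temporary directory only so that the example is self-contained.

```python
import gzip
import os
import tempfile

# Hypothetical sample data standing in for a compressed log file.
sample = "first line\nsecond line\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.log.gz")

    # Write the text through gzip so the file on disk is compressed.
    with gzip.open(path, "wt", encoding="utf-8") as target:
        target.write(sample)

    # Opening in text mode ("rt") hides the compression: we iterate over
    # lines exactly as we would with an ordinary text file.
    with gzip.open(path, "rt", encoding="utf-8") as source:
        lines = [line.rstrip("\n") for line in source]

print(lines)
```

Once the file is open, the rest of the program does not need to know that the physical format was compressed.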
We focused a lot of attention on using the re module to write regular expressions that help us parse complex text files. This addresses a number of logical layout considerations. Once we've parsed the text, we can then do data conversions to create proper Python objects so that we have useful conceptual content.
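The parse-then-convert idea can be sketched as follows. The line layout here (a host, a timestamp in brackets, and a size) is an assumption made up for illustration, not the format of any particular log; the names line_pat and parse_line are likewise hypothetical.

```python
import re
from datetime import datetime

# Hypothetical logical layout: host, bracketed ISO timestamp, integer size.
line_pat = re.compile(
    r"(?P<host>\S+)\s+\[(?P<time>[^\]]+)\]\s+(?P<size>\d+)"
)

def parse_line(line):
    """Return a dict of converted Python objects, or None if no match."""
    match = line_pat.match(line)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        # Convert the matched strings into proper Python objects.
        "time": datetime.strptime(match.group("time"), "%Y-%m-%dT%H:%M:%S"),
        "size": int(match.group("size")),
    }

record = parse_line("192.0.2.10 [2015-03-12T08:30:00] 5120")
print(record)
```

The regular expression handles the logical layout; the calls to datetime.strptime() and int() produce the conceptual content. A line that doesn't fit the pattern yields None, so the caller can decide whether to skip it or report it.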
We also saw how we can use a collections.Counter object to summarize data. This helps us find the most common items, or create complete histograms and frequency tables.
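A short sketch of this kind of summary is shown here. The list of host addresses is invented sample data; in practice it might come from the parsed records.

```python
from collections import Counter

# Hypothetical sample data: one host address per parsed record.
hosts = [
    "192.0.2.10", "192.0.2.22", "192.0.2.10",
    "198.51.100.7", "192.0.2.10", "192.0.2.22",
]

counts = Counter(hosts)

# The two most common items, as (item, count) pairs.
top_two = counts.most_common(2)
print(top_two)

# A complete frequency table with a simple text histogram,
# ordered from most to least frequent.
for host, count in counts.most_common():
    print(f"{host:<15} {count:>3} {'*' * count}")
```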
The subprocess module helped us run the whois program to gather data from around the internet. The general approach to using subprocess allows us to leverage a number of common utilities for getting information about the internet...
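The general approach might be sketched like this. The helper name run_command is hypothetical, and the demonstration runs the Python interpreter itself rather than whois, so that the example works even where whois isn't installed.

```python
import subprocess
import sys

def run_command(args, timeout=30):
    """Run an external program and return its standard output as text.

    Raises subprocess.CalledProcessError if the program exits with a
    non-zero status, and subprocess.TimeoutExpired if it runs too long.
    """
    result = subprocess.run(
        args,
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        timeout=timeout,
        check=True,
    )
    return result.stdout

# Demonstrated with the Python interpreter so the sketch runs anywhere.
# Assuming the whois program is installed and on the PATH, the argument
# list could instead be something like ["whois", "192.0.2.10"].
output = run_command([sys.executable, "-c", "print('hello')"])
print(output)
```

Passing the command as a list of arguments, rather than a single shell string, avoids quoting problems when the arguments come from parsed data.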