Book Image

Modern Python Standard Library Cookbook

By : Alessandro Molina
Book Image

Modern Python Standard Library Cookbook

By: Alessandro Molina

Overview of this book

The Python 3 Standard Library is a vast array of modules that you can use for developing various kinds of applications. It contains an exhaustive list of libraries, and this book will help you choose the best one to address specific programming problems in Python. The Modern Python Standard Library Cookbook begins with recipes on containers and data structures and guides you in performing effective text management in Python. You will find Python recipes for command-line operations, networking, filesystems and directories, and concurrent execution. You will learn about Python security essentials in Python and get to grips with various development tools for debugging, benchmarking, inspection, error reporting, and tracing. The book includes recipes to help you create graphical user interfaces for your application. You will learn to work with multimedia components and perform mathematical operations on date and time. The recipes will also show you how to deploy different searching and sorting algorithms on your data. By the end of the book, you will have acquired the skills needed to write clean code in Python and develop applications that meet your needs.
Table of Contents (21 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
Index

Counting frequencies


A very common need in many kinds of programs is to count the occurrences of a value or of an event, which means counting frequency. Be it the need to count words in text, count likes on a blog post, or track scores for players of a video game, in the end counting frequency means counting how many we have of a specific value.

The most obvious solution for such a need would be to keep around counters for the things we need to count. If there are two, three, or four, maybe we can just track them in some dedicated variables, but if there are hundreds, it's certainly not feasible to keep around such a large amount of variables and we will quickly end up with a solution based on a container to collect all those counters.

How to do it...

Here are the steps for this recipe:

  1. Suppose we want to track the frequency of words in text; the standard library comes to our rescue and provides us with a very good way to track counts and frequencies, which is through the dedicated collections.Counter object.
  2. The collections.Counter object not only keeps track of frequencies, but provides some dedicated methods to retrieve the most common entries, entries that appear at last once and quickly count any iterable.
  3. Any iterable you provide to the Counter is "counted" for its frequency of values:
>>> txt = "This is a vast world you can't traverse world in a day"
>>>
>>> from collections import Counter
>>> counts = Counter(txt.split())
  1. The result would be exactly what we expect, a dictionary with the frequencies of the words in our phrase:
Counter({'a': 2, 'world': 2, "can't": 1, 'day': 1, 'traverse': 1, 
         'is': 1, 'vast': 1, 'in': 1, 'you': 1, 'This': 1})
  1. Then, we can easily query for the most frequent words:
>>> counts.most_common(2)
[('world', 2), ('a', 2)]

 

  1. Get the frequency of a specific word:
>>> counts['world']
2

Or, get back the total number of occurrences:

>>> sum(counts.values())
12
  1. And we can even apply some set operations on counters, such as joining them, subtracting them, or checking for intersections:
>>> Counter(["hello", "world"]) + Counter(["hello", "you"])
Counter({'hello': 2, 'you': 1, 'world': 1})
>>> Counter(["hello", "world"]) & Counter(["hello", "you"])
Counter({'hello': 1})

How it works...

Our counting code relies on the fact that Counter is just a special kind of dictionary, and that dictionaries can be built by providing an iterable. Each entry in the iterable will be added to the dictionary.

In the case of a counter, adding an element means incrementing its count; for every "word" in our list, we add that word multiple times (one every time it appears in the list), so its value in the Counter continues to get incremented every time the word is encountered.

There's more...

Relying on Counter is actually not the only way to track frequencies; we already know that Counter is a special kind of dictionary, so reproducing the Counter behavior should be quite straightforward.

Probably every one of us came up with a dictionary in this form:

counts = dict(hello=0, world=0, nice=0, day=0)

 

Whenever we face a new occurrence of hello, world, nice, or day, we increment the associated value in the dictionary and call it a day:

for word in 'hello world this is a very nice day'.split():
    if word in counts:
        counts[word] += 1

By relying on dict.get, we can also easily adapt it to count any word, not just those we could foresee:

for word in 'hello world this is a very nice day'.split():
    counts[word] = counts.get(word, 0) + 1

But the standard library actually provides a very flexible tool that we can use to improve this code even further, collections.defaultdict.

defaultdict is a plain dictionary that won't throw KeyError for any missing value, but will call a function we can provide to generate the missing value.

So, something such as defaultdict(int) will create a dictionary that provides 0 for any key that it doesn't have, which is very convenient for our counting purpose:

from collections import defaultdict

counts = defaultdict(int)
for word in 'hello world this is a very nice day'.split():
    counts[word] += 1

The result will be exactly what we expect:

defaultdict(<class 'int'>, {'day': 1, 'is': 1, 'a': 1, 'very': 1, 'world': 1, 'this': 1, 'nice': 1, 'hello': 1})

As for each word, the first time we face it, we will call int to get the starting value and then add 1 to it. As int gives 0 when called without any argument, that achieves what we want.

While this roughly solves our problem, it's far from being a complete solution for counting—we track frequencies, but on everything else, we are on our own. What if we want to know the most frequent entry in our bag of words?

The convenience of Counter is based on the set of additional features specialized for counting that it provides; it's not just a dictionary with a default numeric value, it's a class specialized in keeping track of frequencies and providing convenient ways to access them.