A very common need in many kinds of programs is to count the occurrences of a value or of an event, which means counting frequency. Be it the need to count words in text, count likes on a blog post, or track scores for players of a video game, in the end counting frequency means counting how many we have of a specific value.
The most obvious solution for such a need would be to keep around counters for the things we need to count. If there are two, three, or four, maybe we can just track them in some dedicated variables, but if there are hundreds, it's certainly not feasible to keep around such a large number of variables, and we will quickly end up with a solution based on a container to collect all those counters.
Here are the steps for this recipe:
- Suppose we want to track the frequency of words in text; the standard library comes to our rescue with a dedicated object for tracking counts and frequencies, collections.Counter.
- The collections.Counter object not only keeps track of frequencies, but also provides dedicated methods to retrieve the most common entries, to retrieve the entries that appear at least once, and to quickly count any iterable.
- Any iterable you provide to the Counter is "counted" for its frequency of values:
>>> txt = "This is a vast world you can't traverse world in a day"
>>>
>>> from collections import Counter
>>> counts = Counter(txt.split())
- The result would be exactly what we expect, a dictionary with the frequencies of the words in our phrase:
>>> counts
Counter({'a': 2, 'world': 2, "can't": 1, 'day': 1, 'traverse': 1,
'is': 1, 'vast': 1, 'in': 1, 'you': 1, 'This': 1})
- Then we can easily find the most common words (the order of entries with equal counts may differ between Python versions):
>>> counts.most_common(2)
[('world', 2), ('a', 2)]
- Get the frequency of a specific word:
>>> counts['world']
2
Or, get back the total number of occurrences:
>>> sum(counts.values())
12
- And we can even apply some set operations on counters, such as joining them, subtracting them, or checking for intersections:
>>> Counter(["hello", "world"]) + Counter(["hello", "you"])
Counter({'hello': 2, 'you': 1, 'world': 1})
>>> Counter(["hello", "world"]) & Counter(["hello", "you"])
Counter({'hello': 1})
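The steps above mention subtracting counters and retrieving entries that appear at least once, but show neither. The following is a minimal sketch of those operations. The variable names (`first`, `second`, `difference`, `union`, `repeated`) are only illustrative and do not come from the recipe.

```python
from collections import Counter

first = Counter(["hello", "world"])
second = Counter(["hello", "you"])

# Subtraction keeps only the entries whose resulting count is positive,
# so "hello" (1 - 1 = 0) disappears from the result.
difference = first - second

# Union (|) keeps the highest count seen for each entry.
union = first | second

# elements() repeats every entry as many times as its count.
repeated = sorted(Counter(a=2, b=1).elements())

print(difference)  # Counter({'world': 1})
print(dict(union))  # {'hello': 1, 'world': 1, 'you': 1}
print(repeated)  # ['a', 'a', 'b']
```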
Our counting code relies on the fact that Counter is just a special kind of dictionary, and that dictionaries can be built by providing an iterable. Each entry in the iterable will be added to the dictionary.
In the case of a counter, adding an element means incrementing its count; for every "word" in our list, we add that word multiple times (once every time it appears in the list), so its value in the Counter continues to get incremented every time the word is encountered.
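As a rough illustration of that behavior, the sketch below builds a counter by hand and compares it with one built directly from the iterable. The names `words`, `manual`, and `built` are made up for this example.

```python
from collections import Counter

words = ["hello", "world", "hello"]

# Build the counter by hand: each word bumps its own count by one.
manual = Counter()
for word in words:
    manual[word] += 1  # a missing key counts as 0, so no KeyError here

# Passing the iterable to the constructor gives the same result.
built = Counter(words)

print(manual == built)  # True
print(isinstance(built, dict))  # True
print(built["missing"])  # 0
print("missing" in built)  # False: reading a missing key does not add it
```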
Relying on Counter is actually not the only way to track frequencies; we already know that Counter is a special kind of dictionary, so reproducing the Counter behavior should be quite straightforward.
Probably every one of us came up with a dictionary in this form:
counts = dict(hello=0, world=0, nice=0, day=0)
Whenever we face a new occurrence of hello, world, nice, or day, we increment the associated value in the dictionary and call it a day:
for word in 'hello world this is a very nice day'.split():
    if word in counts:
        counts[word] += 1
By relying on dict.get, we can also easily adapt it to count any word, not just those we could foresee:
for word in 'hello world this is a very nice day'.split():
    counts[word] = counts.get(word, 0) + 1
But the standard library actually provides a very flexible tool that we can use to improve this code even further, collections.defaultdict.
defaultdict is a dictionary that won't throw KeyError for a missing key, but will instead call a function we provide to generate the missing value.
So, something such as defaultdict(int) will create a dictionary that provides 0 for any key that it doesn't have, which is very convenient for our counting purpose:
from collections import defaultdict

counts = defaultdict(int)
for word in 'hello world this is a very nice day'.split():
    counts[word] += 1
The result will be exactly what we expect:
defaultdict(<class 'int'>, {'day': 1, 'is': 1, 'a': 1, 'very': 1, 'world': 1, 'this': 1, 'nice': 1, 'hello': 1})
The first time we face a word, defaultdict calls int to get the starting value and then we add 1 to it. As int gives 0 when called without any argument, that achieves what we want.
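The short sketch below walks through that default-value mechanism step by step. The keys and the `looked_up` variable are only illustrative. It also shows one detail worth knowing: simply reading a missing key from a defaultdict stores the default under that key.

```python
from collections import defaultdict

# int() called with no arguments returns 0, which is the starting count.
print(int())  # 0

counts = defaultdict(int)

# "hello" is missing, so int() supplies 0 and then 1 is added to it.
counts["hello"] += 1

# Merely reading a missing key also inserts it with the default value.
looked_up = counts["never seen"]

print(looked_up)  # 0
print(dict(counts))  # {'hello': 1, 'never seen': 0}
```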
While this roughly solves our problem, it's far from being a complete solution for counting: we track frequencies, but for everything else we are on our own. What if we want to know the most frequent entry in our bag of words?
The convenience of Counter is based on the set of additional features specialized for counting that it provides; it's not just a dictionary with a default numeric value, it's a class specialized in keeping track of frequencies and providing convenient ways to access them.
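To make the difference concrete, here is a minimal sketch that finds the two most frequent words both ways. The sample words and the names `plain`, `top_plain`, and `top_counter` are assumptions made for this example. With a defaultdict we have to sort the items ourselves, while Counter already offers most_common.

```python
from collections import Counter, defaultdict

words = "spam eggs spam ham spam eggs".split()

# With defaultdict we only get the counts; ranking them is up to us.
plain = defaultdict(int)
for word in words:
    plain[word] += 1
top_plain = sorted(plain.items(), key=lambda item: item[1], reverse=True)[:2]

# Counter provides the same ranking through a dedicated method.
top_counter = Counter(words).most_common(2)

print(top_plain)  # [('spam', 3), ('eggs', 2)]
print(top_counter)  # [('spam', 3), ('eggs', 2)]
```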