If you have ever need to provide a reverse mapping, you have probably discovered that Python lacks a way to store more than a value for each key in a dictionary. This is a very common need, and most languages provide some form of multimap container.
Python tends to prefer having a single way of doing things, and as storing multiple values for the key means just storing a list of values for a key, it doesn't provide a specialized container.
The issue with storing a list of values is that to be able to append to values to our dictionary, the list must already exist.
Proceed with the following steps for this recipe:
- As we already know,
defaultdict
will create a default value by calling the provided callable for every missing key. We can provide thelist
constructor as a callable:
>>> from collections import defaultdict
>>> rd = defaultdict(list)
- So, we insert keys into our multimap by using
rd[k].append(v)
instead of the usualrd[k] = v
:
>>> for name, num in [('ichi', 1), ('one', 1), ('uno', 1), ('un', 1)]:
... rd[num].append(name)
...
>>> rd
defaultdict(<class 'list'>, {1: ['ichi', 'one', 'uno', 'un']})
MultiDict
works by storing a list for each key. Whenever a key is accessed, the list containing all the values for that key is retrieved.
In the case of missing keys, an empty list will be provided so that values can be added for that key.
This works because every time defaultdict
faces a missing key, it will insert it with a value generated by calling list
. And calling list
will actually provide an empty list. So, doing rd[v]
will always provide a list, empty or not, depending on whether v
was an already existing key or not. Once we have our list, adding a new value is just a matter of appending it.
Dictionaries in Python are associative containers where keys are unique. A key can appear a single time and has exactly one value.
If we want to support multiple values per key, we can actually solve the need by saving list
as the value of our key. This list can then contain all the values we want to keep around for that key:
>>> rd = {1: ['one', 'uno', 'un', 'ichi'],
... 2: ['two', 'due', 'deux', 'ni'],
... 3: ['three', 'tre', 'trois', 'san']}
>>> rd[2]
['two', 'due', 'deux', 'ni']
If we want to add a new translation to 2
(Spanish, for example), we would just have to append the entry:
>>> rd[2].append('dos')
>>> rd[2]
['two', 'due', 'deux', 'ni', 'dos']
The problem arises when we want to introduce a new key:
>>> rd[4].append('four')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 4
For key 4
, no list exists, so there is nowhere we can append it. So, our snippet to automatically reverse the mapping can't be easily adapted to handle multiple values, as it would fail with key errors the first time it tries to insert a value:
>>> rd = {}
>>> for k,v in d.items():
... rd[v].append(k)
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
KeyError: 1
Checking for every single entry, whether it's already in the dictionary or not, and acting accordingly is not very convenient. While we can rely on the setdefault
method of dictionaries to hide that check, we can get a far more elegant solution by using collections.defaultdict
.