Python’s collections module
The collections module provides different types of containers, which are objects that store other objects and provide a way to access them. Before looking at these, let’s briefly consider the roles of, and the relationships between, modules, packages, and scripts.
A module is a Python file with the .py extension that contains a collection of functions, classes, and variables. A package is a directory that contains a collection of modules; it has an __init__.py file, which lets the interpreter know that it is a package. A module can be imported into a Python script, which can then make use of the module’s functions and variables in its own code. We do this with the import statement. Whenever the interpreter encounters an import statement, it loads and runs the code of the specified module (only the first time; later imports reuse the already loaded module) and makes its names available to the script.
Table 1.7 provides the data types and operations of the collections module and their descriptions:
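Container data type | Description
namedtuple | Creates a tuple subclass with named fields.
deque | Doubly-linked lists that provide efficient adding and removing of items from both ends of the list.
defaultdict | A dictionary subclass that supplies a default value for missing keys.
ChainMap | A mapping that groups multiple dictionaries into a single view.
Counter | A dictionary subclass that counts hashable objects.
UserDict, UserList, UserString | These data types are used to add more functionalities to their base data structure, such as a dictionary, list, or string.
Table 1.7: Different container data types of the collections module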
Let’s consider these types in more detail.
Named tuples
The namedtuple function of collections provides an extension of the built-in tuple data type. namedtuple objects are immutable, just like standard tuples, so we can’t add new fields or modify existing ones after an instance is created. Each field has a name that maps to a particular value, and we can access a value either by position (index) or by field name. namedtuple is mainly useful when an application uses several kinds of tuples and it is important to keep track of what each of them represents.
In this situation, namedtuple offers a more readable and self-documenting alternative. The syntax is as follows:
nt = namedtuple(typename, field_names)
Here is an example:
from collections import namedtuple
Book = namedtuple('Book', ['name', 'ISBN', 'quantity'])
Book1 = Book('Hands on Data Structures', '9781788995573', '50')
# Access the same field by position and by field name
print('Using index ISBN:' + Book1[1])
print('Using key ISBN:' + Book1.ISBN)
The output will be as follows.
Using index ISBN:9781788995573
Using key ISBN:9781788995573
In the above code, we first import namedtuple from the collections module. Book is a named tuple class, and Book1 is an instance of Book. We also see that the data elements can be accessed both by index and by field name.
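The immutability described above, and the usual way around it, can be sketched as follows. This example reuses the Book type from the text; the variable names book and updated are illustrative, and quantity is stored as an integer here purely for the example.

```python
from collections import namedtuple

Book = namedtuple('Book', ['name', 'ISBN', 'quantity'])
book = Book('Hands on Data Structures', '9781788995573', 50)

# Fields cannot be reassigned on an existing instance.
try:
    book.quantity = 40
except AttributeError:
    print('named tuples are immutable')

# _replace() returns a new instance with the given fields changed.
updated = book._replace(quantity=40)
print(book.quantity, updated.quantity)  # 50 40

# Field names are available on the class, and instances unpack like tuples.
print(Book._fields)  # ('name', 'ISBN', 'quantity')
name, isbn, quantity = updated
print(isbn)  # 9781788995573
```

Because _replace() builds a new object, the original book is left untouched.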
Deque
A deque is a double-ended queue that supports appending and popping elements at both ends of the list. Deques are implemented as doubly-linked lists, which makes inserting and deleting elements at either end an O(1) operation; accessing or modifying elements in the middle is slower, at O(n).
Consider an example:
from collections import deque
s = deque() # Creates an empty deque
print(s)
my_queue = deque([1, 2, 'Name'])
print(my_queue)
The output will be as follows.
deque([])
deque([1, 2, 'Name'])
You can also use some of the following predefined functions:
Function | Description
append(x) | Insert x at the right end of the deque.
appendleft(x) | Insert x at the left end of the deque.
pop() | Delete the rightmost value.
popleft() | Delete the leftmost value.
Table 1.8: Description of different queue functions
In this section, we showed the use of the deque class of the collections module, and how elements can be added to and deleted from either end of the queue.
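The four operations can be tried out on the my_queue deque from the earlier example. This is a minimal sketch; the inserted values 'age' and 'id' and the names right and left are chosen only for illustration.

```python
from collections import deque

my_queue = deque([1, 2, 'Name'])

# Add one element at each end.
my_queue.append('age')
my_queue.appendleft('id')
print(my_queue)  # deque(['id', 1, 2, 'Name', 'age'])

# Remove one element from each end; both calls return the removed value.
right = my_queue.pop()
left = my_queue.popleft()
print(right, left)  # age id
print(my_queue)     # deque([1, 2, 'Name'])
```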
Ordered dictionaries
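An ordered dictionary is a dictionary that preserves the order in which keys are inserted. Since Python 3.7, the built-in dict also preserves insertion order, but OrderedDict additionally offers order-aware operations, such as move_to_end(), and order-sensitive equality comparisons. If the key order is important for an application, then OrderedDict can be used: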
od = OrderedDict([items])
An example could look like the following:
from collections import OrderedDict
od = OrderedDict({'my': 2, 'name': 4, 'is': 2, 'Mohan': 5})
od['hello'] = 4
print(od)
The output will be as follows.
OrderedDict([('my', 2), ('name', 4), ('is', 2), ('Mohan', 5), ('hello', 4)])
In the above code, we create an ordered dictionary, od, using the OrderedDict class. We can observe that the keys appear in the same order in which they were inserted.
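A few operations that depend on key order can be sketched as follows. This is a minimal example with made-up keys; the names first, a, and b are illustrative.

```python
from collections import OrderedDict

od = OrderedDict([('my', 2), ('name', 4), ('is', 2)])

# Move an existing key to the end of the ordering.
od.move_to_end('my')
print(list(od))  # ['name', 'is', 'my']

# Remove and return the first (oldest) item instead of the last one.
first = od.popitem(last=False)
print(first)  # ('name', 4)

# Two OrderedDicts with the same items in a different order are not equal,
# whereas plain dictionaries ignore order when compared.
a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])
print(a == b)              # False
print(dict(a) == dict(b))  # True
```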
Default dictionary
The default dictionary (defaultdict) is a subclass of the built-in dictionary class (dict) that has the same methods and operations as dict. The only difference is that looking up a missing key with d[key] does not raise a KeyError, as it would for a normal dictionary. Instead, defaultdict calls a factory function to create a default value, stores it under that key, and returns it. This makes defaultdict a convenient way to initialize dictionaries:
d = defaultdict(default_factory)
An example could look like the following:
from collections import defaultdict
dd = defaultdict(int)
words = 'data python data data structure data python'.split()
for word in words:
    dd[word] += 1  # a missing word starts from int(), which is 0
print(dd)
The output will be as follows.
defaultdict(<class 'int'>, {'data': 4, 'python': 2, 'structure': 1})
In the above example, if an ordinary dictionary had been used, Python would have raised a KeyError the first time a new word was counted, because dd[word] += 1 reads the current value before writing the new one. int, which we supplied as an argument to defaultdict, is really the int() function, which returns zero when called with no arguments.
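The contrast with an ordinary dictionary, and the use of a different factory, can be sketched as follows. The names counts and groups and the word list are made up for illustration.

```python
from collections import defaultdict

# An ordinary dictionary raises KeyError when a missing key is read.
counts = {}
try:
    counts['data'] += 1
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'data'

# Any zero-argument callable can serve as the factory; list gives [].
groups = defaultdict(list)
for word in ['data', 'python', 'deque', 'pop']:
    groups[word[0]].append(word)  # group words by their first letter

print(dict(groups))  # {'d': ['data', 'deque'], 'p': ['python', 'pop']}
print(int())         # 0
```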
ChainMap object
ChainMap is used to group a list of dictionaries. The collections.ChainMap data structure combines several dictionaries into a single mapping. Whenever a key is looked up in the ChainMap, it searches through the dictionaries one by one, in the order in which they were given, until the key is found:
class collections.ChainMap(dict1, dict2)
An example could look like the following:
from collections import ChainMap
dict1 = {"data": 1, "structure": 2}
dict2 = {"python": 3, "language": 4}
chain = ChainMap(dict1, dict2)
print(chain)
print(list(chain.keys()))
print(list(chain.values()))
print(chain["data"])
print(chain["language"])
The output will be:
ChainMap({'data': 1, 'structure': 2}, {'python': 3, 'language': 4})
['python', 'language', 'data', 'structure']
[3, 4, 1, 2]
1
4
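In the above code, we create two dictionaries, namely dict1 and dict2, and then combine them using the ChainMap class. The dictionaries are not copied; the ChainMap is a view over the originals. Note also that iterating over a ChainMap scans the dictionaries from last to first, which is why the keys of dict2 are listed before those of dict1.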
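The lookup order matters when the same key appears in more than one dictionary. The following minimal sketch uses made-up dictionaries named overrides and defaults to show which value wins and where writes end up.

```python
from collections import ChainMap

defaults = {'lang': 'python', 'debug': False}
overrides = {'debug': True}

# Dictionaries listed first take priority during lookups.
settings = ChainMap(overrides, defaults)
print(settings['debug'])  # True, found in overrides
print(settings['lang'])   # python, falls through to defaults

# Writes always go to the first dictionary in the chain.
settings['lang'] = 'go'
print(overrides)  # {'debug': True, 'lang': 'go'}
print(defaults)   # {'lang': 'python', 'debug': False}
```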
Counter objects
As we discussed earlier, a hashable object is one whose hash value will remain the same during its lifetime in the program. Counter is used to count hashable objects. Here, the dictionary key is a hashable object, while the corresponding value is the count of that object. In other words, Counter objects create a hash table in which the elements and their counts are stored as dictionary key-value pairs.
Dictionary and Counter objects are similar in the sense that data is stored as {key: value} pairs, but in a Counter object, the value is the count of the key, whereas it can be anything in an ordinary dictionary. Thus, when we only want to see how many times each unique word or character occurs in a string, we use a Counter object.
An example could look like the following:
from collections import Counter
inventory = Counter('hello')
print(inventory)
print(inventory['l'])
print(inventory['e'])
print(inventory['o'])
The output will be:
Counter({'l': 2, 'h': 1, 'e': 1, 'o': 1})
2
1
1
In the above code, the inventory variable is created using the Counter class; it holds the counts of all the characters. The count values of these characters can be accessed using dictionary-like key access ([key]).
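Counting words rather than characters works the same way once the string is split. This is a minimal sketch; the sentence and the name word_counts are chosen for illustration.

```python
from collections import Counter

words = 'data python data data structure data python'.split()
word_counts = Counter(words)

# most_common(n) returns the n most frequent items with their counts.
print(word_counts.most_common(2))  # [('data', 4), ('python', 2)]

# A key that was never seen has a count of 0 instead of raising KeyError.
print(word_counts['missing'])  # 0

# update() adds further observations to the existing counts.
word_counts.update(['python'])
print(word_counts['python'])  # 3
```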
UserDict
Python supports a container, UserDict, present in the collections module, that wraps a dictionary object. By subclassing it, we can add customized functions to the dictionary. This is very useful for applications where we want to add to, update, or modify the functionality of the dictionary. Consider the example code below, in which a custom push method refuses to add a new data element to the dictionary:
# We cannot push to this user dictionary
from collections import UserDict
class MyDict(UserDict):
    def push(self, key, value):
        raise RuntimeError("Cannot insert")
d = MyDict({'ab': 1, 'bc': 2, 'cd': 3})
d.push('b', 2)
The output is as follows:
RuntimeError: Cannot insert
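In the above code, a customized push method is created in the MyDict class; it raises an error instead of inserting an element into the dictionary. Note that only push is affected: the standard dictionary operations, such as d['b'] = 2, still work unless they are overridden as well.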
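To change how the standard operations themselves behave, a subclass can override __setitem__, which UserDict uses for item assignment, update(), and the initial data passed to the constructor. The following is a minimal sketch under that assumption; the LowerCaseDict class is hypothetical and not part of the collections module.

```python
from collections import UserDict

class LowerCaseDict(UserDict):
    """Hypothetical dictionary that stores every string key in lower case."""

    def __setitem__(self, key, value):
        # All insertions funnel through here, so keys are normalized once.
        super().__setitem__(key.lower(), value)

d = LowerCaseDict({'AB': 1})  # constructor data goes through __setitem__
d['Cd'] = 2                   # so does item assignment
d.update(EF=3)                # and update()
print(d)  # {'ab': 1, 'cd': 2, 'ef': 3}
```

The wrapped dictionary is available as d.data, which is what the inherited methods operate on.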
UserList
A UserList is a container that wraps list objects. It can be used to extend the functionality of the list data structure. Consider the example code below, in which a custom push method refuses to add a new data element to the list:
# We cannot push to this user list
from collections import UserList
class MyList(UserList):
    def push(self, key):
        raise RuntimeError("Cannot insert in the list")
d = MyList([11, 12, 13])
d.push(2)
The output is as follows:
RuntimeError: Cannot insert in the list
In the above code, a customized push method is created in the MyList class; it raises an error instead of inserting an element into the list. As before, the standard list methods, such as append, are unaffected unless they are overridden too.
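Overriding an existing list method follows the same pattern. The sketch below guards append with a type check; the IntList class is hypothetical, and only append is guarded here, so other methods such as insert and extend would still accept any value.

```python
from collections import UserList

class IntList(UserList):
    """Hypothetical list whose append() accepts integers only."""

    def append(self, item):
        if not isinstance(item, int):
            raise TypeError("IntList.append accepts integers only")
        super().append(item)

nums = IntList([1, 2, 3])
nums.append(4)
print(nums)  # [1, 2, 3, 4]

try:
    nums.append('five')
except TypeError as exc:
    print('Rejected:', exc)  # Rejected: IntList.append accepts integers only
```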
UserString
Strings can be considered as an array of characters. In Python, a character is a string of length one. UserString acts as a container that wraps a string object, and it can be used to create strings with customized functionalities. An example could look like the following:
# Create a custom append function for strings
from collections import UserString
class MyString(UserString):
    def append(self, value):
        self.data += value
s1 = MyString("data")
print("Original:", s1)
s1.append('h')
print("After append:", s1)
The output is:
Original: data
After append: datah
In the above example code, a customized append method is created in the MyString class to add the functionality of appending to a string.
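Two properties of UserString are worth seeing in action: the inherited string methods return instances of the subclass, and the wrapper is not itself a str. The following minimal sketch reuses the MyString class from the text; the names s and shout are illustrative.

```python
from collections import UserString

class MyString(UserString):
    def append(self, value):
        # self.data holds the wrapped str; rebinding it updates the wrapper.
        self.data += str(value)

s = MyString("data")
s.append("base")

# Inherited methods such as upper() return a MyString, not a plain str.
shout = s.upper()
print(shout, type(shout).__name__)  # DATABASE MyString

# The wrapper is not a str subclass; use str() to get the plain string.
print(isinstance(s, str))  # False
print(str(s) + "!")        # database!
```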