Book Image

Hands-On Data Structures and Algorithms with Python - Third Edition

By : Dr. Basant Agarwal
Book Image

Hands-On Data Structures and Algorithms with Python - Third Edition

By: Dr. Basant Agarwal

Overview of this book

Choosing the right data structure is pivotal to optimizing the performance and scalability of applications. This new edition of Hands-On Data Structures and Algorithms with Python will expand your understanding of key structures, including stacks, queues, and lists, and also show you how to apply priority queues and heaps in applications. You’ll learn how to analyze and compare Python algorithms, and understand which algorithms should be used for a problem based on running time and computational complexity. You will also become confident organizing your code in a manageable, consistent, and scalable way, which will boost your productivity as a Python developer. By the end of this Python book, you’ll be able to manipulate the most important data structures and algorithms to more efficiently store, organize, and access data in your applications.
Table of Contents (17 chapters)
14
Other Books You May Enjoy
15
Index

Python’s collections module

The collections module provides different types of containers, which are objects that are used to store different objects and provide a way to access them. Before accessing these, let’s consider briefly the role and relationships between modules, packages, and scripts.

A module is a Python script with the .py extension that contains a collection of functions, classes, and variables. A package is a directory that contains collections of modules; it has an __init__.py file, which lets the interpreter know that it is a package. A module can be called into a Python script, which can in turn make use of the module’s functions and variables in its code. In Python, we can import these to a script using the import statement. Whenever the interpreter encounters the import statement, it imports the code of the specified module.

Table 1.7 provides the data types and operations of the collections module and their descriptions:

Container data type

Description

namedtuple

Creates a tuple with named fields similar to regular tuples.

deque

Doubly-linked lists that provide efficient adding and removing of items from both ends of the list.

defaultdict

A dictionary subclass that returns default values for missing keys.

ChainMap

A dictionary that merges multiple dictionaries.

Counter

A dictionary that returns the counts corresponding to their objects/key.

UserDict UserList UserString

These data types are used to add more functionalities to their base data structure, such as a dictionary, list, and string. And we can create subclasses from them for custom dict/list/string.

Table 1.7: Different container data type of the collections module

Let’s consider these types in more detail.

Named tuples

The namedtuple of collections provides an extension of the built-in tuple data type. namedtuple objects are immutable, similar to standard tuples. Thus, we can’t add new fields or modify existing ones after the namedtuple instance is created. They contain keys that are mapped to a particular value, and we can iterate through named tuples either by index or key. The namedtuple function is mainly useful when several tuples are used in an application and it is important to keep track of each of the tuples in terms of what they represent.

In this situation, namedtuple presents a more readable and self-documenting method. The syntax is as follows:

nt = namedtuple(typename , field_names)

Here is an example:

from collections import namedtuple
Book = namedtuple ('Book', ['name', 'ISBN', 'quantity'])
Book1 = Book('Hands on Data Structures', '9781788995573', '50')
#Accessing data items
print('Using index ISBN:' + Book1[1])
print('Using key ISBN:' + Book1.ISBN)

The output will be as follows.

Using index ISBN:9781788995573
Using key ISBN:9781788995573

Here, in the above code, we firstly imported namedtuple from the collections module. Book is a named tuples, “class,” and then, Book1 is created, which is an instance of Book. We also see that the data elements can be accessed using index and key methods.

Deque

A deque is a double-ended queue (deque) that supports append and pop elements from both sides of the list. Deques are implemented as double-linked lists, which are very efficient for inserting and deleting elements in O(1) time complexity.

Consider an example:

from collections import deque
s = deque()   # Creates an empty deque
print(s)
my_queue = deque([1, 2, 'Name'])
print(my_queue)

The output will be as follows.

deque([])
deque([1, 2, 'Name'])

You can also use some of the following predefined functions:

Function

Description

my_queue.append('age')

Insert 'age' at the right end of the list.

my_queue.appendleft('age')

Insert 'age' at the left end of the list.

my_queue.pop()

Delete the rightmost value.

my_queue.popleft()

Delete the leftmost value.

Table 1.8: Description of different queue functions

In this section, we showed the use of the deque method of the collections module, and how elements can be added and deleted from the queue.

Ordered dictionaries

An ordered dictionary is a dictionary that preserves the order of the keys that are inserted. If the key order is important for any application, then OrderedDict can be used:

od = OrderedDict([items])

An example could look like the following:

from collections import OrderedDict
od = OrderedDict({'my': 2, 'name ': 4, 'is': 2, 'Mohan' :5})
od['hello'] = 4
print(od)

The output will be as follows.

OrderedDict([('my', 2), ('name ', 4), ('is', 2), ('Mohan', 5), ('hello', 4)])

In the above code, we create a dictionary, od, using the OrderedDict module. We can observe that the order of the keys is the same as the order when we created the key.

Default dictionary

The default dictionary (defaultdict) is a subclass of the built-in dictionary class (dict) that has the same methods and operations as that of the dictionary class, with the only difference being that it never raises a KeyError, as a normal dictionary would. defaultdict is a convenient way to initialize dictionaries:

d = defaultdict(def_value)

An example could look like the following:

from collections import defaultdict
dd = defaultdict(int)
words = str.split('data python data data structure data python')
for word in words:
    dd[word] += 1
print(dd)

The output will be as follows.

defaultdict(<class 'int'>, {'data': 4, 'python': 2, 'structure': 1})

In the above example, if an ordinary dictionary had been used, then Python would have shown KeyError while the first key was added. int, which we supplied as an argument to defaultdict, is really the int() function, which simply returns a zero.

ChainMap object

ChainMap is used to create a list of dictionaries. The collections.ChainMap data structure combines several dictionaries into a single mapping. Whenever a key is searched in the chainmap, it looks through all the dictionaries one by one, until the key is not found:

class collections.ChainMap(dict1, dict2)

An example could look like the following:

from collections import ChainMap
dict1 = {"data": 1, "structure": 2}
dict2 = {"python": 3, "language": 4}
chain = ChainMap(dict1, dict2)
print(chain)
print(list(chain.keys()))
print(list(chain.values()))
print(chain["data"])
print(chain["language"])

The output will be:

ChainMap({'data': 1, 'structure': 2}, {'python': 3, 'language': 4})
['python', 'language', 'data', 'structure']
[3, 4, 1, 2]
1
4

In the above code, we create two dictionaries, namely, dict1 and dict2, and then we can combine both of these dictionaries using the ChainMap method.

Counter objects

As we discussed earlier, a hashable object is one whose hash value will remain the same during its lifetime in the program. counter is used to count the number of hashable objects. Here, the dictionary key is a hashable object, while the corresponding value is the count of that object. In other words, counter objects create a hash table in which the elements and their count are stored as dictionary keys and value pairs.

Dictionary and counter objects are similar in the sense that data is stored in a {key, value} pair, but in counter objects, the value is the count of the key whereas it can be anything in the case of dictionary. Thus, when we only want to see how many times each unique word is occurring in a string, we use the counter object.

An example could look like the following:

from collections import Counter
inventory = Counter('hello')
print(inventory)
print(inventory['l'])
print(inventory['e'])
print(inventory['o'])

The output will be:

Counter({'l': 2, 'h': 1, 'e': 1, 'o': 1})
2
1
1

In the above code, the inventory variable is created, which holds the counts of all the characters using the counter module. The count values of these characters can be accessed using dictionary-like key access ([key]).

UserDict

Python supports a container, UserDict, present in the collections module, that wraps the dictionary objects. We can add customized functions to the dictionary. This is very useful for applications where we want to add/update/modify the functionalities of the dictionary. Consider the example code below where pushing/adding a new data element is not allowed in the dictionary:

# we can not push to this user dictionary
from collections import UserDict
class MyDict(UserDict):
    def push(self, key, value): 
        raise RuntimeError("Cannot insert")
d = MyDict({'ab':1, 'bc': 2, 'cd': 3})
d.push('b', 2)

The output is as follows:

RuntimeError: Cannot insert

In the above code, a customized push function in the MyDict class is created to add the customized functionality, which does not allow you to insert an element into the dictionary.

UserList

A UserList is a container that wraps list objects. It can be used to extend the functionality of the list data structure. Consider the example code below, where pushing/adding a new data element is not allowed in the list data structure:

# we can not push to this user list
from collections import UserList
class MyList(UserList):
    def push(self, key):
        raise RuntimeError("Cannot insert in the list")
d = MyList([11, 12, 13])
d.push(2)

The output is as follows:

RuntimeError: Cannot insert in the list

In the above code, a customized push function in the MyList class is created to add the functionality to not allow you to insert an element into the list variable.

UserString

Strings can be considered as an array of characters. In Python, a character is a string of one length and acts as a container that wraps a string object. It can be used to create strings with customized functionalities. An example could look like the following:

#Create a custom append function for string
from collections import UserString
class MyString(UserString):
    def append(self, value):
        self.data += value
s1 = MyString("data")
print("Original:", s1)
s1.append('h')
print("After append: ", s1)

The output is:

Original: data
After append:  datah

In the above example code, a customized append function in the MyString class is created to add the functionality to append a string.