C++ Data Structures and Algorithm Design Principles

C++ Data Structures and Algorithm Design Principles

By : John Carey, Anil Achary, Shreyans Doshi, Payas Rajan

Buy this Book

C++ Data Structures and Algorithm Design Principles

By: John Carey, Anil Achary, Shreyans Doshi, Payas Rajan

Buy this Book

Overview of this book

C++ is a mature multi-paradigm programming language that enables you to write high-level code with a high degree of control over the hardware. Today, significant parts of software infrastructure, including databases, browsers, multimedia frameworks, and GUI toolkits, are written in C++. This book starts by introducing C++ data structures and how to store data using linked lists, arrays, stacks, and queues. In later chapters, the book explains the basic algorithm design paradigms, such as the greedy approach and the divide-and-conquer approach, which are used to solve a large variety of computational problems. Finally, you will learn the advanced technique of dynamic programming to develop optimized implementations of several algorithms discussed in the book. By the end of this book, you will have learned how to implement standard data structures and algorithms in efficient and scalable C++ 14 code.

About the Book

Hardware Requirements

Software Requirements

Installation and Setup

Installing the Code Bundle

Additional Resources

Free Chapter

1. Lists, Stacks, and Queues

Introduction

Contiguous Versus Linked Data Structures

std::deque – Special Version of std::vector

Container Adaptors

Benchmarking

Summary

2. Trees, Heaps, and Graphs

Introduction

Non-Linear Problems

Tree – It's Upside Down!

Variants of Trees

Heaps

Graphs

Summary

3. Hash Tables and Bloom Filters

Introduction

Hash Tables

Collisions in Hash Tables

C++ Hash Tables

Bloom Filters

Summary

4. Divide and Conquer

Introduction

Binary Search

Understanding the Divide-and-Conquer Approach

Sorting Using Divide and Conquer

C++ Standard Library Tools for Divide and Conquer

Dividing and Conquering at a Higher Abstraction Level – MapReduce

Summary

5. Greedy Algorithms

Introduction

Basic Greedy Algorithms

The Knapsack Problem(s)

The Vertex Coloring Problem

Summary

6. Graph Algorithms I

Introduction

The Graph Traversal Problem

Prim's MST Algorithm

Dijkstra's Shortest Path Algorithm

Summary

7. Graph Algorithms II

Introduction

Revisiting the Shortest Path Problem

The Bellman-Ford Algorithm

The Bellman-Ford Algorithm (Part II) – Negative Weight Cycles

Johnson's Algorithm

Strongly Connected Components

Kosaraju's Algorithm

Choosing the Right Approach

Summary

8. Dynamic Programming I

Introduction

What Is Dynamic Programming?

Memoization – The Top-Down Approach

Tabulation – the Bottom-Up Approach

Subset Sum Problem

Dynamic Programming on Strings and Sequences

Activity 21: Melodic Permutations

Summary

9. Dynamic Programming II

Introduction

An Overview of P versus NP

Reconsidering the Subset Sum Problem

The Knapsack Problem

Graphs and Dynamic Programming

Summary

Appendix

Chapter 1: Lists, Stacks, and Queues

Chapter 2: Trees, Heaps, and Graphs

Chapter 3: Hash Tables and Bloom Filters

Chapter 4: Divide and Conquer

Chapter 5: Greedy Algorithms

Chapter 6: Graph Algorithms I

Chapter 7: Graph Algorithms II

Chapter 8: Dynamic Programming I

Chapter 9: Dynamic Programming II

Customer Reviews

5 star

4 star

3 star

2 star

1 star

Chapter 7: Graph Algorithms II

Activity 15: Greedy Robot

We can solve this activity using the exact algorithm from Exercise 33, Implementing the Bellman-Ford Algorithm (Part II). The potential pitfalls here are related to correctly interpreting the required task and representing the graph within the context of the problem you are actually trying to solve. Let's get started:

The first step will be identical to the exercise. We will include the same headers and define an Edge struct and an UNKNOWN constant:
#include <iostream>
#include <vector>
#include <climits>
using namespace std;
struct Edge
{
        int start;
        int end;
        int weight;
        Edge(int s, int e, int w) : start(s), end(e), weight(w) {}
};
const int UNKNOWN = INT_MAX;
vector<Edge*> edges;
In main(), we will declare an integer, N, which determines the height/width of the grid. We will then iterate from 0 to N * N - 1 in a for loop and read the adjacency data given in the input:
int main()
{
    int N;
    cin >> N;
    for(int i = 0; i < N * N - 1; i++)
    {
        string directions;
        int power;

        cin >> directions >> power;

        ……
    }
    return 0;
}
Now, we must face the first potential problem – accurately representing the adjacencies. Typically, we would be inclined to think of a grid in two dimensions, and while it would certainly be possible to solve the problem this way, it would not be the optimal approach for this particular problem. To reinterpret the grid and adjacencies in one dimension, we must simply observe the following relationships between the one-dimensional index, i, and the corresponding two-dimensional grid coordinates:
CURRENT CELL: (x, y) —> i
NORTH: (x, y - 1) —> i - N
SOUTH: (x, y + 1) —> i + N
EAST: (x + 1, y) —> i + 1
WEST: (x - 1, y) —> i - 1
We can handle these relationships by iterating through the characters of directions and containing the logic within a switch statement:
for(int i = 0; i < N * N - 1; i++)
{
    string directions;
    int power;
    cin >> directions >> power;
    int next;
    for(auto d : directions)
    {
        switch(d)
        {
            case 'N': next = i - N; break;
            case 'S': next = i + N; break;
            case 'E': next = i + 1; break;
            case 'W': next = i - 1; break;
        }
        ……
    }
}
This leads to the second problematic aspect of this activity; that is, the interpretation of the power values. These, of course, will be the values that define the edge weights between adjacent cells, but within the context of this problem, the inputs can be rather misleading. According to the problem's description, we want to find the path that reaches the end with the maximum amount of energy compared to the baseline. A careless reading of the problem statement may lead us to conclude that the power values correspond exactly to the edge weights, but this would actually produce the opposite of what we intend to achieve. "Maximizing energy" can be viewed as the equivalent to "minimizing energy loss," and since the negative values actually represent the energy expenditure for each cell and the positive values represent energy gained, we must reverse the sign of each power value:
for(auto d : directions)
{
    switch(d)
    {
        ……
    }
    // Add edge with power variable's sign reversed
    edges.push_back(new Edge(i, next, -power));
}
Now, we can implement BellmanFord(). This time, our function will take N and edges as arguments and return an integer equal to the maximum relative energy. To simplify our code, we will pass N as the total number of cells in the grid (that is, N * N):
int BellmanFord(int N, vector<Edge*> edges)
{
    vector<int> distance(N, UNKNOWN);

    // Starting node is always index 0
    distance[0] = 0;
    for(int i = 0; i < N - 1; i++)
    {
        for(auto edge : edges)
        {
            if(distance[edge->start] == UNKNOWN)
            {
                continue;
            }
            if(distance[edge->start] + edge->weight < distance[edge->end])
            {
                distance[edge->end] = distance[edge->start] + edge->weight;
            }
        }
    }
    ……
}
As per the standard implementation, we will also perform a check for negative cycles to handle the condition related to the robot's greedy energy consumption. In the case that a negative cycle is found, we will return UNKNOWN:
// Check for negative cycles
for(auto edge : edges)
{
    if(distance[edge->start] == UNKNOWN)
    {
        continue;
    }
    if(distance[edge->start] + edge->weight < distance[edge->end])
    {
        return UNKNOWN;
    }
}
return distance[N];
Now, we can perform a call to BellmanFord() in main() and handle the output accordingly:
int result = BellmanFord(N * N, edges);
(result == UNKNOWN) ? cout << "ABORT TRAVERSAL" << endl
: cout << -1 * result << endl;
return 0;

Activity 16: Randomized Graph Statistics

In this activity, we will generate randomized graphs for interview tests as described in the activity brief. Follow these steps to complete the activity:

Begin by including the following headers, as well as defining the UNKNOWN constant and the Edge struct:
#include <iostream>
#include <vector>
#include <iomanip>
#include <algorithm>
#include <queue>
#include <utility>
using namespace std;
const int UNKNOWN = 1e9;
struct Edge
{
    int u;
    int v;
    int w;
    Edge(int u, int v, int w)
        : u(u), v(v), w(w) {}
};
Our first task is to handle the generation of each graph. For this activity, we will encapsulate our graph data within a struct:
struct Graph
{
    int V, E;
    int maxWeight = -1e9;
    vector<Edge> edges;
    vector<vector<int>> adj;
    vector<vector<int>> weight;
    Graph(int v, int e) : V(v), E(e)
    {
        ...
    }
};
To make sure that the generated edges and the resulting graph are valid, we will create an adjacency matrix and check it during every attempt to create another edge. If an edge between the same two nodes already exists, we will begin another iteration. To make sure that every node has at least one incoming or outgoing edge, we will also set the diagonal cells in the matrix to true for each node that is part of an edge. If any of the diagonal cells are false after E edges are created, the graph will be invalid. We can indicate a graph as invalid by setting V to -1:
Graph(int v, int e) : V(v), E(e)
{
    vector<vector<bool>> used(V, vector<bool>(V, false));
    adj.resize(V);
    weight.resize(V, vector<int>(V, UNKNOWN));
    while(e)
    {
        // Generate edge values
        int u = rand() % V;
        int v = rand() % V;
        int w = rand() % 100;
        if(rand() % 3 == 0)
        {
            w = -w;
        }
        // Check if the edge is valid
        if(u == v || used[u][v])
        {
            continue;
        }
        // Add to edges and mark as used
        edges.push_back(Edge(u, v, w));
        adj[u].push_back(v);
        weight[u][v] = w;
        maxWeight = max(maxWeight, w);
        used[u][u] = used[v][v] = used[u][v] = used[v][u] = true;
        e--;
    }
    for(int i = 0; i < V; i++)
    {
        // Set V to -1 to indicate the graph is invalid
        if(!used[i][i])
        {
            V = -1;
            break;
        }
    }
}
Let's also define an enum called RESULT with the corresponding values for each type of graph we need to consider:
enum RESULT
{
    VALID,
    INVALID,
    INTERESTING
};
In main(), we will receive the input, as well as declare the counters for each type of graph. We will then loop through the given number of iterations, create a new graph, and call a TestGraph() function that takes a Graph object as input and returns RESULT. Depending on the value that's returned, we will increment each counter accordingly:
int main()
{
    unsigned int seed;
    int iterations, V, E;

    cin >> seed;
    cin >> iterations;
    cin >> V >> E;
    int invalid = 0;
    int valid = 0;
    int interesting = 0;
    srand(seed);
    while(iterations--)
    {
        Graph G(V, E);

        switch(TestGraph(G))
        {
            case INVALID: invalid++; break;
            case VALID: valid++; break;
            case INTERESTING:
            {
                valid++;
                interesting++;
                break;
            }
        }
    }

    return 0;
}
TestGraph() will first check whether the value of V for each graph is equal to -1 and return INVALID if so. Otherwise, it will perform Johnson's algorithm to retrieve the shortest distances. The first step will be to retrieve the reweighting array using the Bellman-Ford algorithm:
RESULT TestGraph(Graph G)
{
    if(G.V == -1)
    {
        return INVALID;
    }

    vector<int> distance = BellmanFord(G);
    ……
}
The implementation of Bellman-Ford that's used in this solution corresponds exactly to the one from the exercise, except that it receives a single Graph structure as an argument:
vector<int> BellmanFord(Graph G)
{
    vector<int> distance(G.V + 1, UNKNOWN);
    int s = G.V;
    for(int i = 0; i < G.V; i++)
    {
        G.edges.push_back(Edge(s, i, 0));
    }

    distance[s] = 0;
    for(int i = 0; i < G.V; i++)
    {
        for(auto edge : G.edges)
        {
            if(distance[edge.u] == UNKNOWN)
            {
                continue;
            }
            if(distance[edge.u] + edge.w < distance[edge.v])
            {
                distance[edge.v] = distance[edge.u] + edge.w;
            }
        }
    }
    for(auto edge : G.edges)
    {
        if(distance[edge.u] == UNKNOWN)
        {
            continue;
        }
        if(distance[edge.u] + edge.w < distance[edge.v])
        {
            return {};
        }
    }
return distance;
}
As we did in the exercise, we will check whether the vector that's returned by BellmanFord() is empty. If so, we return VALID (the graph is valid but uninteresting). Otherwise, we will follow through with the rest of Johnson's algorithm by reweighting the edges and performing a call to Dijkstra's algorithm for each vertex:
RESULT TestGraph(Graph G)
{
    if(G.V == -1)
    {
        return INVALID;
    }

    vector<int> distance = BellmanFord(G);
    if(distance.empty())
    {
        return VALID;
    }
    for(auto edge : G.edges)
    {
        G.weight[edge.u][edge.v] += (distance[edge.u] – distance[edge.v]);
    }
    double result = 0;
    for(int i = 0; i < G.V; i++)
    {
        vector<int> shortest = Dijkstra(i, G);
    }
}
For this solution, let's use a more efficient form of Dijkstra's algorithm, which uses a min-priority queue to determine traversal order. To do this, each value that's added to the queue must consist of two values: the node's index and its distance value. We will do this using std::pair<int, int>, which has been redefined here as State. When pushing elements to the queue, the first value must correspond to the distance since this is going to be the first value that's considered by the priority queue's internal ordering logic. All of this can be handled by std::priority_queue, but we will need to provide three template parameters corresponding to the data type, container, and comparison predicate, respectively:
vector<int> Dijkstra(int source, Graph G)
{
    typedef pair<int, int> State;
    priority_queue<State, vector<State>, greater<State>> Q;
    vector<bool> visited(G.V, false);
    vector<int> distance(G.V, UNKNOWN);
    Q.push({0, source});
    distance[source] = 0;
    while(!Q.empty())
    {
        State top = Q.top();
        Q.pop();
        int node = top.second;
        int dist = top.first;
        visited[node] = true;
        for(auto next : G.adj[node])
        {
            if(visited[next])
            {
                continue;
            }
            if(dist != UNKNOWN && distance[next] > dist + G.weight[node][next])
            {
                distance[next] = distance[node] + G.weight[node][next];

                Q.push({distance[next], next});
            }

        }
    }
    return distance;
}
Now, we will calculate the averages in TestGraph() for each set of paths. We do this by iterating through the array returned by Dijkstra() and keeping a sum of distances for which the index is not equal to the starting node's index. The corresponding value is not equal to UNKNOWN. Every time a valid distance is found, a counter is also incremented so that we can get the final average by dividing the sum by the count. Each one of these averages is then added to the total result, which is divided by the total number of vertices in the graph. Remember that we must reweight the distances again to get the correct values:
double result = 0;
for(int i = 0; i < G.V; i++)
{
    vector<int> shortest = Dijkstra(i, G);
    double average = 0;
    int count = 0;
    for(int j = 0; j < G.V; j++)
    {
        if(i == j || shortest[j] == UNKNOWN)
        {
            continue;
        }
        shortest[j] += (distance[j] – distance[i]);
        average += shortest[j];
        count++;
    }
    average = average / count;
    result += average;
}
result = result / G.V;
The last step is to calculate the ratio between the result and the maximum weight in the graph. If the value is less than 0.5, we return INTERESTING; otherwise, we return VALID:
double ratio = result / G.maxWeight;
return (ratio < 0.5) ? INTERESTING : VALID;
We can now return to main() and print the output. The first line will be equal to the value of invalid. The second line will be equal to interesting / valid, multiplied by 100, so that it will be displayed as a percentage. Depending on how you do this, you may have to cast your variables as floating points to prevent the value from being rounded to an integer. When printing the output, you can easily make sure it is rounded to two decimal places by using cout << fixed << setprecision(2):
double percentInteresting = (double)interesting / valid * 100;
cout << "INVALID GRAPHS: " << invalid << endl;
cout << "PERCENT INTERESTING: " << fixed << setprecision(2) << percentInteresting << endl;
return 0;

Activity 17: Maze-Teleportation Game

The entire activity conforms fairly closely to the standard implementations of the algorithms we've discussed in this chapter, but with a few slight modifications.

The terms that were used in the problem description, that is, maze, rooms, teleporters, and points could, of course, just as easily have been called graph, vertices, edges, and edge weights. The condition in which a player is able to infinitely reduce their score can be redefined as a negative weight cycle. Follow these steps to complete the activity:

Let's begin by including the necessary headers and setting up the variables and input for the activity:
#include <iostream>
#include <vector>
#include <stack>
#include <climits>
struct Edge
{
    int start;
    int end;
    int weight;
    Edge(int s, int e, int w) : start(s), end(e), weight(w) {}
}
const int UNKNOWN = INT_MAX;
vector<Edge*> edges; // Collection of edge pointers
We will receive input in the same form as our original Bellman-Ford implementation, but we will also build an adjacency list for our graph (represented here as a vector of integer vectors, adj):
int main()
{
    int V, E;
    cin >> V >> E;
    vector<Edge*> edges;
    vector<vector<int>> adj(V + 1);
    for(int i = 0; i < E; i++)
    {
        int u, v, w;
        cin >> u >> v >> w;
        edges.push_back(new Edge(u, v, w));
        adj[u].push_back(v);
    }
    vector<int> results;
The first portion of the problem can be solved by using Bellman-Ford in an identical fashion to what was outlined in Exercise 32, Implementing the Bellman-Ford Algorithm (Part I). However, instead of printing all the values in the distance array, we will set its return type to int and include a few extra lines of code so that it returns only the shortest distance from the source vertex (or UNKNOWN if a negative cycle is detected):
int BellmanFord(int V, int start, vector<Edge*> edges)
{
    // Standard Bellman-Ford implementation
    vector<int> distance(V, UNKNOWN);

    distance[start] = 0;
    for(int i = 0; i < V - 1; i++)
    {
        for(auto edge : edges)
        {
            if(distance[edge->start] == UNKNOWN)
            {
                continue;
            }
            if(distance[edge->start] + edge->weight < distance[edge->end])
            {
                distance[edge->end] = distance[edge->start] + edge->weight;
            }
        }
    }
    // Return UNKNOWN if a negative cycle is found
    if(HasNegativeCycle(distance, edges))
    {
        return UNKNOWN;
    }
    int result = UNKNOWN;
    for(int i = 0; i < V; i++)
    {
        if(i == start) continue;
        result = min(result, distance[i]);
    }
    return result;
}
We can now call this function in main() and populate a results vector for output. If BellmanFord() happens to return UNKNOWN, we output INVALID MAZE and terminate the program (as per the first condition). If a certain starting node has no outgoing edges, we can skip the call to BellmanFord entirely and simply append UNKNOWN to the vector. If we make it through every vertex, we can output the values in the results (or DEAD END if the value is UNKNOWN):
vector<int> results;
for(int i = 0; i < V; i++)
{
    if(adj[i].empty())
    {
        results.push_back(UNKNOWN);
        continue;
    }
    int shortest = BellmanFord(V, i, edges);
    if(shortest == UNKNOWN)
    {
        cout << "INVALID MAZE" << endl;
        return 0;
    }
    results.push_back(shortest);
}
for(int i = 0; i < V; i++)
{
    cout << i << ": ";
    (results[i] == INVALID) ? cout << "DEAD END" << endl : cout << results[i] << endl;
}
Now, we've come to the final condition – finding rooms in which players can get "stuck." Considering this case in terms of graph connectivity, we can redefine it as follows: find the strongly connected components that have no outgoing edges to other components. There are many simple ways to do this once all the strongly connected components have been acquired, but let's try to maximize our program's efficiency and add the necessary logic directly into our existing Kosaraju implementation.
To accomplish this, we will declare two new vectors: one of type bool, named isStuck and another of type int, named inComponent. inComponent will store the index of the component each node belongs to, while isStuck will tell us whether or not the component with index i is cut off from the rest of the graph.
For the sake of simplicity, let's declare the new variables globally:
vector<bool> isStuck;
vector<int> inComponent;
int componentIndex;
Here, we can really begin to appreciate the benefits of encapsulation and object-oriented implementations of graph structures. Having to pass such a large amount of data between our functions is not only difficult to keep track of mentally, but it greatly complicates any kind of modifications we may want to make in the future (to say nothing about the headache-inducing appearance of a function call such as GetComponent(node, adj, visited, component, isStuck, inComponent, componentIndex). For the sake of example and readability, we opt to declare this data globally, but this sort of approach is highly recommended against within the context of an actual full-scale application.
Within our Kosaraju function, we initialize the new data as follows:
isStuck.resize(V, true);
inComponent.resize(V, UNKNOWN);
componentIndex = 0;
Now, we will begin our while loop, incrementing componentIndex by following each DFS traversal that's performed on the stack:
while(!stack.empty())
{
    int node = stack.top();
    stack.pop();
    if(!visited[node])
    {
        vector<int> component;
        GetComponent(node, transpose, visited, component);
        components.push_back(component);
        componentIndex++;
    }
}
Now, we can write the logic in GetComponent(), which will handle this case. We will begin by setting the value of each node's index in inComponent to componentIndex. Now, as we iterate through each node's neighbors, we will include another condition that occurs when the nodes have already been visited:
component.push_back(node);
visited[node] = true;
inComponent[node] = componentIndex;
for(auto next : adj[node])
{
    if(!visited[next])
    {
        GetComponent(next, visited, adj, component);
    }
    else if(inComponent[node] != inComponent[next])
    {
        isStuck[inComponent[next]] = false;
    }
}
Essentially, we are checking to see whether each previously visited neighbor's component matches the current node's component. If their respective component IDs are different, we can conclude that the neighbor's component has a path that extends to other parts of the graph.
You may be wondering why, in a directed graph, the existence of an edge from the current node indicates that the neighboring node has an outgoing path outside of its own component. The reason this logic seems 'backward' is because it is. Remember that we are traversing the transform of the original graph, so the directions between adjacencies are all reversed!
Upon finishing the DFS traversals, we can now return the components vector and print the results:
auto components = Kosaraju(V, adj);
for(int i = 0; i < components.size(); i++)
{
    if(isStuck[i])
    {
        for(auto node : components[i])
        {
            cout << node << " ";
        }
        cout << endl;
    }
}
return 0;

C++ Data Structures and Algorithm Design Principles

By : John Carey, Anil Achary, Shreyans Doshi, Payas Rajan

C++ Data Structures and Algorithm Design Principles

By: John Carey, Anil Achary, Shreyans Doshi, Payas Rajan

Overview of this book

Related Content you might be interested in

Current Title:

C++ Data Structures and Algorithm Design Principles

Beginning Java Data Structures and Algorithms

PHP 7 Data Structures and Algorithms

Hands-On Data Structures and Algorithms with Python &ndash; Third Edition

Chapter 7: Graph Algorithms II

Activity 15: Greedy Robot

Activity 16: Randomized Graph Statistics

Activity 17: Maze-Teleportation Game

Hands-On Data Structures and Algorithms with Python – Third Edition