Book Image

C++ Data Structures and Algorithm Design Principles

By : John Carey, Anil Achary, Shreyans Doshi, Payas Rajan
Book Image

C++ Data Structures and Algorithm Design Principles

By: John Carey, Anil Achary, Shreyans Doshi, Payas Rajan

Overview of this book

C++ is a mature multi-paradigm programming language that enables you to write high-level code with a high degree of control over the hardware. Today, significant parts of software infrastructure, including databases, browsers, multimedia frameworks, and GUI toolkits, are written in C++. This book starts by introducing C++ data structures and how to store data using linked lists, arrays, stacks, and queues. In later chapters, the book explains the basic algorithm design paradigms, such as the greedy approach and the divide-and-conquer approach, which are used to solve a large variety of computational problems. Finally, you will learn the advanced technique of dynamic programming to develop optimized implementations of several algorithms discussed in the book. By the end of this book, you will have learned how to implement standard data structures and algorithms in efficient and scalable C++ 14 code.
Table of Contents (11 chapters)

Chapter 4: Divide and Conquer

Activity 8: Vaccinations

In this activity, we will store and lookup the vaccination status of students to determine if they need to be vaccinated. These steps should help you complete the activity:

  1. Begin by including the following headers:

    #include <iostream>

    #include <vector>

    #include <chrono>

    #include <random>

    #include <algorithm>

    #include <numeric>

  2. Define the Student class as follows:

    class Student

    {

    private:

        std::pair<int, int> name;

        bool vaccinated;

    public:

        // Constructor

        Student(std::pair<int, int> n, bool v) :

            name(n), vaccinated(v)

        {}

        // Getters

        auto get_name() { return name; }

        auto is_vaccinated() { return vaccinated; }

        // Two people are same if they have the same name

        bool operator ==(const Student& p) const

        {

            return this->name == p.name;

        }

        // The ordering of a set of people is defined by their name

        bool operator< (const Student& p) const

        {

            return this->name < p.name;

        }

        bool operator> (const Student& p) const

        {

            return this->name > p.name;

        }

    };

  3. The following function lets us generate a student from random data:

    auto generate_random_Student(int max)

    {

        std::random_device rd;

        std::mt19937 rand(rd());

        // the IDs of Student should be in range [1, max]

        std::uniform_int_distribution<std::mt19937::result_type> uniform_dist(1, max);

        // Generate random credentials

        auto random_name = std::make_pair(uniform_dist(rand), uniform_dist(rand));

        bool is_vaccinated = uniform_dist(rand) % 2 ? true : false;

        return Student(random_name, is_vaccinated);

    }

  4. The following code is used to run and test the output of our implementation:

    void search_test(int size, Student p)

    {

        std::vector<Student> people;

        // Create a list of random people

        for (auto i = 0; i < size; i++)

            people.push_back(generate_random_Student(size));

        std::sort(people.begin(), people.end());

        // To measure the time taken, start the clock

        std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();

        bool search_result = needs_vaccination(p, people);

        // Stop the clock

        std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();

        std::cout << "Time taken to search = " <<

            std::chrono::duration_cast<std::chrono::microseconds>

            (end - begin).count() << " microseconds" << std::endl;

        if (search_result)

            std::cout << "Student (" << p.get_name().first

    << " " << p.get_name().second << ") "

                << "needs vaccination." << std::endl;

        else

            std::cout << "Student (" << p.get_name().first

    << " " << p.get_name().second << ") "

                << "does not need vaccination." << std::endl;

    }

  5. The following function implements our logic for whether a vaccination is needed:

    bool needs_vaccination(Student P, std::vector<Student>& people)

    {

        auto first = people.begin();

        auto last = people.end();

        while (true)

        {

            auto range_length = std::distance(first, last);

            auto mid_element_index = std::floor(range_length / 2);

            auto mid_element = *(first + mid_element_index);

            // Return true if the Student is found in the sequence and

    // he/she's not vaccinated

            if (mid_element == P && mid_element.is_vaccinated() == false)

                return true;

            else if (mid_element == P && mid_element.is_vaccinated() == true)

                return false;

            else if (mid_element > P)

                std::advance(last, -mid_element_index);

            if (mid_element < P)

                std::advance(first, mid_element_index);

            // Student not found in the sequence and therefore should be vaccinated

            if (range_length == 1)

                return true;

        }

    }

  6. Finally, the driver code is implemented as follows:

    int main()

    {

        // Generate a Student to search

        auto p = generate_random_Student(1000);

        search_test(1000, p);

        search_test(10000, p);

        search_test(100000, p);

        return 0;

    }

    Note

    Since we are randomizing values in step 3, your output may vary from the expected output shown for this activity.

Activity 9: Partial Sorting

The partial quicksort is only a slight modification of the original quicksort algorithm that was demonstrated in Exercise 20, Quicksort. Compared to that exercise, only step 4 is different. The following is a reference implementation:

  1. Add the following header files:

    #include <iostream>

    #include <vector>

    #include <chrono>

    #include <random>

    #include <algorithm>

  2. Next, we shall implement the partition operation, as follows:

    template <typename T>

    auto partition(typename std::vector<T>::iterator begin,

        typename std::vector<T>::iterator end)

    {

        auto pivot_val = *begin;

        auto left_iter = begin + 1;

        auto right_iter = end;

        while (true)

        {

            // Starting from the first element of vector,

            // find an element that is greater than pivot.

            while (*left_iter <= pivot_val && std::distance(left_iter, right_iter) > 0)

                left_iter++;

            // Starting from the end of vector moving to the beginning,

            // find an element that is lesser than the pivot.

            while (*right_iter > pivot_val && std::distance(left_iter, right_iter) > 0)

                right_iter--;

            // If left and right iterators meet, there are no elements left to swap.

            // Else, swap the elements pointed to by the left and right iterators

            if (left_iter == right_iter)

                break;

            else

                std::iter_swap(left_iter, right_iter);

        }

        if (pivot_val > *right_iter)

            std::iter_swap(begin, right_iter);

        return right_iter;

    }

  3. Since the desired output also needs an implementation of the quicksort algorithm, we'll implement one as follows:

    template <typename T>

    void quick_sort(typename std::vector<T>::iterator begin,

        typename std::vector<T>::iterator last)

    {

        // If there are more than 1 elements in the vector

        if (std::distance(begin, last) >= 1)

        {

            // Apply the partition operation

            auto partition_iter = partition<T>(begin, last);

            // Recursively sort the vectors created by the partition operation

            quick_sort<T>(begin, partition_iter-1);

            quick_sort<T>(partition_iter, last);

        }

    }

  4. Implement the partial quicksort function as follows:

    template <typename T>

    void partial_quick_sort(typename std::vector<T>::iterator begin,

        typename std::vector<T>::iterator last,

        size_t k)

    {

        // If there are more than 1 elements in the vector

        if (std::distance(begin, last) >= 1)

        {

            // Apply the partition operation

            auto partition_iter = partition<T>(begin, last);

            // Recursively sort the vectors created by the partition operation

            partial_quick_sort<T>(begin, partition_iter-1, k);

            

            // Sort the right subvector only if the final position of pivot < k

            if(std::distance(begin, partition_iter) < k)

                partial_quick_sort<T>(partition_iter, last, k);

        }

    }

  5. The following helper functions can be then used to print the contents of a vector and to generate a random vector:

    template <typename T>

    void print_vector(std::vector<T> arr)

    {

        for (auto i : arr)

            std::cout << i << " ";

        std::cout << std::endl;

    }

    // Generates random vector of a given size with integers [1, size]

    template <typename T>

    auto generate_random_vector(T size)

    {

        std::vector<T> V;

        V.reserve(size);

        std::random_device rd;

        std::mt19937 rand(rd());

        // the IDs of Student should be in range [1, max]

        std::uniform_int_distribution<std::mt19937::result_type> uniform_dist(1, size);

        for (T i = 0; i < size; i++)

            V.push_back(uniform_dist(rand));

        return std::move(V);

    }

  6. The following function implements the testing logic for our sorting functions:

    // Sort the first K elements of a random vector of a given 'size'

    template <typename T>

    void test_partial_quicksort(size_t size, size_t k)

    {

            // Create two copies of a random vector to use for the two algorithms

            auto random_vec = generate_random_vector<T>(size);

            auto random_vec_copy(random_vec);

            std::cout << "Original vector: "<<std::endl;

            print_vector<T>(random_vec);

                

            // Measure the time taken by partial quick sort

            std::chrono::steady_clock::time_point

    begin_qsort = std::chrono::steady_clock::now();

            partial_quick_sort<T>(random_vec.begin(), random_vec.end()-1, k);

            std::chrono::steady_clock::time_point

    end_qsort = std::chrono::steady_clock::now();

        

            std::cout << std::endl << "Time taken by partial quick sort = "

                << 'std::chrono::duration_cast<std::chrono::microseconds>

                (end_qsort - begin_qsort).count()

                << " microseconds" << std::endl;

        

            std::cout << "Partially sorted vector (only first "<< k <<" elements):";

            print_vector<T>(random_vec);

            // Measure the time taken by partial quick sort

            begin_qsort = std::chrono::steady_clock::now();

            quick_sort<T>(random_vec_copy.begin(), random_vec_copy.end()-1);

            end_qsort = std::chrono::steady_clock::now();

            std::cout << std::endl <<"Time taken by full quick sort = "

                << std::chrono::duration_cast<std::chrono::microseconds>

                (end_qsort - begin_qsort).count()

                << " microseconds" << std::endl;

        

            std::cout << "Fully sorted vector: ";

            print_vector<T>(random_vec_copy);

    }

  7. Finally, add the driver code, as follows:

    int main()

    {

        test_partial_quicksort<unsigned>(100, 10);

        return 0;

    }

Activity 10: Implementing WordCount in MapReduce

In this activity, we will implement the MapReduce model to solve the WordCount problem. The following is the solution to this activity:

  1. Implement the map task as follows:

    struct map_task : public mapreduce::map_task<

        std::string,                             // MapKey (filename)

        std::pair<char const*, std::uintmax_t>>  // MapValue (memory mapped file contents)

    {

        template<typename Runtime>

        void operator()(Runtime& runtime, key_type const& key, value_type& value) const

        {

            bool in_word = false;

            char const* ptr = value.first;

            char const* end = ptr + value.second;

            char const* word = ptr;

            // Iterate over the contents of the file, extract words and emit a <word,1> pair.

            for (; ptr != end; ++ptr)

            {

                // Convert the character to upper case.

                char const ch = std::toupper(*ptr, std::locale::classic());

                if (in_word)

                {

                    if ((ch < 'A' || ch > 'Z') && ch != '\'')

                    {

    runtime.emit_intermediate(std::pair<char const*,

                  std::uintmax_t> (word, ptr - word), 1);

                        in_word = false;

                    }

                }

                else if (ch >= 'A' && ch <= 'Z')

                {

                    word = ptr;

                    in_word = true;

                }

            }

            // Handle the last word.

            if (in_word)

            {

                assert(ptr > word);

                runtime.emit_intermediate(std::pair<char const*,

                              std::uintmax_t>(word, ptr - word), 1);

            }

        }

    };

    The preceding map function is applied separately to each file in the input directory. The contents of the input file are accepted as the * character in value. The inner loop then iterates over the contents of the file, extracting different words and emitting < key, value > pairs, where key is a word and value is set to 1.

  2. Implement the reduce task as follows:

    template<typename KeyType>

    struct reduce_task : public mapreduce::reduce_task<KeyType, unsigned>

    {

        using typename mapreduce::reduce_task<KeyType, unsigned>::key_type;

        template<typename Runtime, typename It>

        void operator()(Runtime& runtime, key_type const& key, It it, It const ite) const

        {

            runtime.emit(key, std::accumulate(it, ite, 0));    

    }

    };

    The reduce operation can then be applied to all < key, value > pairs that are emitted by the map function. Since the value was set to 1 in the previous step, we can now use std::accumulate() to get the total number of times a key appears among the input pairs of the reduce operation.