Book Image

R Data Structures and Algorithms

By : PKS Prakash, Achyutuni Sri Krishna Rao
Book Image

R Data Structures and Algorithms

By: PKS Prakash, Achyutuni Sri Krishna Rao

Overview of this book

In this book, we cover not only classical data structures, but also functional data structures. We begin by answering the fundamental question: why data structures? We then move on to cover the relationship between data structures and algorithms, followed by an analysis and evaluation of algorithms. We introduce the fundamentals of data structures, such as lists, stacks, queues, and dictionaries, using real-world examples. We also cover topics such as indexing, sorting, and searching in depth. Later on, you will be exposed to advanced topics such as graph data structures, dynamic programming, and randomized algorithms. You will come to appreciate the intricacies of high performance and scalable programming using R. We also cover special R data structures such as vectors, data frames, and atomic vectors. With this easy-to-read book, you will be able to understand the power of linked lists, double linked lists, and circular linked lists. We will also explore the application of binary search and will go in depth into sorting algorithms such as bubble sort, selection sort, insertion sort, and merge sort.
Table of Contents (17 chapters)
R Data Structures and Algorithms
Credits
About the Authors
Acknowledgments
About the Reviewer
www.PacktPub.com
Preface

Introduction to data structure


Moore's law in 1965 observed that the number of transistors per square inch in a dense integrated circuit (IC) had doubled every year since its invention, thus enhancing computational power per computer. He revised his forecast in 1975, stating that the number of transistors would double every 2 years, instead of every year, due to saturation:

Figure 1.1: Moore's law (Ref: data credit - Transistor count, Wikipedia)

Also, although the computational power has been increasing, problem complexity and data sources have also been increasing exponentially over the decade, enforcing the need for efficient algorithms:

Figure 1.2: Increase in size of unstructured data (Ref: Enterprise strategy group 2010)

This explosion in data from 2008 to 2015 has led to a new field of data science where people put in a lot of effort to derive insights using all kinds of datasets such as structured, semi-structured, and unstructured. Thus, to efficiently deal with data scalability, it is important to efficiently store and retrieve datasets. For example, searching for a word in a dictionary would take a long time if the data is randomly organized; thus, sorted list data structures are utilized to ensure a faster search of words. Similarly, searching for an optimal path in a city based on the input location requires data about road network, position, and so on to be stored in the form of geometries. Ideally, even a variable stored as any built-in data type such as character, integer, or float can be considered as a data structure of scalar nature. However, data structure is formally defined as a scheme of organizing related information in a computer so that it can be used efficiently.

Similarly, for algorithms, given sufficient space and time, any dataset can be stored and processed to answer questions of interest. However, selecting the correct data structure will help you save a lot of memory and computational power. For example, assigning a float to integer data type, such as the number of customers coming in a day, will require double the memory. However, in the real-world scenario, computers are constrained by computational resources and space. Thus, a solution can be stated as efficient if it is able to achieve the desired goal in the given resources and time. Thus, this could be used as a cost function to compare the performance of different data structures while designing algorithms. There are two major design constraints to be considered while selecting the data structure:

  • Analyze the problem to decide on the basic operation that must be supported into the selected data structure, such as inserting an item, deleting an item, and searching for an item

  • Evaluate the resource constraint for each operation

The data structure is selected depending on the problem scenario, for example, a case where complete data is loaded at the beginning, and there is no change/insertion in data, requires similar data structure. However, similar including deletion into data structure will make data structure implementation more complex.

Tip

Detailed steps to download the code bundle are mentioned in the Preface of this book. Please have a look. The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/R-Data-Structures-and-Algorithms. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!