Book Image

R Data Structures and Algorithms

By : PKS Prakash, Achyutuni Sri Krishna Rao
Book Image

R Data Structures and Algorithms

By: PKS Prakash, Achyutuni Sri Krishna Rao

Overview of this book

In this book, we cover not only classical data structures, but also functional data structures. We begin by answering the fundamental question: why data structures? We then move on to cover the relationship between data structures and algorithms, followed by an analysis and evaluation of algorithms. We introduce the fundamentals of data structures, such as lists, stacks, queues, and dictionaries, using real-world examples. We also cover topics such as indexing, sorting, and searching in depth. Later on, you will be exposed to advanced topics such as graph data structures, dynamic programming, and randomized algorithms. You will come to appreciate the intricacies of high performance and scalable programming using R. We also cover special R data structures such as vectors, data frames, and atomic vectors. With this easy-to-read book, you will be able to understand the power of linked lists, double linked lists, and circular linked lists. We will also explore the application of binary search and will go in depth into sorting algorithms such as bubble sort, selection sort, insertion sort, and merge sort.
Table of Contents (17 chapters)
R Data Structures and Algorithms
Credits
About the Authors
Acknowledgments
About the Reviewer
www.PacktPub.com
Preface

First class functions in R


R is primarily a functional language at its core. In R, functions are treated just like any other data types, and are considered as first-class citizens. The following example shows that R considers everything as a function call.

Here, the operator + is a function in itself:

> 10+20
[1] 30
> "+"(10,20)
[1] 30

Here, the operator ^ is also a function in itself:

> 4^2
[1] 16
> "^"(4,2)
[1] 16


Now, let's dive deep into functional concepts, which are crucial and widely used by R programmers.

Vectorized functions are among the most popular functional concepts which enable the programmer to execute functions at an individual element level for a given vector. This vector can also be a part of dataframe, matrix, or a list. Let's understand this in detail using the following example, in which we would like to have an operation on each element in a given vector V_in. The operation is to square each element within the vector and output it as vector V_out. We will implement them using three approaches as follows:

Approach 1: Here, the operations will be performed at the element level using a for loop. This is the most primitive of all the three approaches in which vector allocation is being performed using the style of S language:

> V_in <- 1:100000         ## Input Vector
> V_out <- c()             ## Output Vector
> for(i in V_in)           ## For loop on Input vector
+ {
+   V_out <- c(V_out,i^2)  ## Storing on Output vector
+ }

Approach 2: Here, the vectorized functional concept will be used to obtain the same objective. The loops in vectorized programming are implemented in C language, and hence, perform much faster than for loops implemented in R (Approach 1). The time elapsed to run this operation is instantaneous:

> V_in <- 1:100000    ## Input Vector
> V_out <- V_in^2     ## Output Vector

Approach 3: Here, higher order functions (or nested functions) are used to obtain the same objective. As functions are considered first class citizens in R, these can be called as an argument within another function. The widely used nested functions are in the apply family. The following table provides a summary of the various types of functions within the apply family:

Table 1.4 Various types of functions in the apply family

Now, lets' evaluate the first class function through examples. An apply function can be applied to a dataframe, matrix, or array. Let's illustrate it using a matrix:

> x <- cbind(x1 = 7, x2 = c(7:1, 2:5))
> col.sums <- apply(x, 2, sum)
> row.sums <- apply(x, 1, sum)

The lapply is a first class function to be applied to a vector, list, or variables in a dataframe or matrix. An example of lapply is shown below:

> x <- list(x1 = 7:1, x2 = c(7:1, 2:5))
> lapply(x, mean)

The use of the sapply function for a vector input using customized function is shown below:

> V_in <- 1:100000 ## Input Vector
> V_out <- sapply(V_in,function(x) x^2) ## Output Vector

The function mapply is a multivariate sapply. The mapply function is the first input, followed by input parameters as shown below:

mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = T, USE.NAMES = T)

An example of mapply to replicated two vector can be obtained as:

> mapply(rep, 1:6, 6:1)

The function call rep function in R with input from 1 to 6 and is replicated as 6 to 1 using the second dimension of the mapply function. The tapply applies a function to each cell of the ragged array. For example, let's create a list with a multiple array:

The output is a relationship between two vectors with position as a value. The function rapply is a recursive function for lapply as shown below:

> X <- list(list(a = pi, b = list(c = 1:1)), d = "a test")
> rapply(X, sqrt, classes = "numeric", how = "replace")

The function applies sqrt to all numeric classes in the list and replace it with new values.