Book Image

R High Performance Programming

Book Image

R High Performance Programming

Overview of this book

Table of Contents (17 chapters)
R High Performance Programming
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Algorithm design affects time and space complexity


There is one other performance factor that we have not discussed—your code. The types of computations and algorithms that you run can have a huge impact on performance. Computer scientists describe the performance characteristics of programs in terms of complexity. In particular, we are concerned about two types of complexities:

  • Time complexity: This refers to the computing time required to run an R program in relation to the size of the data being processed

  • Space complexity: This refers to the memory that is required to run an R program in relation to the size of the data being processed

Let's look at an example of time complexity. Suppose that we need to write a function to compute the nth Fibonacci number, that is, a number in the sequence 0, 1, 1, 2, 3, 5, 8, 13, … where each number is the sum of the previous two numbers. A simple way to do this would be to write a recursive function such as:

fibonacci_rec <- function(n) {
    if (n <= 1) {
        return(n)
    }
    return(fibonacci_rec(n - 1) + fibonacci_rec(n - 2))
}

Since the nth Fibonacci number is the sum of the (n-1)th and (n-2)th Fibonacci numbers, this function simply calls itself to compute the previous two numbers, then adds them up. Let's see how long it takes to compute the 25th Fibonacci number using the microbenchmark() function from the microbenchmark package, which can be downloaded and installed from CRAN (we will take a closer look at how to use this function in Chapter 2, Measuring Code's Performance):

microbenchmark(fibonacci_rec(25), unit = "ms")
## Unit: milliseconds
##               expr      min    lq     mean   median       uq
##  fibonacci_rec(25) 170.1014 179.8 191.4213 183.5275 197.5833
##       max neval
##  253.1433   100

It took a median of 184 milliseconds. Because of the way the recursion works, there is a lot of unnecessary repetition. For example, to compute the 25th Fibonacci number, we need to compute the 23rd and 24th numbers in the sequence. But, computing the 24th number also involves computing the 23rd number, so the 23rd number is computed twice. And the 22nd number is needed to compute both the 23rd and 24th numbers, and so on.

We can reduce this repetition by computing each number only once. The following code presents an alternative implementation of the Fibonacci function that does just that. It computes the Fibonacci numbers in sequence from smallest to largest and remembers the numbers that it has computed in the numeric vector fib. Thus, each Fibonacci number is computed only once:

fibonacci_seq <- function(n) {
    if (n <= 1) {
        return(n)
    }
    # (n+1)th element of this vector is the nth Fibonacci number
    fib <- rep.int(NA_real_, n + 1)
    fib[1] <- 0
    fib[2] <- 1
    for (i in 2:n) {
        fib[i + 1] <- fib[i] + fib[i - 1]
    }
    return(fib[n + 1])
}

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

By benchmarking this sequential function, we see that it takes a median of 0.04 milliseconds to run, a reduction of 99.98 percent from the recursive version!

microbenchmark(fibonacci_seq(25), unit = "ms")
## Unit: milliseconds
##               expr     min       lq      mean    median      uq
##  fibonacci_seq(25) 0.03171 0.036133 0.0446416 0.0405555 0.04459
##                        max neval
##                   0.114714   100

To demonstrate the concept of time complexity, we ran the benchmark for different values of n ranging from 0 to 50. The median execution times are shown in the following figure:

Execution time of recursive versus sequential versions of Fibonacci function

As we increase the value of n, the execution time of the recursive version of the Fibonacci function increases exponentially. It is roughly proportional to 1.6n—every time n increases by 1, it gets multiplied by about 1.6 times. The execution time increased so fast that it took too long to compute the Fibonacci numbers after the 50th one. On the other hand, though it is imperceptible from the chart, the execution time of the sequential version increases linearly—every increase in n increases the execution time by 1.3 microseconds. Since the computational complexity of the sequential version is much lower than that of the recursive version, it will perform much better as n increases. As a case in point, with a modest value of n=50, the sequential version took a fraction of a millisecond to get computed while the recursive version took over eight hours!

Though we will not do it here, a similar exercise can be conducted in order to compare the space complexity of different algorithms. Given a certain amount of computational resources, your choice of algorithm and the design of your code can have a big impact on your R program's ability to achieve the desired level of performance.