Mastering Scientific Computing with R

Functions


Functions are pieces of code that perform a particular task and either print the result or return it so that it can be stored in an object. Writing functions is particularly useful because it saves you from rewriting the same code over and over in your program; instead, you write a function once and call it every time you want to perform that task. In fact, all the code we have used so far in our examples calls built-in or third-party R package functions.

For example, we ask for the mean of x using the following code:

> x <- c(2, 6, 7, 12)
> mean(x)
[1] 6.75

In the preceding code, we are actually asking R to call the mean() function. Each function takes arguments. If you would like to know what arguments could be passed to a particular R function, you can consult the help page. There are several ways to access the help documentation in R. First, you can use the help() function as follows:

> help(mean)
Description
Generic function for the (trimmed) arithmetic mean.
Usage
mean(x, ...)

## Default S3 method:
mean(x, trim = 0, na.rm = FALSE, ...)
Arguments
x  An R object. Currently there are methods for numeric/logical vectors and date, date-time, and time interval objects. Complex vectors are allowed for trim = 0, only.
trim  the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
na.rm  a logical value indicating whether NA values should be stripped before the computation proceeds.
... further arguments passed to or from other methods.
[…] 

Alternatively, you can use the ? symbol to obtain the documentation page for the mean function as follows:

> ?mean #Returns the same output as above

You can also search all the help topics for the word "mean" with the ?? operator, as shown in the following screenshot:

> ??mean

As you can see in the preceding screenshot, R returns a table of all the search results matching the word "mean" for all the packages you have installed on your computer.
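The arguments and defaults of a function can also be inspected from code rather than from the help page. The following is a minimal sketch using the base R functions args() and formals(); the variable names (mean_defaults, trim_default, na_rm_default) are chosen here for illustration only.

```r
# mean() is generic, so the defaults are defined on its default method,
# mean.default().
args(mean.default)

# formals() returns the argument list; each default can be extracted by name.
mean_defaults <- formals(mean.default)
trim_default <- mean_defaults$trim     # default value of trim
na_rm_default <- mean_defaults$na.rm   # default value of na.rm
print(trim_default)
print(na_rm_default)

# help.search() is the function form of the ?? operator. It is commented out
# here because it opens an interactive results viewer.
# help.search("mean")
```

This is handy when you only need a quick reminder of a function's signature; the help page remains the place to read what each argument means.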

The help page is very useful because it tells you what type of object the function takes as input and lists all the arguments it accepts, together with their default settings. By consulting the help page for the mean() function, you learn that the defaults are trim=0 and na.rm=FALSE. With trim set to 0, no observations are removed before the mean is calculated. With na.rm set to FALSE, NA entries are not removed either, so the mean of any vector that contains an NA is itself NA. Consider the following example:

> x <- c(2, 6, 7, 12, NA, NA)
> mean(x)
[1] NA

If we specify na.rm=TRUE, the NA entries are ignored as follows:

> mean(x, na.rm=TRUE)
[1] 6.75
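The trim argument is described in the help page but not demonstrated in this section. As a minimal sketch, assuming a made-up vector z with one extreme value, the following compares the plain mean with a trimmed mean. With five values and trim=0.2, one observation is dropped from each end before averaging.

```r
# z is a hypothetical vector with one outlier (100).
z <- c(1, 2, 3, 4, 100)

mean_plain <- mean(z)                # uses all five values
mean_trimmed <- mean(z, trim = 0.2)  # drops 1 and 100, averages 2, 3, 4

print(mean_plain)    # 22
print(mean_trimmed)  # 3
```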

So far, we have overridden a default by naming the argument explicitly, that is, na.rm=TRUE. However, R also allows you to supply arguments by position alone, in the order in which they appear in the function definition. This means we can rewrite the last command as follows:

> #notice "," is used to specify unchanged missing arguments in the order they appear in the function definition on the help page
> mean(x, ,TRUE) 
[1] 6.75
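To make the matching rules concrete, here is a small sketch that calls one function in several equivalent ways. The function scaleAndShift() and its arguments are hypothetical and exist only for this illustration.

```r
# Hypothetical function used only to illustrate argument matching.
scaleAndShift <- function(v, scale = 1, shift = 0) {
  v * scale + shift
}

v <- c(1, 2, 3)

by_name <- scaleAndShift(v, shift = 5)        # name the argument to change
by_position <- scaleAndShift(v, 1, 5)         # supply every argument in order
by_blank <- scaleAndShift(v, , 5)             # leave scale blank to keep its default
reordered <- scaleAndShift(shift = 5, v = v)  # named arguments can come in any order

print(by_name)  # 6 7 8
```

All four calls give the same result. Naming the argument is usually the easiest form to read, because the reader does not have to remember the order of the parameters.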

The same argument-matching rules apply to functions you write yourself. To define a function in R, we write the word function, followed by the list of arguments in parentheses () and then curly braces {} that contain the sequence of commands we want the function to execute. As an example, let's write a simple function called vectorContains() that tests whether a particular value (by default, 3) is present in an input vector. Here are the steps we will take:

  1. We create a function called vectorContains that takes the input vector v1 and an argument (variable) value.to.check, which stores the value we want to check and defaults to 3.

  2. We check that the input object type is numeric using the is.numeric() function.

  3. We ensure that there are no missing (NA) values using the any() and is.na() functions. The is.na() function returns a logical vector with TRUE for each entry that is NA, and the any() function returns TRUE if at least one of those entries is TRUE. Because we want a TRUE result when no NA is present rather than when one is, we put the ! sign before the any(is.na()) command.

  4. We use an if else {} statement so that, if the vector isn't numeric or contains NA values, the stop() function halts the function with an error message.

  5. We create an object value.found to keep track of whether the value to be checked is found. We initially set value.found to "no" because we assume the value is not present.

  6. We check each value of our input vector using a for() loop. If an element (i) of our vector matches value.to.check, we set value.found to "yes" and break out of the for() loop.

  7. Depending on whether value.found is set to "yes" or "no", we return TRUE or FALSE as follows:

    > vectorContains <- function(v1, value.to.check=3){
        if(is.numeric(v1) && !any(is.na(v1))) {
          value.found <- "no"
          for (i in v1){
            if(i == value.to.check) {
              value.found <- "yes"
              break
            }
          }
          if(value.found == "yes") {
            return(TRUE)
          } else {
            return(FALSE)
          }
        } else {
          #stop() exits the function and prints the following error message
          stop("This function takes a numeric vector without NAs as input.")
        }
      }
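The input check in steps 2 and 3 is easier to follow when each piece is evaluated on its own. The following sketch does that for a made-up vector v; the intermediate names (na_flags, has_na, is_valid) are introduced here purely for illustration.

```r
# v is a hypothetical input with one missing value.
v <- c(2, 6, NA, 12)

na_flags <- is.na(v)     # FALSE FALSE TRUE FALSE: one flag per entry
has_na <- any(na_flags)  # TRUE, because at least one flag is TRUE

# The condition used by vectorContains(): numeric AND no NA present.
is_valid <- is.numeric(v) && !has_na  # FALSE, because v contains an NA

# A character vector fails the first half of the check.
is_text_numeric <- is.numeric(c("a", "b"))  # FALSE

print(na_flags)
print(is_valid)
```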

Now, let's test our function as follows:

> x <- c(2, 6, 7, 12, NA, NA)

> vectorContains(x)
Error in vectorContains(x) : 
  This function takes a numeric vector without NAs as input.
> y <- c(1, 4, 6, 8, 3, 12, 15)
> vectorContains(y)
[1] TRUE

To test whether a vector contains a value other than 3, such as 6 or 17, we can override the default value.to.check, either by position or by name, as follows:

> vectorContains(y, 6) 
[1] TRUE
> vectorContains(y, value.to.check=17) 
[1] FALSE
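The for() loop makes each step explicit, but R comparisons are vectorized, so the same check can be written more compactly. The following is one possible alternative, not the version used in this section; the name vectorContainsVec is hypothetical.

```r
# Hypothetical vectorized variant of vectorContains(); same input rules.
vectorContainsVec <- function(v1, value.to.check = 3) {
  if (!is.numeric(v1) || any(is.na(v1))) {
    stop("This function takes a numeric vector without NAs as input.")
  }
  # v1 == value.to.check compares every element at once;
  # any() is TRUE if at least one comparison is TRUE.
  any(v1 == value.to.check)
}

y <- c(1, 4, 6, 8, 3, 12, 15)
found_default <- vectorContainsVec(y)                    # TRUE (3 is present)
found_six <- vectorContainsVec(y, 6)                     # TRUE
found_17 <- vectorContainsVec(y, value.to.check = 17)    # FALSE

# tryCatch() captures the error message instead of halting the script.
error_text <- tryCatch(
  vectorContainsVec(c(2, 6, NA)),
  error = function(e) conditionMessage(e)
)
print(error_text)
```

The vectorized form avoids the "yes"/"no" flag and the explicit loop. The looped version is still useful for seeing how for(), if, and break fit together.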

Hopefully, the preceding example shows the beauty of writing functions instead of individual commands: you can reuse the same function to check whether a vector contains any particular value. Moreover, by saving these lines of code to a text file (for example, vectorfunction.R), you can reload the function in a later session using the source() command instead of rewriting it, as follows:

> source("/PathToFile/vectorfunction.R")
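The save-and-reload cycle can be tried without touching your own files. The sketch below writes a small function to a temporary file and then loads it with source(). The function containsValue() is a hypothetical stand-in for vectorContains(), and the temporary path comes from tempfile() rather than a real project location.

```r
# Create a temporary .R file to stand in for a saved script.
function_file <- tempfile(fileext = ".R")

# Write a small (hypothetical) function definition into the file.
writeLines(c(
  "containsValue <- function(v1, value.to.check = 3) {",
  "  any(v1 == value.to.check)",
  "}"
), function_file)

# source() runs the file, which defines containsValue() in the workspace.
source(function_file)

found <- containsValue(c(1, 4, 3))
print(found)  # TRUE

# Remove the temporary file.
unlink(function_file)
```

In a real project you would keep the file and pass its actual path to source() at the start of each session.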