Looping lets us perform repetitive tasks in a few lines of code, saving considerable time and effort. Functions let us write a block of instructions once and adapt its behavior through the arguments passed when it is called. Combining loops, functions, and the apply family in R lets us step through the elements of a vector, list, or similar data structure and apply a function or a block of instructions to each element.

# Looping, functions, and apply family in R

# Looping in R

Suppose we want to loop through all the values of the `jan_price` column of `all_prices4` and print the square of each one. We can do so in the following way:

```r
jan = all_prices4$jan_price

for(price in jan){
  print(price^2)
}
```

This prints the square of each January price, one value per line.
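If we need the squared values for later use rather than only printed on screen, one option is to store them in a vector as the loop runs. The following is a minimal sketch; the three prices are stand-in values chosen for illustration, not the actual contents of `all_prices4`.

```r
# Hypothetical January prices; the real values live in all_prices4$jan_price
jan = c(10, 20, 30)

# Preallocate the result vector, then fill it one element at a time
squared = numeric(length(jan))
for(i in seq_along(jan)){
  squared[i] = jan[i]^2
}

squared
# [1] 100 400 900
```

Preallocating with `numeric(length(jan))` avoids growing the vector on every iteration.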

# Functions in R

We can also achieve the previous result by using a function. Let's name this function `square`:

```r
square = function(data){
  for(price in data){
    print(price^2)
  }
}
```

Now call the function as follows:

```r
square(all_prices4$jan_price)
```

The output again shows the square of each value in `jan_price`.

Now suppose we want to be able to raise the elements to any power, not just square them. We can do this with a small tweak to the function:

```r
power_function = function(data, power){
  for(price in data){
    print(price^power)
  }
}
```

Now suppose we want to raise the June prices to the power of `4`. We can do the following:

```r
power_function(all_prices4$june_price, 4)
```

Each value in the `june_price` column is raised to the fourth power and printed.
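A function can also give an argument a default value, so that the caller only supplies it when the default is not wanted. The sketch below is one way to do that; `power_function3` is a hypothetical name used only here, and the prices are stand-in values.

```r
# Sketch: returns the values instead of printing them, and squares by default.
# power_function3 is a hypothetical name, not one defined in this chapter.
power_function3 = function(data, power = 2){
  data^power   # ^ works element-wise on a vector, so no explicit loop is needed
}

june = c(20, 25, 33)   # stand-in prices for illustration

power_function3(june)      # default power: 400 625 1089
power_function3(june, 4)   # explicit power: 160000 390625 1185921
```

Because the function returns its result, the output can be assigned to a variable and reused.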

# Apply family – lapply, sapply, apply, tapply

The apply family lets us avoid writing explicit loops, which reduces our workload. We will discuss four functions from this family: `apply`, `lapply`, `sapply`, and `tapply`.

# apply

`apply` works on arrays or matrices and gives us an easy way to compute something row-wise or column-wise. The direction is specified by a margin. The `apply()` function takes the following form: `apply(data, margin, function)`. The data has to be an array or a matrix, and for a matrix the margin can be either `1` or `2`, where `1` stands for a row-wise operation and `2` stands for a column-wise operation. We will work with the matrix `all_prices`, which has the following structure:

Here, we have a record of the prices of three different items in three different months (January, March, and June). Each row holds the prices of one item across the three months, and each column holds the prices of the three items in a single month. Now, if we want to know which item's price fluctuated most over these three months, we need the standard deviation of each row. We can compute this very easily using `margin = 1` in `apply()`.

```r
apply(all_prices, 1, sd)
```

This returns the standard deviation for each of the three items.

Now suppose we want to know the month-wise total cost of all three items. As every column corresponds to a different month, we can use `apply()` with `margin = 2` and the function `sum` to achieve this:

```r
apply(all_prices, 2, sum)
```

This returns the totals for the three months as a vector.

We see that the total prices were the highest in June (the third column), totaling `78`.
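To make the two margins concrete, here is a small self-contained sketch. The matrix `all_prices_m` is a hypothetical stand-in: its values are borrowed from the first three rows of the data frame defined in the `tapply` section, and it is an assumption that they match the actual `all_prices` matrix (they do reproduce the June total of `78`).

```r
# Hypothetical stand-in for all_prices: rows are items, columns are months.
# matrix() fills column by column, so each group of three values is one month.
all_prices_m = matrix(c(10, 20, 30,    # January
                        11, 22, 33,    # March
                        20, 25, 33),   # June
                      nrow = 3,
                      dimnames = list(c("potato", "rice", "oil"),
                                      c("jan", "mar", "june")))

apply(all_prices_m, 1, sd)    # margin 1: one standard deviation per item (row)
apply(all_prices_m, 2, sum)   # margin 2: one total per month (column)
# jan = 60, mar = 66, june = 78

# For plain sums, the built-in colSums() gives the same totals
colSums(all_prices_m)
```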

Note that the function passed to `apply()` has to be written without `()`. We just need to write its name, without parentheses.
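When no existing function does what we need, an anonymous function can be passed in the same position. The following sketch uses a small stand-in matrix `m` (not one from this chapter) to show both styles.

```r
m = matrix(1:6, nrow = 2)   # stand-in matrix: rows are (1, 3, 5) and (2, 4, 6)

# Existing function passed by name, without parentheses
apply(m, 1, sum)
# [1]  9 12

# Anonymous function: difference between the largest and smallest value per row
apply(m, 1, function(x) max(x) - min(x))
# [1] 4 4
```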

# lapply

In the previously mentioned `power_function()` function, we had to use a `for` loop to go through all the values of the `june_price` column of the `all_prices4` data frame. `lapply` lets us apply a function (one we define ourselves or an existing one) to every element of a list or vector, and it returns a list. Let's redefine `power_function()` as `power_function2()`, which raises its input to a given power and returns the result, and then use `lapply` to apply it to each element of a list or vector. `lapply()` has the following format:

```
lapply(data, function, arguments_of_the_function)
```

We define the new function and call it as follows:

```r
power_function2 = function(data, power){
  data^power
}

lapply(all_prices4$june_price, power_function2, 4)
```

As the output shows, all the prices in `june_price` are raised to the fourth power and returned as a list.

We can use `unlist()` to get a simple vector for our convenience:

```r
unlist(lapply(all_prices4$june_price, power_function2, 4))
```

This returns the fourth power of the `june_price` column as a vector.
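The difference between the list and the flattened vector can be checked directly. This is a minimal sketch with stand-in June prices rather than the real `all_prices4` data.

```r
# power_function2() as defined above; the prices are stand-in values
power_function2 = function(data, power){
  data^power
}
june = c(20, 25, 33)

as_list = lapply(june, power_function2, 4)
as_vector = unlist(as_list)

is.list(as_list)        # TRUE: lapply() always returns a list
is.numeric(as_vector)   # TRUE: unlist() flattens it to a plain vector
as_vector               # values: 160000, 390625, 1185921
```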

Now we will again work with the `combined` array, which holds the prices of different items in three different months for each of 2017 and 2018. Do you remember its structure? It looked like this:

Here, the first matrix corresponds to prices for 2017 and the second matrix corresponds to 2018. We will now recreate this array as a list of matrices in the following way:

```r
combined2 = list(matrix(c(jan_2018, mar_2018, june_2018), nrow = 3),
                 matrix(c(jan_2017, mar_2017, june_2017), nrow = 3))

combined2
```

This returns a list of two matrices, with the 2018 matrix first and the 2017 matrix second.

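Now, if we want the same row from both the 2017 and 2018 matrices, we can pass the subsetting operator `"["` to `lapply()` together with the row index:

```r
lapply(combined2, "[", 2, )
```

This selects the second row from each matrix in the list. Note that `matrix()` fills column by column, so if each of `jan_2018`, `mar_2018`, and so on holds one month's prices, the months are the columns, and the March prices for both years come from the second column instead: `lapply(combined2, "[", , 2)`.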

Now we can modify it further to select a column, row, or any element according to our needs.
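Selecting a row, a column, or a single element from every matrix can be sketched as follows. The monthly vectors are hypothetical stand-ins (one price per item), and anonymous functions are used instead of `"["` purely to make the indexing explicit.

```r
# Stand-in monthly price vectors, one value per item (hypothetical values)
jan_2018 = c(10, 20, 30); mar_2018 = c(11, 22, 33); june_2018 = c(20, 25, 33)
jan_2017 = c(10, 18, 25); mar_2017 = c(13, 25, 32); june_2017 = c(21, 24, 40)

combined2 = list(matrix(c(jan_2018, mar_2018, june_2018), nrow = 3),
                 matrix(c(jan_2017, mar_2017, june_2017), nrow = 3))

second_rows = lapply(combined2, function(m) m[2, ])    # row 2 of each matrix
second_cols = lapply(combined2, function(m) m[, 2])    # column 2 of each matrix
one_cell    = lapply(combined2, function(m) m[1, 3])   # row 1, column 3 of each matrix

second_rows[[1]]   # 20 22 25
second_cols[[1]]   # 11 22 33
```

With these stand-in vectors, the second column of each matrix holds the March prices, because each month vector fills one column.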

`lapply()` can be used with data frames, lists, and vectors.

# sapply

What we got by using `unlist(lapply(data, function, arguments_of_the_function))` can be obtained more simply with `sapply(data, function, arguments_of_the_function)`, which simplifies the result to a vector when each call returns a single value.

```r
sapply(all_prices4$june_price, power_function2, 4)
```

This again returns a vector.
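The equivalence between the two approaches can be verified with a short sketch, again using stand-in June prices. The last call shows what `sapply()` does when each call returns more than one value: it builds a matrix rather than a vector.

```r
power_function2 = function(data, power){
  data^power
}
june = c(20, 25, 33)   # stand-in prices for illustration

via_sapply = sapply(june, power_function2, 4)
via_lapply = unlist(lapply(june, power_function2, 4))

identical(via_sapply, via_lapply)   # TRUE: both are plain numeric vectors

# Each call returns two values, so sapply() simplifies to a 2 x 3 matrix
both = sapply(june, function(p) c(p, p^2))
dim(both)
# [1] 2 3
```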

Now let's go back to the example of the `all_prices3` data frame. We can see this from the screenshot that follows:

# tapply

Now, suppose instead of prices for 2018 only, we have prices for these items for 2017, 2016, and 2015 as well. This new data frame is defined as follows:

```r
all_prices = data.frame(items = rep(c("potato", "rice", "oil"), 4),
                        jan_price = c(10, 20, 30, 10, 18, 25, 9, 17, 24, 9, 19, 27),
                        mar_price = c(11, 22, 33, 13, 25, 32, 12, 21, 33, 15, 27, 39),
                        june_price = c(20, 25, 33, 21, 24, 40, 17, 22, 27, 13, 18, 23)
)

all_prices
```

Printing `all_prices` shows the 12 rows of the new data frame, three items for each of the four years.

Now suppose we want to take the mean March price of each item across all years. We can do this by using `tapply(numerical_variable, categorical_variable, function)`. So, we will need to convert the items column of the `all_prices` data frame to a categorical variable to take the mean price.

```r
tapply(all_prices$mar_price, factor(all_prices$items), mean)
```

This gives us the mean March price for `oil`, `potato`, and `rice` across all years.

Note the use of `factor()` to convert the items column to a factor variable.
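The grouped means can be reproduced with a self-contained sketch that uses only the `items` and `mar_price` columns from the data frame above. It also shows that `tapply()` coerces the grouping variable to a factor itself, so passing the character column directly gives the same result.

```r
all_prices = data.frame(items = rep(c("potato", "rice", "oil"), 4),
                        mar_price = c(11, 22, 33, 13, 25, 32, 12, 21, 33, 15, 27, 39))

mar_means = tapply(all_prices$mar_price, factor(all_prices$items), mean)
mar_means
# oil = 34.25, potato = 12.75, rice = 23.75 (groups are ordered alphabetically)

# Same result without the explicit factor() call
mar_means2 = tapply(all_prices$mar_price, all_prices$items, mean)
```

Writing `factor()` explicitly, as the chapter does, still has the benefit of making the grouping intent obvious to the reader.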

There are other `apply` functions, but that's it for now, folks. We will introduce new functions as and when necessary as we proceed to the chapters on geospatial analysis.

To install a new package, we need to write `install.packages("package_name")`, and to use any installed package, we need to load it with `library(package_name)`.
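Base R installs a package with `install.packages()` and loads an installed package with `library()`. A common pattern, sketched below, is to install only when the package is missing. The package name here is just a placeholder: `stats` ships with R, so the snippet runs without downloading anything.

```r
pkg = "stats"   # placeholder package name; stats ships with R

# Install only if the package is not already available
if(!requireNamespace(pkg, quietly = TRUE)){
  install.packages(pkg)
}

# character.only = TRUE is needed because the name is stored in a variable
library(pkg, character.only = TRUE)
```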