R Statistics Cookbook

By : Francisco Juretig

Overview of this book

R is a popular programming language for developing statistical software. This book will be a useful guide to solving common and not-so-common challenges in statistics. With this book, you'll be equipped to confidently perform essential statistical procedures across your organization with the help of cutting-edge statistical tools. You'll start by implementing data modeling, data analysis, and machine learning to solve real-world problems. You'll then understand how to work with nonparametric methods, mixed effects models, and hidden Markov models. This book contains recipes that will guide you in performing univariate and multivariate hypothesis tests, several regression techniques, and using robust techniques to minimize the impact of outliers in data. You'll also learn how to use the caret package for performing machine learning in R. Furthermore, this book will help you understand how to interpret charts and plots to get insights for better decision making. By the end of this book, you will be able to apply your skills to statistical computations using R 3.5. You will also become well-versed with a wide array of statistical techniques in R that are extensively used in the data science industry.
Table of Contents (12 chapters)

Using R6 classes

Object-oriented programming allows us to organize our code in classes, encapsulating similar functionality together and clearly separating internal methods from external ones. For example, we can design a class that has a method for reading data from a file, another method for removing outliers, and another one for selecting a subset of the columns. We can decide to keep all of these methods public, meaning that we can access them from outside the class definition.

R supports object-oriented programming via S3 and S4 classes. The R6 package allows us to use R6 classes, which let us define our own classes in R very easily. They also support inheritance, meaning that we can define a parent class and several derived classes that inherit from it. This implies that the derived classes can access all the methods and attributes of the parent class.

The central advantage of inheritance is that it simplifies the code by avoiding duplicated functions. Inheritance also gives our code a structure (classes are connected through their base or parent classes), which makes it easier to read.

Getting ready

In order to run this example, we need to install the R6 package. It can be installed using install.packages("R6").

How to do it...

We will load data from a .csv file containing customer records, and we will create a new class instance for each record. These instances will be added to a list.

  1. Import the R6 library:
library(R6)
  2. Load the data from a .csv file:
customers = read.csv("./Customers_data.csv")
  3. We will now define the R6Class structure. Note that we have two lists: one for the public attributes and methods, and another for the private ones (these can only be accessed by other methods of this class). The initialize method is called whenever we create a new instance of this class. Note that, inside the class, we refer to public members using the self$ notation and to private members using the private$ notation:
Customer = R6Class("Customer",
  public = list(
    # Public fields, initialized to NULL and filled in by initialize()
    Customer_id = NULL, Name = NULL, City = NULL,
    # Constructor, called by Customer$new()
    initialize = function(customer_id, name, city){
      self$Customer_id <- customer_id
      self$Name <- name
      self$City <- city
    },
    # TRUE if the city is one of the listed American cities
    is_city_in_america = function(){
      # Private methods must be called through private$
      return (private$upper_(self$City) %in% c("NEW YORK", "MIAMI"))
    },
    full_print = function(){
      print("------------------------------------")
      print(paste("Customer name ->", self$Name))
      print(paste("Customer city ->", self$City))
      print("------------------------------------")
    }
  ),
  private = list(
    # Helper that is only visible inside the class
    upper_ = function(x){
      return (toupper(x))
    }
  )
)
  4. We loop through our data frame and create a new Customer instance for each row, passing three arguments. These are passed to the initialize method that we defined previously:
list_of_customers = list()
for (row in 1:nrow(customers)){
row_read = customers[row,]
customer = Customer$new(row_read$Customer_id,row_read$Name,row_read$City)
list_of_customers[[row]] <- (customer)
}
  5. We call our print method:
list_of_customers[[1]]$full_print()

The following screenshot shows the printed customer name and city:

How it works...

Let's assume we want to process clients' data from a CSV file. R6 classes support public and private components, each defined as a list containing methods and/or attributes. For example, we store Customer_id, Name, and City as public attributes, which we need to initialize to NULL. We also need an initialize method, which is called whenever the class is instantiated; this is the equivalent of a constructor in other programming languages. Inside the initializer, we typically store the values provided by the user, using the self keyword to refer to the class variables. We then define a method, is_city_in_america(), that returns TRUE if the city is in America and FALSE otherwise. Another method, called full_print(), prints the contents of the object.

The lock_objects argument of R6Class is not usually very important (we did not set it here, so it keeps its default value, TRUE); it indicates whether the members of each object are locked. If we set lock_objects = FALSE, we can add more members to an object later, if we want to.
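A minimal sketch of the effect of lock_objects, using two hypothetical classes (LockedPoint and OpenPoint) that are not part of the recipe. With the default lock_objects = TRUE, adding a new member to an instance raises an error; with lock_objects = FALSE, the assignment succeeds.

```r
library(R6)

# Hypothetical class that keeps the default lock_objects = TRUE
LockedPoint <- R6Class("LockedPoint", public = list(x = 0))
# Same class, but unlocked so that new members can be added later
OpenPoint <- R6Class("OpenPoint", public = list(x = 0), lock_objects = FALSE)

p1 <- LockedPoint$new()
added_locked <- tryCatch({
  p1$y <- 1   # fails: new bindings cannot be added to a locked object
  TRUE
}, error = function(e) FALSE)

p2 <- OpenPoint$new()
p2$y <- 1     # allowed because lock_objects = FALSE

print(added_locked)  # FALSE
print(p2$y)          # 1
```

Existing fields such as x can still be modified on a locked object; locking only prevents adding new members.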

Here, we only have one private method. Since it is private, it can only be called from within the class (through private$), not externally. This method, called upper_, is used to transform text into uppercase.
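The following sketch checks the public/private split on a cut-down, self-contained version of the recipe's Customer class; the public method city_upper() is hypothetical and exists only for this illustration. From outside the object, the private helper upper_ is simply not visible, while public methods can still reach it through private$.

```r
library(R6)

# Cut-down version of the recipe's Customer class (illustrative only)
Customer <- R6Class("Customer",
  public = list(
    City = NULL,
    initialize = function(city) {
      self$City <- city
    },
    # Hypothetical public method that relies on the private helper
    city_upper = function() {
      private$upper_(self$City)
    }
  ),
  private = list(
    upper_ = function(x) toupper(x)
  )
)

cust <- Customer$new("Miami")
print(cust$city_upper())     # "MIAMI": the public method can use private$upper_
print(is.null(cust$upper_))  # TRUE: the private method is not exposed
```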

After the class is defined, we loop through the data frame and select each row in turn. We instantiate the class for each row and add each instance to a list.
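The same loop can also be written with lapply(), which builds the list directly instead of growing it one element at a time. This sketch assumes a small hypothetical data frame in place of Customers_data.csv, and a minimal Customer class with the same three constructor arguments as the recipe's.

```r
library(R6)

# Minimal Customer class with the same three constructor arguments
Customer <- R6Class("Customer",
  public = list(
    Customer_id = NULL, Name = NULL, City = NULL,
    initialize = function(customer_id, name, city) {
      self$Customer_id <- customer_id
      self$Name <- name
      self$City <- city
    }
  )
)

# Hypothetical stand-in for read.csv("./Customers_data.csv")
customers <- data.frame(
  Customer_id = c(1, 2, 3),
  Name = c("Ann", "Bob", "Carla"),
  City = c("Miami", "London", "New York"),
  stringsAsFactors = FALSE
)

# One Customer instance per row
list_of_customers <- lapply(seq_len(nrow(customers)), function(i) {
  Customer$new(customers$Customer_id[i], customers$Name[i], customers$City[i])
})

print(length(list_of_customers))    # 3
print(list_of_customers[[2]]$Name)  # "Bob"
```

Using seq_len(nrow(customers)) rather than 1:nrow(customers) also behaves correctly on an empty data frame, where 1:0 would produce the sequence c(1, 0).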

The convenience of using classes is that we now have a list containing all the instances, and we can access the methods or attributes of any element in it. For example, we can take a specific element, call its is_city_in_america() method, and then call its full_print() method.
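As a sketch of working with the whole list at once, the code below calls is_city_in_america() on every instance with vapply() and uses the result to filter the list. The class is a self-contained, cut-down Customer, and the customer names and the list of American cities are illustrative assumptions.

```r
library(R6)

# Minimal Customer class; the city list is an illustrative assumption
Customer <- R6Class("Customer",
  public = list(
    Name = NULL, City = NULL,
    initialize = function(name, city) {
      self$Name <- name
      self$City <- city
    },
    is_city_in_america = function() {
      private$upper_(self$City) %in% c("NEW YORK", "MIAMI")
    }
  ),
  private = list(upper_ = function(x) toupper(x))
)

list_of_customers <- list(
  Customer$new("Ann", "Miami"),
  Customer$new("Bob", "London"),
  Customer$new("Carla", "New York")
)

# Call the same method on every instance and collect a logical vector
in_america <- vapply(list_of_customers,
                     function(cust) cust$is_city_in_america(),
                     logical(1))
print(in_america)  # TRUE FALSE TRUE

# Keep only the American customers and print their names
for (cust in list_of_customers[in_america]) print(cust$Name)
```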

There's more...

The R6 package also supports inheritance, meaning that we can define a base class (that will act as a parent), and a derived class (that will act as a child). The derived class will be able to access all the methods and attributes defined in the parent class, reducing code duplication and making the code easier to maintain. In this example, we will create a derived class called Customer_missprod, which will store data for clients who haven't yet received a product they were expecting. We achieve this through the inherit parameter of R6Class.
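A minimal sketch of what inherit provides, using two hypothetical classes, Base and Derived: the derived object carries both class names and can call a constructor and a method that it never defined itself.

```r
library(R6)

Base <- R6Class("Base",
  public = list(
    Name = NULL,
    initialize = function(name) {
      self$Name <- name
    },
    greet = function() paste("Hello,", self$Name)
  )
)

# Derived adds a field but reuses Base's constructor and greet()
Derived <- R6Class("Derived", inherit = Base,
  public = list(Extra = "something")
)

d <- Derived$new("Ann")     # Base's initialize is used, since Derived has none
print(class(d))             # "Derived" "Base" "R6"
print(inherits(d, "Base"))  # TRUE
print(d$greet())            # "Hello, Ann"
```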

Note that we are overriding the full_print method, and we are printing some extra variables. It is important to understand the difference between super and self: super gives access to the methods defined in the base class (for example, super$initialize()), whereas all fields, including the inherited ones, are accessed through self. We need to override the constructor (already defined in the base class) because we now have more variables:

library(R6)
customers = read.csv("./Customers_data_missing_products.csv")
Customer_missprod = R6Class("Customer_missprod", inherit = Customer,
  public = list(Missing_prod = NULL, Missing_since = NULL,
    initialize = function(customer_id, name, city, Missing_product, Missing_since){
      # Reuse the parent constructor for the inherited fields
      super$initialize(customer_id, name, city)
      self$Missing_prod <- Missing_product
      self$Missing_since <- Missing_since
    },
    # Override full_print() to also show the extra fields
    full_print = function(){
      print("------------------------------------")
      # Inherited fields live on the object, so they are read through self
      print(paste("Customer name ->", self$Name))
      print(paste("Customer city ->", self$City))
      print(paste("Missing prod ->", self$Missing_prod))
      print(paste("Missing since ->", self$Missing_since))
      print("------------------------------------")
    }
  )
)

list_of_customers = list()
for (row in 1:nrow(customers)){
row_read = customers[row,]
customer = Customer_missprod$new(row_read$Customer_id,row_read$Name,row_read$City,row_read$Missing_product,row_read$Missing_since)
list_of_customers[[row]] <- (customer)
}

list_of_customers[[1]]$full_print()

Take a look at the following screenshot:
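One way to reduce duplication further, sketched here as a design alternative rather than the recipe's own code: the overriding full_print() can call super$full_print() and then print only the new field. The sketch uses a self-contained, cut-down Customer class (without the dashed separator lines) and a simplified Customer_missprod with a single extra field; the sample values are hypothetical.

```r
library(R6)

Customer <- R6Class("Customer",
  public = list(
    Name = NULL, City = NULL,
    initialize = function(name, city) {
      self$Name <- name
      self$City <- city
    },
    full_print = function() {
      print(paste("Customer name ->", self$Name))
      print(paste("Customer city ->", self$City))
    }
  )
)

Customer_missprod <- R6Class("Customer_missprod", inherit = Customer,
  public = list(
    Missing_prod = NULL,
    initialize = function(name, city, missing_product) {
      super$initialize(name, city)  # reuse the parent constructor
      self$Missing_prod <- missing_product
    },
    full_print = function() {
      super$full_print()            # the parent prints name and city
      print(paste("Missing prod ->", self$Missing_prod))
    }
  )
)

x <- Customer_missprod$new("Ann", "Miami", "Laptop")
x$full_print()
```

With this design, a change to how the parent prints the shared fields is picked up automatically by every derived class.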