When we work with statistical software such as R, we rarely use a single value on its own. We need to know how to handle a collection of data values (for example, the ages of 100 randomly selected diabetic patients) and what type of object is needed to store them. In R, the most convenient way to store more than one data value is a vector: a collection of data values stored in a single object, such as the ages of those 100 patients. In fact, even a single value is stored as a vector in R, so a vector can have one element or many. The num.obj object we created in the previous section is a vector that comprises numeric elements.
One of the simplest ways to create a vector in R is to use the c() function. Here is an example:
    # creating vector of numeric element with "c" function
    > num.vec <- c(1,3,5,7)
    > num.vec
    [1] 1 3 5 7
    > mode(num.vec)
    [1] "numeric"
    > class(num.vec)
    [1] "numeric"
    > is.vector(num.vec)
    [1] TRUE
If we create a vector with mixed elements (character and numeric), the resulting vector will be a character vector. Here is an example:
    # Vector with mixed elements
    > num.char.vec <- c(1,3,"five",7)
    > num.char.vec
    [1] "1"    "3"    "five" "7"
    > mode(num.char.vec)
    [1] "character"
    > class(num.char.vec)
    [1] "character"
    > is.vector(num.char.vec)
    [1] TRUE
We can create a larger vector by combining multiple vectors. If any element of any of the combined vectors is a character, the resulting vector's mode will be character. The elements of a vector can be named or unnamed; in the previous examples, the elements were unnamed.
The following example shows how we can combine vectors and how we can create a vector with a name for each element:
    # combining multiple vectors
    > comb.vec <- c(num.vec,num.char.vec)
    > mode(comb.vec)
    [1] "character"
    # creating named vector
    > named.num.vec <- c(x1=1,x2=3,x3=5)
    > named.num.vec
    x1 x2 x3
     1  3  5
The names of the elements in a vector can also be assigned separately using the names() function. In R, any single constant is also stored as a vector with a single element.
Here is an example:
    # vector of single element
    > unit.vec <- 9
    > is.vector(unit.vec)
    [1] TRUE
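Assigning names separately with names() can be sketched as follows. The vector bp.vec and the labels p1 to p3 are hypothetical values chosen only for illustration.

```r
# Hypothetical example: three readings stored without names
bp.vec <- c(120, 135, 110)

# An unnamed vector has no names attribute, so this returns NULL
names(bp.vec)

# Assign one name per element after the vector has been created
names(bp.vec) <- c("p1", "p2", "p3")
bp.vec

# A named element can then be accessed by its name
bp.vec["p2"]
```

Once names are attached, they are printed above the values, just as for a vector created with the name=value form of c().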
R has six basic storage types of vectors, and each type is known as an atomic vector.
The following table shows the six basic vector types, their mode, and the storage mode:
Type | Mode | Storage mode |
---|---|---|
logical | logical | logical |
integer | numeric | integer |
double | numeric | double |
complex | complex | complex |
character | character | character |
raw | raw | raw |
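The difference between the integer and double rows of the table can be checked directly with typeof(), mode(), and storage.mode(). This is a minimal sketch; the vector names int.vec and dbl.vec are made up for the illustration.

```r
# The L suffix asks R to store the values as integers
int.vec <- c(1L, 3L, 5L)
# Plain numbers are stored as doubles
dbl.vec <- c(1, 3, 5)

typeof(int.vec)        # "integer"
mode(int.vec)          # "numeric"
storage.mode(int.vec)  # "integer"

typeof(dbl.vec)        # "double"
mode(dbl.vec)          # "numeric"
storage.mode(dbl.vec)  # "double"
```

Both vectors report the same mode, numeric, and differ only in how the values are stored.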
Other than vectors, R provides other data structures for handling data with multiple elements: the matrix, the data frame, and the list. We will discuss each of these in subsequent sections.
To convert an object's mode, R has user-friendly functions of the form as.x, where x could be numeric, logical, character, list, data.frame, and so on. For example, a matrix operation that requires numeric mode cannot be performed if the data is stored in some other mode. In this case, we first need to convert the data into numeric mode.
In the following example, we will see how we can convert an object's mode:
    # creating a vector of numbers and then converting it to logical
    # and character
    > numbers.vec <- c(-3,-2,-1,0,1,2,3)
    > numbers.vec
    [1] -3 -2 -1  0  1  2  3
    > num2char <- as.character(numbers.vec)
    > num2char
    [1] "-3" "-2" "-1" "0"  "1"  "2"  "3"
    > num2logical <- as.logical(numbers.vec)
    > num2logical
    [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
    # creating character vector and then convert it to numeric and logical
    > char.vec <- c("1","3","five","7")
    > char.vec
    [1] "1"    "3"    "five" "7"
    > char2num <- as.numeric(char.vec)
    Warning message:
    NAs introduced by coercion
    > char2num
    [1]  1  3 NA  7
    > char2logical <- as.logical(char.vec)
    > char2logical
    [1] NA NA NA NA
    # logical to character conversion
    > logical.vec <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
    > logical.vec
    [1]  TRUE FALSE FALSE  TRUE  TRUE
    > logical2char <- as.character(logical.vec)
    > logical2char
    [1] "TRUE"  "FALSE" "FALSE" "TRUE"  "TRUE"
Note that, when we convert numeric mode to logical mode, only 0 (zero) becomes FALSE, and all the other values become TRUE. Also, if we convert a character object to numeric, the elements that look like numbers are converted, and any element that does not becomes NA, with a warning. Converting a character object to logical works only for strings that spell a logical value, such as "TRUE", "true", "T", "FALSE", "false", or "F"; every other string, including "1" and "five", becomes NA without a warning, which is why all the elements of char2logical in the example above are NA. Logical objects, however, are always converted to character objects successfully.
Finally, we can say that any vector can be converted to character without producing a warning. However, if we want to convert character objects to any other type, we have to be careful.
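One way to be careful when converting character data is to check which elements turned into NA after the conversion. The following is a minimal sketch under stated assumptions: age.text is a hypothetical input, and suppressWarnings() is used only because we inspect the result ourselves.

```r
# Hypothetical input: ages entered as text, one of them not a number
age.text <- c("34", "51", "sixty", "47")

# Silence the coercion warning because the NAs are checked explicitly below
age.num <- suppressWarnings(as.numeric(age.text))

# Elements that were present before the conversion but are NA afterwards
failed <- is.na(age.num) & !is.na(age.text)

which(failed)      # positions that could not be converted
age.text[failed]   # the offending values
```

Here the third element, "sixty", is flagged, so it can be corrected before any numeric operation is attempted.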
R is a domain-specific programming language, designed for the statistical analysis of data. In statistics, when we analyze data, the first thing that comes to mind is a variable with hundreds of observations in it, which naturally resembles a vector. This is probably the main reason why the most elementary data type in R is a vector. A vector is a contiguous sequence of cells that contain data, where each cell can be accessed by an index:
> age <- c(10,20,30,40)
This is an example of a vector. The ages of four individuals are stored in the age vector. Pay attention to how the vector was formed and stored under the age variable. Here, c() is the function that creates the vector, but on its own it does not store the data anywhere. The assignment operator <- is what stores the vector under a variable.
Now, in the console, let's type the following line and press Enter:
    > age
    [1] 10 20 30 40
We successfully stored all the ages under the age variable, but what is [1]? It means that the index of the value 10, the first value on that line, is 1.
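The bracketed index is easier to appreciate with a vector long enough to wrap onto several lines. The sketch below narrows the console width to force wrapping; the ids vector and the width of 30 are assumptions made for the illustration, and the exact line breaks depend on the console.

```r
# Narrow the console so that a short vector already wraps (assumed width)
old.opts <- options(width = 30)

# Hypothetical vector of 20 values
ids <- 101:120
print(ids)

# Capture the printed lines as text to count them
out <- capture.output(print(ids))
length(out)

# Restore the previous console width
options(old.opts)
```

Each printed line starts with the index of its first value in square brackets.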
If you want to see the third value of the vector, type the following command:
    > age[3]
    [1] 30
Why did R only show the index of the first value and not the other values? This is only to keep the output clean and informative. Every time R writes a new line, it first gives the index number of the next value. Pretty soon, you will be familiar with this convention. We can store a single value under a variable, but it will be a vector with one element:
> height <- 175
To show you that height is not a scalar but a vector with one element, we will store one additional value in it:
> height[2] <- 180
Pay attention to how we added another value to an existing vector. Here, we put 180 in the second cell of the vector. Recall that we access a cell of the age vector by its index, for example age[2] for the second cell. We can assign a value to a cell of a vector using the same syntax. Let's put another value inside the height variable:
> height[3] <- 165
Now, we can see all the values stored inside the height variable:
    > height
    [1] 175 180 165
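Assigning by index also works when the index lies beyond the current end of the vector. A minimal sketch, using a hypothetical weight vector, shows that R fills any skipped cells with NA:

```r
# Hypothetical vector that starts with a single element
weight <- 70
length(weight)    # 1

# Append to the next free cell
weight[2] <- 82

# Assign to cell 5, skipping cells 3 and 4
weight[5] <- 64

weight            # 70 82 NA NA 64
length(weight)    # 5
```

The vector grows to whatever length the largest assigned index requires.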
Although the basic data structure in R is the vector, there are different types of vector. We use a numeric vector to store numeric data such as age, height, weight, and so on. Character vectors are used to store string data such as name, address, and so on. Defining a character vector in R is simple:
> name <- c("Rob", "Bob", "Jude", "Monica")
When we want to store character data in R, we need to enclose it in quotes (double quotes, as in the previous example, or single quotes). This tells R that the input is a string. We can also put numeric values in quotes, in which case they are stored as strings rather than numbers. If we type text without quotes, R treats it as the name of an object and returns an error message when no such object exists.
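The effect of quoting can be sketched as follows. The vectors quoted and unquoted are hypothetical, and the example assumes that no object called Jude exists in the session; tryCatch() is used only so that the error message is captured rather than stopping the script.

```r
# Numbers inside quotes are stored as text
quoted <- c("10", "20")
# Numbers without quotes are stored as numbers
unquoted <- c(10, 20)

class(quoted)     # "character"
class(unquoted)   # "numeric"

# Without quotes, Jude is looked up as an object name (assumed not to exist)
msg <- tryCatch(c(Jude), error = function(e) conditionMessage(e))
msg
```

The captured message reports that the object could not be found, which is the error described above.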
Another special type of vector is the logical vector. There are two ways we could define a logical vector; first, we will show you the more formal way and, second, we will show you the quick way. A logical vector has two possible values, TRUE and FALSE (plus NA for a missing value). Logical vectors are used in logical operations in R, and they can be used to select specific rows from a dataset.
We can define a logical vector in the following way:
> logical <- c(TRUE, FALSE, TRUE, FALSE)
This logical vector can be used to select elements of the age vector in the following way:
    > age[logical]
    [1] 10 30
Look closely to find out what we just did. We have seen how we can extract a value from the age vector using indexing. A logical vector can be thought of as a vector of indices. The first element of the logical vector is TRUE, which means that the first element of the age vector will be selected. The second element of the logical vector is FALSE, so the second element of the age vector will not be selected. In general, only the elements of the age vector for which the logical vector is TRUE are selected. Here, two elements of the age vector are selected, and a vector of two elements is returned. A question that may come to your mind is: what can we do with this feature? The answer will be clearer in the Data frame section.
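One common use of logical indexing is to let a comparison build the logical vector for us. The following is a minimal sketch using the age values from above; the threshold of 15 and the name older are arbitrary choices for the illustration.

```r
age <- c(10, 20, 30, 40)

# A comparison returns one TRUE or FALSE per element
older <- age > 15
older                       # FALSE TRUE TRUE TRUE

# Use the logical vector to keep only the matching elements
age[older]                  # 20 30 40

# Conditions can be combined with & (and) before indexing
age[age > 15 & age < 40]    # 20 30

# TRUE counts as 1, so sum() gives the number of matches
sum(older)                  # 3
```

The same pattern, a condition inside the square brackets, is what makes logical vectors useful for filtering data.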