Mastering Machine Learning with R

By : Cory Lesmeister

Data frames and matrices


We will now work toward a data frame, which is a collection of variables (vectors) of equal length. First, we create a vector of 1, 2, and 3 and another vector of 1, 1.5, and 2.0. The rbind() function then combines them as rows:

> p = 1:3

> p
[1] 1 2 3

> q = seq(1,2, by=0.5)

> q
[1] 1.0 1.5 2.0

> r = rbind(p,q)

> r
  [,1] [,2] [,3]
p    1  2.0    3
q    1  1.5    2

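The result is a numeric matrix with two rows and three columns. You can always determine the structure of your data using the str() function, which in this case shows a 2 x 3 numeric matrix whose dimnames attribute is a list of two elements: the row names, p and q, and the column names, which are NULL because the columns were never named: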

> str(r)
 num [1:2, 1:3] 1 1 2 1.5 3 2
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "p" "q"
  ..$ : NULL
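
As a quick cross-check of what rbind() returned, base R's is.matrix(), is.data.frame(), and dim() can be used alongside str(). The sketch below rebuilds p and q so that it runs on its own:

```r
# Rebuild the two vectors from the text
p = 1:3
q = seq(1, 2, by = 0.5)

# rbind() stacks the vectors as the rows of a matrix
r = rbind(p, q)

is.matrix(r)      # TRUE: a matrix, not a list or a data frame
is.data.frame(r)  # FALSE
dim(r)            # 2 rows, 3 columns
rownames(r)       # "p" "q", taken from the names of the vectors
colnames(r)       # NULL, because the columns were never named
```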

Now, let's put them together as columns using cbind():

> s = cbind(p,q)

> s
     p   q
[1,] 1 1.0
[2,] 2 1.5
[3,] 3 2.0

To put this in a data frame, use the as.data.frame() function. After that, examine the structure:

> s = as.data.frame(s)

> str(s)
'data.frame':	3 obs. of  2 variables:
 $ p: num  1 2 3
 $ q: num  1 1.5 2
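
As an alternative sketch, the same data frame can be built in one step with data.frame(), skipping the intermediate matrix. The name s2 is only illustrative. One difference worth noting: here p keeps its integer type, whereas passing it through cbind() alongside q converts it to numeric.

```r
p = 1:3
q = seq(1, 2, by = 0.5)

# data.frame() creates one column per named argument
s2 = data.frame(p = p, q = q)   # s2 is a hypothetical name

# Inspect the structure, as in the text
str(s2)
```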

We now have a data frame, s, that has two variables of three observations each. We can change the names of the variables using names():

> names(s) = c("column 1", "column 2")

> s
  column 1 column 2
1        1      1.0
2        2      1.5
3        3      2.0
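
Names that contain spaces, such as the ones just assigned, are legal but need quoting whenever a column is accessed by name. The following sketch, which rebuilds s so that it is self-contained, shows two ways to do that and a rename that avoids the quoting altogether; the underscore names are only a suggestion:

```r
s = data.frame(p = 1:3, q = seq(1, 2, by = 0.5))
names(s) = c("column 1", "column 2")

# With $, a name containing a space must be wrapped in backticks
s$`column 2`

# With [[ ]], the name is passed as an ordinary quoted string
s[["column 2"]]

# Renaming to names without spaces removes the need for quoting
names(s) = c("column_1", "column_2")
s$column_2
```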

Let's have a go at putting this into a matrix format with as.matrix(). Some R packages require the data to be in a data frame, while others require a matrix. You can switch back and forth between a data frame and a matrix as you require:

> t = as.matrix(s)

> t
     column 1 column 2
[1,]        1      1.0
[2,]        2      1.5
[3,]        3      2.0
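
One caveat when switching between the two: a matrix holds a single data type, so the conversion is only lossless when every column is numeric, as it is here. The sketch below uses a hypothetical data frame, df, with a character column to show what happens otherwise:

```r
# Hypothetical data frame with one numeric and one character column
df = data.frame(x = 1:3, label = c("a", "b", "c"))

m = as.matrix(df)

# Every cell is converted to text so that the matrix has one type
is.character(m)                  # TRUE

# Selecting only the numeric column first keeps the matrix numeric
is.numeric(as.matrix(df["x"]))   # TRUE
```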

You can also look up a specific value in a matrix or data frame by its position. For instance, say we want to know the value of the first observation of the first variable. In this case, we specify the first row and first column in brackets as follows:

> t[1,1]
column 1 
       1

Let's assume that you want to see all the values in the second variable (column). Just leave the row index blank, but remember to keep the comma before the column(s) that you want to see:

> t[,2]
[1] 1.0 1.5 2.0
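
Notice that selecting a single column returns a plain vector rather than a one-column matrix. If the matrix shape needs to be preserved, the drop = FALSE argument of the bracket operator does that. A minimal sketch, using a stand-in matrix m with the same values as t:

```r
# Stand-in for the matrix in the text
m = matrix(c(1, 2, 3, 1, 1.5, 2), ncol = 2,
           dimnames = list(NULL, c("column 1", "column 2")))

v = m[, 2]                 # dimensions are dropped: a plain vector
k = m[, 2, drop = FALSE]   # dimensions are kept: a 3 x 1 matrix

is.null(dim(v))   # TRUE
dim(k)            # 3 rows, 1 column
```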

Conversely, let's say we want to look at the first two rows only. In this case, use the colon operator to give the range of rows and leave the column index blank after the comma:

> t[1:2,]
     column 1 column 2
[1,]        1      1.0
[2,]        2      1.5

Assume that you have a data frame or matrix with 100 observations and ten variables and you want to create a subset of the first 70 observations and variables 1, 3, 7, 8, 9, and 10. What would this look like?

Well, by combining the colon, the comma, the concatenate function c(), and brackets, you could simply do the following:

> new = old[1:70, c(1,3,7:10)]

Notice how you can easily manipulate which observations and variables you want to include. You can also easily exclude variables. To exclude just the first variable, put a negative sign in front of its index:

> new = old[,-1]
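
Both subsetting statements above can be tried out on a stand-in for old. The data frame below is made up purely for illustration (100 rows and 10 columns filled with consecutive integers), and the names new2 and new3 are hypothetical:

```r
# Hypothetical stand-in for `old`: 100 observations of 10 variables
old = as.data.frame(matrix(seq_len(1000), nrow = 100, ncol = 10))

# First 70 observations; variables 1, 3, and 7 through 10
new = old[1:70, c(1, 3, 7:10)]
dim(new)      # 70 rows, 6 columns
names(new)    # "V1" "V3" "V7" "V8" "V9" "V10"

# Every observation; all variables except the first
new2 = old[, -1]
dim(new2)     # 100 rows, 9 columns

# Several variables can be excluded at once with -c(...)
new3 = old[, -c(1, 2)]
dim(new3)     # 100 rows, 8 columns
```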

This syntax is very powerful in R for the fundamental manipulation of data. In the main chapters, we will also bring in more advanced data manipulation techniques.