Machine Learning with R Cookbook, Second Edition

By: Yu-Wei, Chiu (David Chiu)

Overview of this book

Big data has become a popular buzzword across many industries. An increasing number of people have been exposed to the term and are looking at how to leverage big data in their own businesses to improve sales and profitability. However, collecting, aggregating, and visualizing data is just one part of the equation. Being able to extract useful information from data is another task, and a much more challenging one. Machine Learning with R Cookbook, Second Edition uses a practical approach to teach you how to perform machine learning with R. Each chapter is divided into several simple recipes. Through the step-by-step instructions provided in each recipe, you will be able to construct a predictive model by using a variety of machine learning packages. In this book, you will first learn to set up the R environment and use simple R commands to explore data. Next, you will learn how to perform statistical analysis and machine learning analysis and how to assess the resulting models, topics covered in detail later in the book. You'll also learn how to integrate R and Hadoop to create a big data analysis platform. The detailed illustrations provide all the information required to start applying machine learning to individual projects. With Machine Learning with R Cookbook, machine learning has never been easier.

Understanding of basic data structures


Ensure you have completed the previous recipes by installing R on your operating system.

Data types

You need a brief idea of the basic data types and structures in R in order to grasp all the recipes in this book. This section gives you an overview of them and prepares you for using R. R supports the basic data types found in most programming and scripting languages. In simple terms, data can be of numeric, character, date, or logical type. As the names suggest, numeric covers all kinds of numbers, while logical allows only TRUE and FALSE. To check the type of a value, use the class function, which displays the class of the data.

Perform the following task in the R console or RStudio:

> x=123 
> class(x) 
Output: 
[1] "numeric"
> x="ABC"
> class(x)
Output:
[1] "character"
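
The recipe above shows only the numeric and character types. As a minimal sketch, the same class check can be applied to the logical and date types mentioned earlier; the variable names and the sample date are illustrative assumptions, not part of the original recipe:

```r
# Inspect the class of a few other common values
flag  <- TRUE                    # logical value
when  <- as.Date("2017-10-01")   # date parsed from an ISO-formatted string
count <- 5L                      # the L suffix creates an integer rather than a double

class(flag)    # "logical"
class(when)    # "Date"
class(count)   # "integer"
```

Note that a number typed without the L suffix, such as 123, is stored as a double and reported as "numeric".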

Data structures

R supports different types of data structures to store and process data. The following are the basic, commonly used data structures in R:

  • Vectors
  • Lists
  • Arrays
  • Matrices
  • DataFrames

Vectors

A vector is a container that stores data of the same type. It can be thought of as a traditional array in other programming languages. It is not to be confused with a mathematical row or column vector; an R vector is simply a one-dimensional sequence of elements. To create a vector, use the c() function, which combines its arguments. One of the beautiful features of vectors is that any operation performed on a vector is performed on each of its elements. If a vector consists of three elements, adding two increases every element by two.

How to do it...

Perform the following steps to create and see a vector in R:

> x=c(1,2,3,4) # c(1:4) 
> x 
Output: 
[1] 1 2 3 4 
> x=c(1,2,3,4,"ABC") 
> x 
Output: 
[1] "1" "2" "3" "4" "ABC" 
> x=c(1,2,3,4) # make x numeric again; arithmetic fails on a character vector
> x * 2 
Output: 
[1] 2 4 6 8 
> sqrt(x) 
Output: 
[1] 1.000000 1.414214 1.732051 2.000000 
> y = x==2 
> y 
Output: 
[1] FALSE TRUE FALSE FALSE 
> class(y) 
Output: 
[1] "logical" 
> t = c(1:10) 
> t 
Output: 
[1] 1 2 3 4 5 6 7 8 9 10

How it works...

A printed vector starts with the index [1], which shows that the elements of a vector are indexed and that indexing starts from 1, not from 0 as in many other languages. Any operation done on a vector is applied to its individual elements, so the multiplication is applied to each element of the vector. If a vector is passed as an argument to a built-in function such as sqrt, the function is also applied to each element. This is a powerful feature because it removes the need to write loops for such operations. A vector changes its type on the basis of the data it holds and the operation applied to it: once the string "ABC" is added, every element is converted to character, and arithmetic such as x * 2 or sqrt(x) requires a numeric vector. Using x==2 checks each element of the vector for equality with two and returns a vector of logical values, that is, TRUE or FALSE. There are many other ways of creating a vector; one such way, the colon operator in c(1:10), is shown in creating vector t.
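
The points above (1-based indexing, element-wise operations, and type coercion) can be sketched briefly as follows. The vector names v and mixed are assumptions chosen for illustration:

```r
v <- c(10, 20, 30)

v[1]        # the first element lives at index 1: 10
v + 2       # the scalar is added to every element: 12 22 32
v[v > 15]   # a logical vector can be used as an index: 20 30

# Mixing numbers and strings coerces every element to character
mixed <- c(1, 2, "ABC")
class(mixed)   # "character"
```

Because mixed is a character vector, an expression such as mixed * 2 raises an error rather than returning numbers.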

Lists

Unlike a vector, a list is a container that can store data of any type. A list can contain another list, a vector, or any other data structure. To create a list, use the list function.

How to do it...

Perform the following steps to create and see a list in R:

> y = list(1,2,3,"ABC") 
> y 
Output: 
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] "ABC" 
> y = list(c(1,2,3),c("A","B","C")) 
> y 
Output: 
[[1]]
[1] 1 2 3
[[2]]
[1] "A" "B" "C"

How it works...

A list, as mentioned, can contain anything; we start with a simple example that stores some elements in a list using the list function. In the next step, we create a list whose elements are vectors. So, y is a list with the vector 1, 2, 3 as its first element and the vector A, B, C as its second element.
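
Once a list exists, its elements usually need to be read back. The following is a minimal sketch of element access using the same y as above; the second list, named, and its field names are hypothetical and only illustrate that list elements can carry names:

```r
y <- list(c(1, 2, 3), c("A", "B", "C"))

first <- y[[1]]   # double brackets extract the element itself, a numeric vector
sub   <- y[1]     # single brackets return a list of length one

class(first)      # "numeric"
class(sub)        # "list"
y[[2]][3]         # third item of the second element: "C"

# Elements can also be named and then accessed with $
named <- list(id = 1, tags = c("x", "y"))
named$tags        # "x" "y"
```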

Array

An array is a multidimensional vector and can store only data of the same type. To create one, pass the data to the array function and specify the dimensions using the dim argument.

How to do it...

Perform the following steps to create and see an array in R:

> t = array(c(1,2,3,4), dim=c(2,2)) # Create two dimensional array 
> t 
Output: 
     [,1] [,2]
[1,]    1    3
[2,]    2    4
> arr = array(c(1:24), dim=c(3,4,2)) # Creating three dimensional array 
> arr 
Output: 
, , 1   

      [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
, , 2 
      [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23 
[3,]   15   18   21   24 

How it works...

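Creating an array is straightforward. Pass the data and a dim vector to the array function; the values fill the array column by column. With dim=c(2,2), the result is a two-dimensional array with two rows and two columns, and with dim=c(3,4,2), it is a three-dimensional array made of two slices of three rows and four columns each.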
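
As a short sketch of how the dimensions are used afterwards, the following recreates the three-dimensional array from the recipe and reads values back by row, column, and slice:

```r
arr <- array(1:24, dim = c(3, 4, 2))

dim(arr)       # 3 4 2
arr[2, 3, 1]   # row 2, column 3 of the first slice: 8
arr[1, 1, 2]   # row 1, column 1 of the second slice: 13
arr[, , 2]     # the whole second slice, a 3 x 4 matrix
```

Leaving an index position empty, as in the last line, selects everything along that dimension.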

Matrix

A matrix is a two-dimensional array. It is laid out in rows and columns like a DataFrame, with the constraint that every element must be of the same type.

How to do it...

Perform the following steps to create and see a matrix in R:

> m = matrix(c(1,2,3,4,5,6), nrow=3) 
> m 
Output: 
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
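
The output above shows that matrix fills its values column by column. As a minimal sketch, the byrow argument changes the fill order; the name m_row is an assumption introduced for illustration:

```r
m     <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)                 # filled column by column
m_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE)   # filled row by row

dim(m)        # 3 2
m[2, 2]       # 5
m_row[2, 2]   # 4
t(m)          # the transpose, a 2 x 3 matrix
```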

DataFrame

A DataFrame can be seen as an Excel spreadsheet, with rows and columns, where every column can have a different data type. In R, each column of a DataFrame is a vector.

How to do it...

Perform the following steps to create and see a DataFrame in R:

> p = c(1,2,3)
> q = c("A","B","C")
> r = c(TRUE, FALSE, FALSE)
> d = data.frame(No=p, Name=q, Attendance=r)
> d
Output:
  No Name Attendance
1  1    A       TRUE
2  2    B      FALSE
3  3    C      FALSE
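
Because each column of a DataFrame is a vector, columns can be pulled out and inspected individually. The following is a minimal sketch using the same data as above. The class of the Name column is deliberately not shown, since it depends on the R version (factor by default before R 4.0, character from 4.0 onwards):

```r
d <- data.frame(No = c(1, 2, 3),
                Name = c("A", "B", "C"),
                Attendance = c(TRUE, FALSE, FALSE))

d$Name                # the Name column
class(d$No)           # "numeric"
class(d$Attendance)   # "logical"
nrow(d)               # 3
d[d$Attendance, ]     # only the rows where Attendance is TRUE
str(d)                # compact summary of every column and its type
```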