Learning Predictive Analytics with R

By: Eric Mayor

Overview of this book

This book is packed with easy-to-follow guidelines that explain the workings of the many key data mining tools of R, which are used to discover knowledge from your data. You will learn how to perform key predictive analytics tasks using R, such as training and testing predictive models for classification and regression, scoring new data sets, and so on. All chapters will guide you in acquiring the skills in a practical way. Most chapters also include a theoretical introduction that will sharpen your understanding of the subject matter and invite you to go further. The book familiarizes you with the most common data mining tools of R, such as k-means, hierarchical regression, linear regression, association rules, principal component analysis, multilevel modeling, k-NN, Naïve Bayes, decision trees, and text mining. It also provides a description of visualization techniques using the basic visualization tools of R as well as lattice for visualizing patterns in data organized in groups. This book is invaluable for anyone fascinated by the data mining opportunities offered by GNU R and its packages.

The menu bar of the R console


When the R console window is active, there are seven accessible menus: File, Edit, View, Misc, Packages, Windows, and Help. If you use a platform other than Windows 7, you might notice some differences, but none are important.

Some functions of the File and Misc menus are worth describing briefly. Functions from the Packages menu will be covered in the next section. Function is a term that can loosely relate to something the program does or, more specifically, to a succession of programmatically defined steps, oftentimes involving an algorithm, that is explicitly called by some piece of code. When discussing functions accessed through a menu, we will indicate the name of the menu item. When discussing functions as they appear in code, we will indicate the function name followed by parentheses (). Sometimes, a function selectable from the menu corresponds to a single function in code; other times, several lines of code are necessary to accomplish the same thing as the menu function.

A quick look at the File menu

The File menu contains functions related to file handling. Some useful functions of the File menu are as follows:

  • Source R code: Opens a dialogue box from which an R script can be selected. This script will be run in the console.

  • New script: Opens a new window of the R editor, in which R code can be typed or pasted. When this window is active, the menu bar changes.

  • Open script: Opens a dialogue box from which an R script can be selected. This script will be loaded in a new window of the R editor.

  • Change dir: Opens a dialogue window where a folder can be selected. This folder will become the working folder for the current session (until changed).

Here are some quick exercises that will help you get acquainted with the File menu. Before this, make sure that you have downloaded and extracted the code for this book from its webpage.

Let's start by changing the working folder to the folder where you extracted this book's code. This can be done using the Change dir function. Simply click on it in the File menu and select the folder you wish to use.
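The Change dir menu item has counterparts in code: getwd() reports the working folder and setwd() changes it. The following is a minimal sketch, assuming that the temporary folder returned by tempdir() stands in for the folder where you extracted the book's code:

```r
# Remember the current working folder so that it can be restored afterwards.
old_dir <- getwd()

# tempdir() is only a stand-in (an assumption of this sketch); replace it
# with the path of the folder where you extracted the book's code.
code_dir <- tempdir()
setwd(code_dir)

# getwd() reports the working folder; list.files() shows what it contains.
current_dir <- getwd()
print(current_dir)
print(list.files())

# Restore the original working folder.
setwd(old_dir)
```

As with the menu item, a folder set with setwd() remains the working folder until it is changed again or the session ends.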

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Now, open the R script file called helloworld.R; this can be done using the Open script function, which loads the file in the R editor (Source R code would instead run it directly in the console). The file should be listed in the dialogue box. If this is not the case, start by selecting the folder containing the R code again. The file contains the following code:

print("Hello world")

This line of code calls the print() function with the argument "Hello world".

Try running this first line of R code: select the content of the file, right-click on it, and click on Run line or selection.

Alternatively, you can simply press Ctrl + R after having selected the line of code. As you might have guessed, the function prints the following output in the console window:

[1] "Hello world"

Let's imagine you want to create a new script file that prints Hi again, world when run. This can be done by clicking on New script in the File menu and typing the following:

print("Hi again, world")

Now save this file as hiagainworld.R in the working folder. Use the Save function from the File menu of the R editor (not the console).
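The same steps can also be sketched in code: writeLines() writes the script to a file, and source() runs it, much as the Source R code menu item does. The temporary folder used here is an assumption of this sketch, chosen so that your working folder stays untouched:

```r
# Write the one-line script to a file. A temporary folder is used here
# (an assumption of this sketch); in the exercise, the file is saved in
# the working folder instead.
script_path <- file.path(tempdir(), "hiagainworld.R")
writeLines('print("Hi again, world")', script_path)

# source() runs the script; capture.output() collects what it prints so
# that the result can be inspected.
printed <- capture.output(source(script_path))
print(printed)
```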

Tip

This book will not cover all functions in detail. If you want to know more about a function, simply precede its name with a question mark, for instance, ?print.
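The question mark is a shortcut for the help() function. As a small sketch, the following looks up the help page for print() from code and uses args() to display the arguments that the function accepts:

```r
# help("print") looks up the same help page as the question-mark shortcut.
# Storing the result in an object avoids opening the page right away;
# printing the object would display it.
h <- help("print")

# args() shows the arguments of a function without opening its help page.
print(args(print))
```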

A quick look at the Misc menu

The Misc menu contains miscellaneous functions that are not covered by the other menus of the RGui. Some useful functions of the Misc menu are as follows:

  • Stop current computation and Stop all computations: When handling big datasets and computationally demanding algorithms, R may take longer than expected to complete the tasks. If, for any reason, the console is needed during this time, the computations can be stopped by using these functions.

  • List objects: Pastes and runs the ls() function in the console. This outputs the list of objects in the current workspace.

  • List search path: Pastes and runs the search() function in the console. This outputs the list of accessible packages. We will discuss this feature in the next section.
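Both menu items simply call a function, so they can be used from code as well. A minimal sketch, in which the objects x and y are hypothetical and exist only so that there is something to list:

```r
# Create two objects so that there is something to list
# (x and y are hypothetical names used for this sketch only).
x <- 42
y <- c("a", "b")

# ls() returns the names of the objects in the current workspace.
objects_now <- ls()
print(objects_now)

# search() returns the search path: the global environment first,
# followed by the attached packages.
path_now <- search()
print(path_now)
```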

Try out these functions of the Misc menu:

Enter the following code in the console:

repeat(a = 1)

This code will cause R to enter an infinite loop because the repeat statement continually runs the assignment a = 1, which forms the body of the loop (here, the expression between the parentheses; a loop body is more commonly written between braces { }). This means that R will become unavailable for further computation. To give R some rest, we will now exit this loop by stopping the computation: select Stop current computation from the Misc menu. Alternatively, you can just press the Esc key to obtain the same result.
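In practice, a repeat loop is usually given a way to stop on its own. The following sketch (the counter variable is hypothetical) uses the break statement to leave the loop after five iterations:

```r
# A repeat loop runs its body again and again until break is called.
# The body is written between braces { }.
counter <- 0
repeat {
  counter <- counter + 1
  if (counter >= 5) {
    break  # leave the loop after five iterations
  }
}
print(counter)
```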

After doing the exercise above, find out which objects are in the current workspace. In order to do this, simply click on List objects. Assuming that no other objects were created during this session, the output should be as follows:

[1] "a"

Each time we create a variable, vector, list, matrix, data frame, or any other object, it will be accessible for the current session and visible using the ls() function.

Let's seize the opportunity to discuss some types of R objects and how to access their components:

  • We call an object containing a single piece of information a variable (such as the a object above).

  • A vector is a group of indexed components of the same type (for instance, numbers, factors, and Booleans). Elements of vectors can be accessed using their index number between square brackets, [ ]. The following will create a vector b of three components, by using the c() function (for concatenate):

    b = c(1,2,3)

    The second element of vector b is accessed as follows:

    b[2]

  • We call a vector an attribute when it holds a measurement across observations in a dataset (for example, the heights of different individuals stored in a vector).

  • A list is a special type of vector that contains other vectors, or even matrices. Not all components of a list need to be of the same type. The following code will create a list called c containing a copy of variable a and vector b:

    c = list(a,b)

    We use double brackets, [[ ]], to access the components of a list. The copy of the a object stored in the list c that we just created can be accessed as follows:

    c[[1]]

    Accessing the first element of the copy of vector b stored in list c can be done as follows:

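    c[[2]][1]

  • A matrix can only contain elements of the same type. These are arranged in rows and columns. The following will create a 3 × 2 matrix containing the numbers 1 to 6. By default, matrix() fills the matrix column by column, so the first column contains 1, 2, and 3, and the second column contains 4, 5, and 6.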

    M = matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)

    The first row of the matrix can be accessed as follows:

    M[1,]

    The second column of the matrix can be accessed as follows:

    M[,2]

    The second element of the first column of the matrix can be accessed as follows:

    M[2,1]

  • A data frame is a list of vectors that have the same length, analogous to a spreadsheet. The following will create a data frame containing two vectors. The first contains the letters a, b, and c. The second contains the numbers 1, 2, and 3.

    f = data.frame(c("a","b","c"),c(1,2,3))

    The first vector of data frame f can be accessed as follows:

    f[,1]

    This actually selects all the rows of the first column of the data frame and returns them as a vector. (Notice we did not have to use the double brackets notation here, but sometimes, this is necessary, depending on how the data frame has been generated.) When dealing with data frames (but not matrices), the comma can be omitted to select a column by its index. The result is not strictly identical, however: the following returns a data frame with a single column rather than a vector:

    f[1]

    The first element of the second vector of the data frame f (the element corresponding to the intersection of the first row and the second column of the data frame) can be accessed as follows:

    f[1,2]

    Subsetting can be more complex. For instance, the following code returns the second and the third rows of the first column of the data frame (note that matrices are subset in a similar manner):

    f[2:3,1]
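The accessors discussed in this list can be checked together in one short script. This is a sketch under a few assumptions: the list is named lst rather than c to avoid confusion with the c() function, the data frame columns are given the hypothetical names letter and number, and stringsAsFactors = FALSE is passed so that the letters are stored as character strings regardless of the R version:

```r
a <- 1                                  # a single value
b <- c(1, 2, 3)                         # a numeric vector
lst <- list(a, b)                       # a list holding copies of a and b
M <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
f <- data.frame(letter = c("a", "b", "c"),
                number = c(1, 2, 3),
                stringsAsFactors = FALSE)

print(b[2])          # second element of the vector
print(lst[[1]])      # first component of the list (the copy of a)
print(lst[[2]][1])   # first element of the vector stored in the list
print(M[, 1])        # first column: matrix() fills column by column
print(M[2, 1])       # second row, first column

# Selecting the first column of the data frame in two ways.
first_as_vector <- f[, 1]   # with the comma: a plain vector
first_as_df <- f[1]         # without the comma: a one-column data frame
print(class(first_as_vector))
print(class(first_as_df))

print(f[2:3, 1])     # second and third rows of the first column
```

Running class() on both results makes the difference between f[,1] and f[1] visible: the first is a character vector, and the second is still a data frame.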