Machine Learning with R Cookbook

By Yu-Wei Chiu (David Chiu)


Performing cross-validation with the e1071 package


Besides implementing a loop yourself to perform k-fold cross-validation, you can use one of the tuning functions in the e1071 package (for example, tune.nnet, tune.randomForest, tune.rpart, tune.svm, or tune.knn) to obtain the minimum error value. In this recipe, we will illustrate how to use tune.svm to perform 10-fold cross-validation and obtain the optimum classification model.

Getting ready

In this recipe, we continue to use the telecom churn dataset as the input data source to perform 10-fold cross-validation; the training dataset, trainset, should already be prepared, as sketched below.
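If trainset is not already in your workspace, the following is a minimal sketch of the preparation used throughout these recipes. It assumes a C50 release that still bundles the churn dataset (newer versions have moved it to the modeldata package):

    > # Load the churn data and drop identifier-like columns that
    > # should not enter the model
    > library(C50)
    > data(churn)
    > churnTrain = churnTrain[, !names(churnTrain) %in% c("state", "area_code", "account_length")]
    > # Split roughly 70/30 into training and testing sets
    > set.seed(2)
    > ind = sample(2, nrow(churnTrain), replace = TRUE, prob = c(0.7, 0.3))
    > trainset = churnTrain[ind == 1, ]
    > testset = churnTrain[ind == 2, ]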

How to do it...

Perform the following steps to retrieve the minimum estimation error using cross-validation:

  1. Apply tune.svm on the training dataset, trainset, with 10-fold cross-validation as the tuning control. (If you encounter an error message such as could not find function predict.func, clear the workspace, restart the R session, and reload the e1071 library):

    > tuned = tune.svm(churn~., data = trainset, gamma = 10^-2, cost = 10^2, tunecontrol = tune.control(cross = 10))
    
  2. Next, you can obtain the summary information of the model, tuned:

    > summary(tuned)
    
    Error estimation of 'svm' using 10-fold cross validation: 0.08164651
    
  3. Then, you can access the performance details of the tuned model:

    > tuned$performances
      gamma cost      error dispersion
    1  0.01  100 0.08164651 0.02437228
    
  4. Lastly, you can use the optimum model to generate a classification table:

    > svmfit = tuned$best.model
    > table(trainset[,c("churn")], predict(svmfit))
         
           yes   no
      yes  234  108
      no    13 1960
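
As a follow-up that is not part of the original recipe, you can turn the classification table into an overall training accuracy, that is, the proportion of correctly classified cases:

    > # Training accuracy: correct predictions (the table diagonal)
    > # divided by the total number of cases
    > tab = table(trainset$churn, predict(svmfit))
    > sum(diag(tab)) / sum(tab)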
    

How it works...

The e1071 package provides miscellaneous functions to build and assess models; therefore, you do not need to reinvent the wheel to evaluate a fitted model. In this recipe, we use the tune.svm function to tune the svm model with the given formula, dataset, gamma, cost, and control function. Within the tune.control options, we set cross=10, which performs 10-fold cross-validation during the tuning process. The tuning process eventually returns the minimum estimation error, the performance details, and the best model found. Therefore, we can obtain the performance measures of the tuning and further use the optimum model to generate a classification table.
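
Besides summary, the object returned by tune.svm exposes its results as plain fields; the following quick sketch shows the most useful ones:

    > # The tuned object stores the winning parameters, the
    > # cross-validated error of the winner, and the refitted best model
    > tuned$best.parameters   # gamma and cost of the best configuration
    > tuned$best.performance  # minimum cross-validation error estimate
    > class(tuned$best.model) # an ordinary svm object, usable with predict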

See also

  • In the e1071 package, the tune function uses a grid search to tune parameters. For those interested in other tuning functions, use the help function to view the tune documentation:

    > ?e1071::tune
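
Because tune performs a grid search, you can also pass vectors of candidate values instead of single numbers; in the following sketch, the parameter ranges are illustrative rather than taken from the recipe, and every gamma/cost combination is assessed with 10-fold cross-validation:

    > # Grid search over 6 gamma values and 2 cost values; the best
    > # combination is kept in tuned.grid$best.parameters
    > tuned.grid = tune.svm(churn~., data = trainset,
    +   gamma = 10^(-6:-1), cost = 10^(1:2),
    +   tunecontrol = tune.control(cross = 10))
    > tuned.grid$best.parameters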