Hands-On Ensemble Learning with R

By: Prabhanjan Narayanachar Tattar

Overview of this book

Ensemble techniques combine two or more similar or dissimilar machine learning algorithms to create a stronger model. Such a model delivers superior predictive power and can improve accuracy on your datasets. Hands-On Ensemble Learning with R begins with the important statistical resampling methods. You will then walk through the central trilogy of ensemble techniques (bagging, random forest, and boosting) and learn how they can be used to provide greater accuracy on large datasets using popular R packages. You will learn how to combine the predictions of different machine learning algorithms to build ensemble models, and you will explore how to improve the performance of those models. By the end of this book, you will have learned how machine learning algorithms can be combined to reduce common problems and to build simple, efficient ensemble models with the help of real-world examples.

R package references


Prabhanjan Tattar (2015). ACSWR: A Companion Package for the Book "A Course in Statistics with R". R package version 1.0. https://CRAN.R-project.org/package=ACSWR

Alfaro, E., Gamez, M., Garcia, N. (2013). adabag: An R Package for Classification with Boosting and Bagging. Journal of Statistical Software, 54(2), 1-35. URL http://www.jstatsoft.org/v54/i02/.

Angelo Canty and Brian Ripley (2017). boot: Bootstrap R (S-Plus) Functions. R package version 1.3-19.

John Fox and Sanford Weisberg (2011). An R Companion to Applied Regression, Second Edition. Thousand Oaks CA: Sage. URL: http://socserv.socsci.mcmaster.ca/jfox/Books/Companion [package: car]

Max Kuhn. Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang, Can Candan and Tyler Hunt. (2017). caret: Classification and Regression Training. R package version 6.0-77. https://CRAN.R-project.org/package=caret

Zachary A. Deane-Mayer and Jared E. Knowles (2016). caretEnsemble: Ensembles of Caret Models. R package version 2.0.0. https://CRAN.R-project.org/package=caretEnsemble

Venables, W. N. & Ripley, B. D. (2002). Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0 [package: class]

Marie Chavent, Vanessa Kuentz, Benoit Liquet and Jerome Saracco (2017). ClustOfVar: Clustering of Variables. R package version 1.1. https://CRAN.R-project.org/package=ClustOfVar

David Meyer, Evgenia Dimitriadou, Kurt Hornik, Andreas Weingessel and Friedrich Leisch (2017). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.6-8. https://CRAN.R-project.org/package=e1071

Alboukadel Kassambara and Fabian Mundt (2017). factoextra: Extract and Visualize the Results of Multivariate Data Analyses. R package version 1.0.5. https://CRAN.R-project.org/package=factoextra

Sebastien Le, Julie Josse, Francois Husson (2008). FactoMineR: An R Package for Multivariate Analysis. Journal of Statistical Software, 25(1), 1-18. DOI: 10.18637/jss.v025.i01

Alina Beygelzimer, Sham Kakadet, John Langford, Sunil Arya, David Mount and Shengqiao Li (2013). FNN: Fast Nearest Neighbor Search Algorithms and Applications. R package version 1.1. https://CRAN.R-project.org/package=FNN

Hyndman RJ (2017). forecast: Forecasting functions for time series and linear models. R package version 8.2. http://pkg.robjhyndman.com/forecast

David Shaub and Peter Ellis (2018). forecastHybrid: Convenient Functions for Ensemble Time Series Forecasts. R package version 2.0.10. https://CRAN.R-project.org/package=forecastHybrid

Greg Ridgeway with contributions from others (2017). gbm: Generalized Boosted Regression Models. R package version 2.1.3. https://CRAN.R-project.org/package=gbm

Vincent J Carey. Ported to R by Thomas Lumley and Brian Ripley. Note that maintainers are not available to give advice on using a package they did not author. (2015). gee: Generalized Estimation Equation Solver. R package version 4.13-19. https://CRAN.R-project.org/package=gee

The H2O.ai team (2017). h2o: R Interface for H2O. R package version 3.16.0.2. https://CRAN.R-project.org/package=h2o

Andrea Peters and Torsten Hothorn (2017). ipred: Improved Predictors. R package version 0.9-6. https://CRAN.R-project.org/package=ipred

Alexandros Karatzoglou, Alex Smola, Kurt Hornik, Achim Zeileis (2004). kernlab - An S4 Package for Kernel Methods in R. Journal of Statistical Software, 11(9), 1-20. URL http://www.jstatsoft.org/v11/i09/

Friedrich Leisch & Evgenia Dimitriadou (2010). mlbench: Machine Learning Benchmark Problems. R package version 2.1-1.

Daniel J. Stekhoven (2013). missForest: Nonparametric Missing Value Imputation using Random Forest. R package version 1.4.

Alan Genz, Frank Bretz, Tetsuhisa Miwa, Xuefei Mi, Friedrich Leisch, Fabian Scheipl, Torsten Hothorn (2017). mvtnorm: Multivariate Normal and t Distributions. R package version 1.0-6. URL http://CRAN.R-project.org/package=mvtnorm

Beck M (2016). NeuralNetTools: Visualization and Analysis Tools for Neural Networks. R package version 1.5.0. https://CRAN.R-project.org/package=NeuralNetTools

Venables, W. N. & Ripley, B. D. (2002). Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0 [package: nnet]

Michael P. Fay, Pamela A. Shaw (2010). Exact and Asymptotic Weighted Logrank Tests for Interval Censored Data: The interval R Package. Journal of Statistical Software, 36(2), 1-34. URL http://www.jstatsoft.org/v36/i02/. [package: perm]

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. URL http://www.jstatsoft.org/v40/i01/. [package: plyr]

Xavier Robin, Natacha Turck, Alexandre Hainard, Natalia Tiberti, Frédérique Lisacek, Jean-Charles Sanchez and Markus Müller (2011). pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12, p. 77. DOI: 10.1186/1471-2105-12-77 http://www.biomedcentral.com/1471-2105/12/77/

Maja Pohar Perme and Mette Gerster (2017). pseudo: Computes Pseudo-Observations for Modeling. R package version 1.4.3. https://CRAN.R-project.org/package=pseudo

A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.

Aleksandra Paluszynska and Przemyslaw Biecek (2017). randomForestExplainer: Explaining and Visualizing Random Forests in Terms of Variable Importance. R package version 0.9. https://CRAN.R-project.org/package=randomForestExplainer

Terry Therneau, Beth Atkinson and Brian Ripley (2017). rpart: Recursive Partitioning and Regression Trees. R package version 4.1-11. https://CRAN.R-project.org/package=rpart

Prabhanjan Tattar (2013). RSADBE: Data related to the book "R Statistical Application Development by Example". R package version 1.0. https://CRAN.R-project.org/package=RSADBE

Therneau T (2015). A Package for Survival Analysis in S. Version 2.38. https://CRAN.R-project.org/package=survival [package: survival]

Terry M. Therneau and Patricia M. Grambsch (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York. ISBN 0-387-98784-3.

Tianqi Chen, Tong He, Michael Benesty, Vadim Khotilovich and Yuan Tang (2018). xgboost: Extreme Gradient Boosting. R package version 0.6.4.1. https://CRAN.R-project.org/package=xgboost