Advanced Machine Learning with R

By: Cory Lesmeister, Dr. Sunil Kumar Chinnamgari

Overview of this book

R is one of the most popular languages when it comes to exploring the mathematical side of machine learning and easily performing computational statistics. This Learning Path shows you how to leverage the R ecosystem to build efficient machine learning applications that carry out intelligent tasks within your organization. You’ll work through realistic projects such as building powerful machine learning models with ensembles to predict employee attrition. Next, you’ll explore different clustering techniques to segment customers using wholesale data and even apply TensorFlow and Keras-R for performing advanced computations. Each chapter will help you implement advanced machine learning algorithms using real-world examples. You’ll also be introduced to reinforcement learning along with its use cases and models. Finally, this Learning Path will provide you with a glimpse into how some of these black box models can be diagnosed and understood. By the end of this Learning Path, you’ll be equipped with the skills you need to deploy machine learning techniques in your own projects.

Preface

R is one of the most popular languages when it comes to exploring the mathematical side of machine learning and easily performing computational statistics.

This Learning Path shows you how to leverage the R ecosystem to build efficient machine learning applications that carry out intelligent tasks within your organization. You'll tackle realistic projects such as building powerful machine learning models with ensembles to predict employee attrition. You'll explore different clustering techniques to segment customers using wholesale data and use TensorFlow and Keras-R for performing advanced computations. Each chapter will help you implement advanced machine learning algorithms using real-world examples. You’ll also be introduced to reinforcement learning along with its various use cases and models. Additionally, this book provides you with a glimpse into how some of these black-box models can be diagnosed and understood.

By the end of this Learning Path, you’ll be equipped with the skills you need to deploy machine learning techniques in your own projects.

Who this book is for

If you’re a data analyst, data scientist, or machine learning developer who wants to master machine learning techniques using R, this is an ideal Learning Path for you. Each project will help you test your skills in implementing machine learning algorithms and techniques. A basic understanding of machine learning and a working knowledge of R programming are necessary to get the most out of this Learning Path.

What this book covers

Chapter 1, Preparing and Understanding Data, covers the loading of data and demonstrates how to obtain an understanding of its structure and dimensions, as well as how to install the necessary packages.

Chapter 2, Linear Regression, provides you with a solid foundation before learning advanced methods such as Support Vector Machines and Gradient Boosting. No foundation is more solid than least squares linear regression.

Chapter 3, Logistic Regression, presents a discussion on how logistic regression and discriminant analysis are used to predict a categorical outcome. Multivariate adaptive regression splines have been added. This technique performs well, handles non-linearity, and is easy to explain.

Chapter 4, Advanced Feature Selection in Linear Models, shows regularization techniques that help improve predictive ability and interpretability, as feature selection is a critical and often extremely challenging component of machine learning. It includes techniques for both regression and classification problems.

Chapter 5, K-Nearest Neighbors and Support Vector Machines, begins the exploration of the more advanced and nonlinear techniques. The real power of machine learning will be unveiled.

Chapter 6, Tree-Based Classification, covers methods that offer some of the most powerful predictive abilities of all the machine learning techniques, especially for classification problems. Single decision trees will be discussed along with the more advanced random forests and boosted trees. It also contains very popular techniques provided by the xgboost package.

Chapter 7, Neural Networks and Deep Learning, shows some of the most exciting machine learning methods currently used. Inspired by how the brain works, neural networks and their more recent and advanced offshoot, Deep Learning, will be put to the test. It also includes code for the H2O package, including hyperparameter search.

Chapter 8, Creating Ensembles and Multiclass Methods, has completely new content, involving the utilization of several great packages. 

Chapter 9, Cluster Analysis, covers unsupervised learning. Instead of trying to make a prediction, the goal will focus on uncovering the latent structure of observations. Three clustering methods will be discussed: hierarchical, k-means, and partitioning around medoids. It also includes the methodology for executing unsupervised learning with random forests.

Chapter 10, Principal Component Analysis, continues the examination of unsupervised learning with principal components analysis, which is used to uncover the latent structure of the features. Once this is done, the new features will be used in a supervised learning exercise.

Chapter 11, Association Analysis, explains association analysis, which applies not only to recommendations, product placement, and promotional pricing but also to manufacturing, web usage, and healthcare.

Chapter 12, Time Series and Causality, discusses univariate forecast models, bivariate regression, and Granger causality models, including an analysis of carbon emissions and climate change, along with a demonstration of different causality test methods.

Chapter 13, Text Mining, demonstrates a framework for quantitative text mining and the building of topic models. Along with time series, the world of data contains vast volumes of data in a textual format. With so much data as text, it is critically important to understand how to manipulate, code, and analyze the data in order to provide meaningful insights.

Chapter 14, Exploring the Machine Learning Landscape, briefly reviews the various ML concepts that a practitioner must know. In this chapter, we will cover topics such as supervised learning, reinforcement learning, unsupervised learning, and real-world ML use cases.

Chapter 15, Predicting Employee Attrition Using Ensemble Models, covers the creation of powerful ML models through ensemble learning. We will introduce the problem at hand and then explore the dataset with exploratory data analysis (EDA). Then, in the preprocessing phase, we will create new features using prior domain experience. Once the dataset is fully prepared, models will be created using multiple ensemble techniques, such as bagging, boosting, stacking, and randomization. Lastly, we will deploy the selected model to production.

Chapter 16, Implementing a Joke Recommendation Engine, introduces recommendation engines. We start by understanding the concepts and types of collaborative filtering algorithms. We will then build a recommendation engine to provide personalized joke recommendations using collaborative filtering approaches such as user-based collaborative filters and item-based collaborative filters. Apart from this, we will explore various libraries available in R that can be used to build recommendation systems.

Chapter 17, Sentiment Analysis of Amazon Reviews with NLP, covers sentiment analysis, which entails finding the sentiment of a sentence and labeling it as positive, negative, or neutral, along with the various techniques that can be used to analyze text. We will look at text-mining concepts and the various ways that text is labeled based on its tone. Apart from using various popular R text-mining libraries to preprocess the reviews to be classified, we will also leverage a wide range of text representations, such as bag of words, word2vec, fastText, and GloVe.

Chapter 18, Customer Segmentation Using Wholesale Data, covers the segmentation, grouping, or clustering of customers, which can be achieved through unsupervised learning. In this chapter, we learn the various techniques of customer segmentation. We will apply advanced clustering techniques, such as k-means, DIANA, and AGNES. Because the number of groups is often not known in advance, we will explore ML techniques for dealing with this ambiguity and have ML find the number of groups possible based on the underlying characteristics of the input data. We also look at evaluating the output of the clustering algorithms, an area that is often challenging to practitioners.

Chapter 19, Image Recognition Using Deep Neural Networks, covers convolutional neural networks (CNNs). We explore why CNNs work so well with computer vision problems such as object detection. We will learn about all of these concepts by applying a CNN to build a multi-class classification model on a popular open dataset called MNIST. We will learn about the various preprocessing techniques that can be applied to the image data in order to use the data with deep learning models.

Chapter 20, Credit Card Fraud Detection Using Autoencoders, covers autoencoders and how they differ from other deep learning networks, such as recurrent neural networks (RNNs) and CNNs. We will learn about autoencoders by implementing a project that identifies credit card fraud. We will become familiar with dimensionality reduction and how it can be used to detect credit card fraud.

Chapter 21, Automatic Prose Generation with Recurrent Neural Networks, introduces some deep neural networks (DNNs). We will implement a neural network from scratch and learn how to apply an RNN by doing a project. We will create an application based on a long short-term memory (LSTM) network, a variant of RNNs, that generates text automatically. To accomplish this task, we make use of the MXNet framework, which supports the R language for deep learning.

Chapter 22, Winning the Casino Slot Machines with Reinforcement Learning, begins with an explanation of reinforcement learning (RL). We discuss the various concepts of RL, including strategies for solving what is called the multi-armed bandit problem. We implement a project that uses upper confidence bound (UCB) and Thompson sampling techniques to solve the multi-armed bandit problem.

Appendix, Creating a Package, includes additional data packages.

To get the most out of this book

Assuming the reader has a working knowledge of R and of basic statistics, this book will provide the skills and tools required to get the reader up and running with R and ML as quickly and painlessly as possible. There will probably always be detractors who complain that it does not offer enough math or does not do this, or that, or the other thing, but my answer to that is that these books already exist! Why duplicate what has already been done, and very well, for that matter? Again, I have sought to provide something different, something to hold the reader's attention and allow them to succeed in this competitive and rapidly changing field.

The projects covered in this book are intended to give you practical knowledge of applying various ML techniques to real-world problems. A good working knowledge of R and a basic understanding of ML are required before you start the projects.

It should also be noted that the code for the projects is implemented using R version 3.5.2 (2018-12-20), nicknamed Eggshell Igloo. The project code has been successfully tested on Linux Mint 18.3 Sylvia. There is no reason to believe that the code does not work on other platforms, such as Windows; however, this is not something that has been tested by the author.
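
Since the project code targets a specific R release, it can help to compare your own installation against it before running anything. The following is a minimal sketch, not part of the book's code bundle; the variable names `book_version` and `running` are chosen here for illustration only.

```r
# Version the book's project code was written for, as stated in the text above.
book_version <- "3.5.2"

# getRversion() returns the running R version as a comparable version object.
running <- getRversion()

if (running < book_version) {
  message("Running R ", as.character(running),
          ", which is older than ", book_version,
          "; some examples may not behave as printed.")
} else {
  message("Running R ", as.character(running),
          ", which is at or above ", book_version, ".")
}

# sessionInfo() additionally reports the platform and operating system.
print(sessionInfo())
```

A newer R version will usually run the examples, but package behavior may have changed since the book was written, so treat differences in output as a version effect first.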

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packt.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux
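
If you would rather stay inside R than use one of the tools above, the archive can also be extracted with the built-in `utils::unzip()` function. The sketch below assumes the bundle was downloaded as a `.zip` file; `extract_bundle`, the default folder `book-code`, and the file name in the example call are hypothetical names used only for illustration.

```r
# Hypothetical helper: extract a downloaded code bundle into a target folder.
extract_bundle <- function(zip_path, dest_dir = "book-code") {
  # Fail early with a clear message if the archive is missing.
  if (!file.exists(zip_path)) {
    stop("Archive not found: ", zip_path)
  }
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  # utils::unzip() returns the paths of the extracted files.
  extracted <- utils::unzip(zip_path, exdir = dest_dir)
  extracted
}

# Example call; the file name below is a placeholder, not the real bundle name.
# files <- extract_bundle("downloaded-bundle.zip")
# head(files)
```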

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Advanced-Machine-Learning-with-R. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in the text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

html, body, #map {
  height: 100%;
  margin: 0;
  padding: 0;
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Note

Warnings or important notes appear like this.

Note

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.