Machine Learning with R Quick Start Guide

By: Iván Pastor Sanz

Overview of this book

Machine Learning with R Quick Start Guide takes you on a data-driven journey that starts with the very basics of R and machine learning. It gradually builds upon core concepts so you can handle the varied complexities of data and understand each stage of the machine learning pipeline. From data collection to implementing Natural Language Processing (NLP), this book covers it all. You will implement key machine learning algorithms to understand how they are used to build smart models. You will cover techniques such as clustering, logistic regression, random forests, support vector machines, and more. You will also look at more advanced topics such as training neural networks and topic modeling. By the end of the book, you will be able to apply the concepts of machine learning, deal with data-related problems, and solve them using the powerful yet simple language that is R.
Table of Contents (9 chapters)

Taking further steps

We will use the problem of US bank bankruptcies to help you understand machine learning processes in depth and to give you hands-on experience in solving real-world problems. The following chapters describe each step in detail.

The objective of the following chapters is to describe all the steps, and the alternatives at each step, involved in developing a model based on machine learning techniques.

We will go through several steps, from extracting the information and generating new variables to validating the model. As we will see, each step of the development offers several alternatives. In most cases, the best alternative is the one that yields the most predictive model, but sometimes a different alternative is chosen because of restrictions imposed by the future use of the model or by the kind of problem we want to solve.

Background on the financial crisis

In this book, we will solve two different problems related to the financial crisis: the bankruptcy of US banks and the assessment of the solvency of European countries. Why have I chosen such specific problems for this book? The first reason is my concern about the financial crisis and my aim to help avoid future crises. The second is that these are interesting problems for which a large amount of data is available, which makes them well suited to learning machine learning techniques.

Most of the chapters in this book will cover the development of a predictive model to detect bank failures. To solve this problem, we will use a large dataset that exhibits some of the most typical difficulties you can find when working with different algorithms. For example, it has a large number of observations and variables, and the sample is unbalanced, meaning that one of the categories in the classification model is much larger than the other.
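
To make the idea of an unbalanced sample concrete, here is a minimal R sketch. It uses a small synthetic vector, not the book's bank dataset; the name `failed` and the 950/50 split are assumptions chosen purely for illustration.

```r
# Synthetic, illustrative data only: these are NOT the book's bank figures.
# Hypothetical target variable: 1 = failed bank, 0 = surviving bank.
failed <- c(rep(0, 950), rep(1, 50))

# Absolute and relative frequency of each class
class_counts <- table(failed)
class_share  <- prop.table(class_counts)

print(class_counts)
print(class_share)

# Ratio of the majority class to the minority class;
# a value far above 1 indicates an unbalanced sample.
imbalance_ratio <- max(class_counts) / min(class_counts)
print(imbalance_ratio)
```

With a split like this, a model that always predicts "no failure" would be right 95% of the time while detecting no failures at all, which is one reason class balance is worth checking before training a classifier.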

Some of the steps we will see during the following chapters are as follows:

  • Data collection
  • Feature generation
  • Descriptive analysis
  • Treatment of missing information
  • Univariate analysis
  • Multivariate analysis
  • Model selection

The last chapter will focus on developing models to detect economic imbalances in European countries, and it will also cover some basic text mining and clustering techniques.

Although this book is technical, one of the most important aspects of any big data or machine learning solution is understanding the problem that needs to be solved.

By the end of this book, you will see that just knowing algorithms is not enough to develop models. There are many important steps that you will need to follow before jumping into running algorithms. If you pay attention to these preliminary steps, you are more likely to get good results.

For this reason, and because I'm passionate about economic theory, the repository that holds the code for this book also includes a summary, from an economic point of view, of the causes of the problems that we will solve. Specifically, it describes the causes of the financial crisis, its contagion, and its transformation into a sovereign crisis.