R Deep Learning Essentials

By: Joshua F. Wiley

Overview of this book

Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures. With the superb memory management and the full integration with multi-node big data platforms, the H2O engine has become more and more popular among data scientists in the field of deep learning.

This book will introduce you to the deep learning package H2O with R and help you understand the concepts of deep learning. We will start by setting up important deep learning packages available in R and then move towards building models related to neural networks, prediction, and deep prediction, all of this with the help of real-life examples.

After installing the H2O package, you will learn about prediction algorithms. Moving ahead, concepts such as overfitting data, anomalous data, and deep prediction models are explained. Finally, the book will cover concepts relating to tuning and optimizing models.

The problem of overfitting data – the consequences explained


A common issue in machine learning is overfitting. Overfitting generally refers to a model performing better on the data used to train it than on data it was not trained on (holdout data, future real-world use, and so on). It occurs when a model fits what is essentially noise in the training data. The model appears to become more accurate as it accounts for the noise, but because the noise changes from one dataset to the next, this accuracy does not carry over to any data other than the training data; that is, the model does not generalize.
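The gap between training and holdout performance can be illustrated with a small simulation. The sketch below is not taken from the book: it is written in Python with only the standard library, and the data-generating process (a line plus Gaussian noise), the k-nearest-neighbour predictor, and the helper names `make_data`, `knn_predict`, and `mse` are all assumptions chosen for illustration. With k = 1 the model memorizes every training point, noise included, so its training error is essentially zero while its holdout error is not.

```python
import random


def make_data(n, rng):
    """Draw n points from y = 2x + Gaussian noise (sd = 1). Illustrative only."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        data.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))
    return data


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-NN model (built on train) evaluated on data."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


rng = random.Random(42)        # fixed seed so the run is reproducible
train = make_data(200, rng)    # data used to fit the model
holdout = make_data(200, rng)  # fresh data from the same process

results = {}
for k in (1, 25):              # k = 1 memorizes noise; k = 25 averages it out
    results[k] = (mse(train, train, k), mse(train, holdout, k))
    print(f"k={k:2d}  train MSE={results[k][0]:.3f}  holdout MSE={results[k][1]:.3f}")
```

The more flexible model (k = 1) looks better on the training data, yet it is the smoother model (k = 25) that does better on the holdout set, because the noise it declined to fit differs between the two samples.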

Overfitting can occur at any time but tends to become more severe as the ratio of parameters to information increases. Usually, this can be thought of as the ratio of parameters to observations, but not always (for example, suppose the outcome is a rare event that occurs in 1 in 5 million people; a sample size of 15 million may still only have 3 people...
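The arithmetic behind the rare-event example can be checked directly. In this minimal Python sketch, the sample size and event rate come from the text, while the parameter count of 10 is a hypothetical value chosen only to show how the two ratios diverge.

```python
sample_size = 15_000_000   # observations in the sample (from the text)
one_in = 5_000_000         # the event occurs in 1 in 5 million people (from the text)
n_parameters = 10          # hypothetical model size, assumed for illustration

# Expected number of observations in which the rare event occurs.
expected_events = sample_size / one_in

# Two ways to gauge how much information supports each parameter.
params_per_observation = n_parameters / sample_size
params_per_event = n_parameters / expected_events

print(f"expected events:            {expected_events:.0f}")
print(f"parameters per observation: {params_per_observation:.2e}")
print(f"parameters per event:       {params_per_event:.2f}")
```

Judged by observations, the hypothetical model looks tiny relative to the data; judged by events, it has more parameters than informative cases.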