Predictive Analytics Using Rattle and Qlik Sense

By: Ferran Garcia Pagans, Fernando G Pagans

Entropy and information gain


Before we explain how to create a Decision Tree, we need to introduce two important concepts: entropy and information gain.

Entropy measures how heterogeneous, or impure, a dataset is: the more homogeneous the dataset, the lower its entropy. Imagine a dataset with 10 observations and one attribute, as shown in the following diagram, where the value of this attribute is A for all 10 observations. This dataset is completely homogeneous, so the value of the next observation is easy to predict; it will probably be A:

The entropy of a completely homogeneous dataset is zero. Now, imagine a similar dataset in which each observation has a different value, as shown in the following diagram:

Now the dataset is very heterogeneous, and it is hard to predict the next observation. In this dataset, the entropy is higher. The formula to calculate the entropy is $H(X) = -\sum_{x} p(x) \log_2 p(x)$, where $p(x)$ is the probability of $x$.
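
The two datasets described above can be checked with a few lines of Python. This is only a minimal sketch, not code from the book: it assumes the standard Shannon entropy (the negated sum of each value's probability times the logarithm of that probability), a base-2 logarithm so that the result is in bits, and probabilities estimated as relative frequencies. The function name `entropy` and the sample lists are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of the values in a sequence.

    Assumption: the probability of each distinct value is estimated
    as its relative frequency in the sequence.
    """
    counts = Counter(values)
    total = len(values)
    # Add up -p * log2(p) for each distinct value, where p = count / total.
    return sum(-(n / total) * log2(n / total) for n in counts.values())


# Hypothetical data: 10 observations that all have the value A.
homogeneous = ["A"] * 10
# Hypothetical data: 10 observations, each with a different value.
heterogeneous = list("ABCDEFGHIJ")

print(entropy(homogeneous))
print(entropy(heterogeneous))
```

Under these assumptions, the homogeneous dataset has an entropy of zero, and the dataset with 10 different values has an entropy of log2(10), roughly 3.32 bits, which is the maximum possible for 10 observations.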

Try to calculate the entropy for the following datasets:

Now, we understand how entropy helps us to know the level of predictability of a dataset...