Spark Cookbook

By: Rishi Yadav

Doing classification using decision trees


Decision trees are among the most intuitive machine learning algorithms. We use them in daily life all the time.

Decision tree algorithms have a lot of useful features:

  • Easy to understand and interpret

  • Work with both categorical and continuous features

  • Work with missing feature values

  • Do not require feature scaling
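To illustrate the last point, here is a minimal sketch in plain Python (not Spark) of a single threshold split on one continuous feature. The impurity measure (Gini), the helper names `gini` and `best_split`, and the toy weights are all assumptions made for this illustration; they do not come from the recipe. The idea it shows is that a split compares a feature against a threshold, so rescaling the feature moves the threshold but leaves the same rows on each side.

```python
def gini(labels):
    """Gini impurity of a list of class labels (assumed impurity measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, left_indices) for the lowest weighted-Gini split."""
    best = None
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [i for i, v in enumerate(values) if v <= t]
        right = [i for i, v in enumerate(values) if v > t]
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(values)
        if best is None or score < best[0]:
            best = (score, t, left)
    return best[1], best[2]


# Hypothetical toy data: animal weight in kilograms and a big/small label.
weights_kg = [4.0, 30.0, 250.0, 5400.0]
labels = ["small", "small", "big", "big"]

t_kg, left_kg = best_split(weights_kg, labels)
# Rescale the same feature to grams and split again.
t_g, left_g = best_split([w * 1000 for w in weights_kg], labels)

# The threshold changes with the unit, but the rows on each side do not.
print(t_kg, left_kg)
print(t_g, left_g)
```

Both calls place the first two rows on the left of the split, which is why the unit of the feature does not matter to the tree.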

Decision tree algorithms work from the top down, like an upside-down tree with the root at the top: at every level, an expression containing a feature is evaluated, and the outcome splits the dataset into two categories. A simple example is a game of dumb charades, which most of us played in college. I picked an animal and asked my coworker to ask me questions to work out my choice. Here's how her questioning went:

Q1: Is it a big animal?

A: Yes

Q2: Does this animal live more than 40 years?

A: Yes

Q3: Is this animal an elephant?

A: Yes
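The questioning above can be sketched as a tiny hand-built binary tree in plain Python. This is only an illustration of the idea, not Spark code: the class names `Node` and `Leaf`, the feature names `is_big` and `lifespan_years`, the sample lifespan value, and the guesses on the "No" branches are all hypothetical. The third question in the game plays the role of a leaf, since it is a final guess and not a further split.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    guess: str  # final answer reached at the bottom of the tree


@dataclass
class Node:
    question: str                  # human-readable form of the test
    test: Callable[[dict], bool]   # expression evaluated on the features
    yes: Union["Node", Leaf]       # branch taken when the test is true
    no: Union["Node", Leaf]        # branch taken when the test is false


def classify(tree, animal):
    """Walk from the root to a leaf, recording each question and answer."""
    path = []
    while isinstance(tree, Node):
        answer = tree.test(animal)
        path.append((tree.question, "Yes" if answer else "No"))
        tree = tree.yes if answer else tree.no
    return tree.guess, path


# Hand-built tree mirroring Q1 and Q2; the "No" leaves are placeholders.
tree = Node(
    question="Is it a big animal?",
    test=lambda a: a["is_big"],
    yes=Node(
        question="Does this animal live more than 40 years?",
        test=lambda a: a["lifespan_years"] > 40,
        yes=Leaf("elephant"),
        no=Leaf("some other big animal"),
    ),
    no=Leaf("some small animal"),
)

# Hypothetical feature values for the animal being guessed.
guess, path = classify(tree, {"is_big": True, "lifespan_years": 60})
for question, answer in path:
    print(f"{question} {answer}")
print("Guess:", guess)
```

Each `Node` evaluates one expression on one feature and sends the example down one of two branches, which is the two-way split described earlier. A learned tree differs in that the algorithm chooses the questions from training data, whereas here they are written by hand.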

This is an obviously oversimplified case in which she knew I had postulated an elephant (what else would you guess in a Big Data...