
Mastering Python for Data Science

By: Samir Madhavan

Chapter 9. Pushing Boundaries with Ensemble Models

Ensemble modeling is a process in which two or more models are generated and their results are combined. In this chapter, we'll cover the random forest, a nonparametric modeling technique in which multiple decision trees are built during training and their results are then averaged to give the final output. It's called a random forest because each of its many decision trees is built from a random sample of the training data and randomly selected features.
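The averaging idea can be sketched in a few lines of standard-library Python. This is a toy illustration, not scikit-learn's implementation and not the code used later in the chapter. To stay short, it assumes each "tree" is a depth-1 regression tree (a stump). Each stump is trained on a bootstrap sample of the rows and may split only on one randomly chosen feature, and the forest's prediction is the mean of all stump predictions. The function names `fit_stump`, `fit_forest` and `predict` and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, labels, feature):
    """Fit a depth-1 regression tree (a hypothetical 'stump') on one feature.

    Every observed value is tried as a threshold; the split with the lowest
    squared error around the two leaf means is kept.
    """
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:
        # Every row has the same value for this feature: predict the overall mean.
        overall = mean(labels)
        return lambda row: overall
    _, thr, left_mean, right_mean = best
    return lambda row: left_mean if row[feature] <= thr else right_mean


def fit_forest(rows, labels, n_trees=50, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Random feature selection: this stump may only split on one feature.
        feature = rng.randrange(n_features)
        trees.append(fit_stump(sample_rows, sample_labels, feature))
    return trees


def predict(trees, row):
    """Average the predictions of the individual trees."""
    return mean(tree(row) for tree in trees)


# Toy data: feature 0 drives the label, feature 1 is noise.
rows = [(x, (x * 7) % 3) for x in range(10)]
labels = [10.0 if x > 5 else 0.0 for x, _ in rows]
forest = fit_forest(rows, labels, n_trees=100, seed=42)
print(predict(forest, (8, 0)))  # expected to be higher than the next line
print(predict(forest, (1, 0)))
```

Because a stump makes only one split, choosing one random feature per stump is the same as choosing a random candidate feature at each split. Real random forests grow deeper trees and sample candidate features at every split. In practice, you would use a library class such as scikit-learn's `RandomForestRegressor` or `RandomForestClassifier` rather than hand-rolled code.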

An analogy for this is guessing the number of pebbles in a glass jar. Suppose a group of people each try to guess how many pebbles the jar holds. Individually, each person's guess may be far off, but when you average all of their guesses, the result tends to be quite close to the actual number of pebbles in the jar.
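A small simulation makes the pebble-jar intuition concrete. The sketch below assumes every guess equals the true count plus independent, unbiased Gaussian noise. The jar size, crowd size and noise level are made-up parameters, and `simulate_jar` is a hypothetical helper. Under those assumptions, the error of the averaged guess is far smaller than a typical individual error. If everyone shared the same bias, averaging would not remove it.

```python
import random
from statistics import mean


def simulate_jar(true_count=1000, n_people=500, spread=250, seed=0):
    """Return (error of the averaged guess, mean individual error).

    Each hypothetical guess is true_count plus independent Gaussian noise
    with standard deviation `spread`.
    """
    rng = random.Random(seed)
    guesses = [true_count + rng.gauss(0, spread) for _ in range(n_people)]
    crowd_error = abs(mean(guesses) - true_count)
    typical_individual_error = mean(abs(g - true_count) for g in guesses)
    return crowd_error, typical_individual_error


crowd, individual = simulate_jar()
print(f"average individual error: {individual:.1f} pebbles")
print(f"error of the averaged guess: {crowd:.1f} pebbles")
```

Averaging independent errors shrinks their spread roughly in proportion to one over the square root of the number of guesses. This is the same effect a random forest relies on when it averages many imperfect trees.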

In this chapter, you'll learn how to:

  • Work with census data on US earnings and explore this...