Book Image

Mastering Machine Learning with scikit-learn - Second Edition

By : Gavin Hackeling
Book Image

Mastering Machine Learning with scikit-learn - Second Edition

By: Gavin Hackeling

Overview of this book

Machine learning is the buzzword bringing computer science and statistics together to build smart and efficient models. Using powerful algorithms and techniques offered by machine learning you can automate any analytical model. This book examines a variety of machine learning models including popular machine learning algorithms such as k-nearest neighbors, logistic regression, naive Bayes, k-means, decision trees, and artificial neural networks. It discusses data preprocessing, hyperparameter optimization, and ensemble methods. You will build systems that classify documents, recognize images, detect ads, and more. You will learn to use scikit-learn’s API to extract features from categorical variables, text and images; evaluate model performance, and develop an intuition for how to improve your model’s performance. By the end of this book, you will master all required concepts of scikit-learn to build efficient models at work to carry out advanced tasks with the practical approach.
Table of Contents (22 chapters)
Title Page
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface
9
From Decision Trees to Random Forests and Other Ensemble Methods
Index

Decision trees


Decision trees are tree-like graphs that model a decision. They are analogous to the parlor game Twenty Questions. In Twenty Questions, one player, called the answerer, chooses an object but does not reveal the object to the other players, who are called questioners. The object should be a common noun, such as "guitar"or "sandwich", but not "1969 Gibson Les Paul Custom" or "North Carolina". The questioners must guess the object by asking as many as twenty questions that can be answered with "yes", "no", or "maybe". An intuitive strategy for questioners is to ask questions of increasing specificity; asking"is it a musical instrument?" as the first question will not efficiently reduce the number of possibilities, on average. The branches of a decision tree specify the shortest sequences of featuresthat can be examined in order to estimate the value of a response variable. To continue the analogy, in Twenty Questions, the questioner and the answerer all have knowledge of the...