Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

By Tarek Amr

Overview of this book

Machine learning is applied everywhere, from business to research and academia, while scikit-learn is a versatile library that is popular among machine learning practitioners. This book serves as a practical guide for anyone looking to provide hands-on machine learning solutions with scikit-learn and Python toolkits. The book begins with an explanation of machine learning concepts and fundamentals, and strikes a balance between theoretical concepts and their applications. Each chapter covers a different set of algorithms, and shows you how to use them to solve real-life problems. You’ll also learn about various key supervised and unsupervised machine learning algorithms using practical examples. Whether it is an instance-based learning algorithm, Bayesian estimation, a deep neural network, a tree-based ensemble, or a recommendation system, you’ll gain a thorough understanding of its theory and learn when to apply it. As you advance, you’ll learn how to deal with unlabeled data and when to use different clustering and anomaly detection algorithms. By the end of this machine learning book, you’ll have learned how to take a data-driven approach to provide end-to-end machine learning solutions. You’ll also have discovered how to formulate the problem at hand, prepare required data, and evaluate and deploy models in production.
Table of Contents (18 chapters)

Section 1: Supervised Learning
Section 2: Advanced Supervised Learning
Section 3: Unsupervised Learning and More

Getting a more reliable score

The Iris dataset is a small set of just 150 samples. When we randomly split it into training and test sets, we ended up with 45 instances in the test set. With such a small number, we may see variation in the distribution of our targets. For example, when I randomly split the data, I got 13 samples from class 0 and 16 samples from each of the other two classes in my test set. Knowing that class 0 is easier to predict than the other two classes in this particular dataset, we can tell that if I had been luckier and had more class 0 samples in the test set, I would have got a higher score. Furthermore, decision trees are very sensitive to changes in the data, and you may get a very different tree with every slight change in your training data.
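To see this variation for yourself, here is a minimal sketch (an illustration, not the book's exact code) that splits the data with a few different random seeds and prints the class counts in each resulting test set:

# Not the book's exact code: inspecting how the class mix in the
# 45-sample test set changes with the random seed
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
x, y = iris.data, iris.target

for seed in [7, 21, 42]:
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.3, random_state=seed
    )
    # Each seed gives a different distribution of the three classes
    print(seed, sorted(Counter(y_test).items()))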

What to do now to get a more reliable score

A statistician would say: let's run the whole process of data splitting, training, and predicting more than once, and get the distribution of the different scores we obtain.
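To illustrate this idea, the following minimal sketch (again, not the book's exact code) repeats the split/train/predict cycle 50 times with a decision tree and summarizes the spread of the resulting accuracy scores:

# Repeating the split/train/predict cycle over many random splits
# and looking at the distribution of the accuracy scores
from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
x, y = iris.data, iris.target

# 50 random 70/30 splits; each split yields its own accuracy score
splitter = ShuffleSplit(n_splits=50, test_size=0.3, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(), x, y, cv=splitter)

print(f'Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}')

The standard deviation here tells you how much the score depends on the particular split you happened to draw, which is exactly the variation described above.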