Problems with the existing approach


We obtained the baseline score using the AdaBoost and GradientBoosting classifiers. Now, we need to increase the accuracy of these classifiers. To do that, we first list all the areas that can be improved but that we haven't worked on extensively, along with the possible problems in the baseline approach. Once we have this list of problems and areas to work on, it will be easier to implement the revised approach.
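To make the starting point concrete, here is a minimal, self-contained sketch of what such a default-parameter baseline looks like. It uses a synthetic dataset as a stand-in for the real one, so the variable names and numbers are illustrative, not the book's actual code:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

# Synthetic stand-in data; the real project uses its own feature matrix and labels.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Default parameters only -- no tuning yet, mirroring the baseline iteration.
for name, clf in [("AdaBoost", AdaBoostClassifier()),
                  ("GradientBoosting", GradientBoostingClassifier())]:
    clf.fit(X_train, y_train)
    print(name, "baseline test accuracy:", clf.score(X_test, y_test))
```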

Here, I'm listing some of the areas, or problems, that we haven't worked on in our baseline iteration:

  • Problem: We haven't used cross-validation techniques extensively in order to check for overfitting.

    • Solution: If we use cross-validation properly, we will know whether our trained ML model suffers from overfitting. This matters because we don't want to build a model that can't generalize properly. A minimal cross-validation sketch is given after this list.

  • Problem: We also haven't focused on hyperparameter tuning. In our baseline approach, we mostly used the default parameters, which are set when the classifier is declared. You can refer to the code snippet in Figure 1.52, where the classifier takes some parameters that it uses when training the model; we haven't changed any of them.

    • Solution: We need to tune these hyperparameters so that the accuracy of the classifier increases. There are various hyperparameter-tuning techniques we can use; a sketch of one such technique follows this list.
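The cross-validation check mentioned in the first point can be sketched as follows. It reuses the X_train and y_train arrays from the baseline sketch above, and the fold count of 5 is an assumption rather than a value taken from the book:

```python
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier

# 5-fold cross-validation on the training data; if the mean CV score is much
# lower than the training-set score, the model is likely overfitting.
clf = GradientBoostingClassifier()
cv_scores = cross_val_score(clf, X_train, y_train, cv=5, scoring="accuracy")
print("CV accuracy: %.3f (+/- %.3f)" % (cv_scores.mean(), cv_scores.std()))
```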
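For hyperparameter tuning, one commonly used technique is a grid search with built-in cross-validation. The following is a hedged sketch using GridSearchCV; the parameter grid is illustrative and not the actual search space used later in the chapter:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative search space; the real grid depends on the dataset.
param_grid = {
    "n_estimators": [100, 200, 300],
    "learning_rate": [0.05, 0.1, 0.2],
    "max_depth": [2, 3, 4],
}

# GridSearchCV tunes the hyperparameters and cross-validates each candidate.
grid_search = GridSearchCV(GradientBoostingClassifier(), param_grid,
                           cv=5, scoring="accuracy", n_jobs=-1)
grid_search.fit(X_train, y_train)
print("Best parameters:", grid_search.best_params_)
print("Best CV accuracy:", grid_search.best_score_)
```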

In the next section, we will look at how these optimization techniques actually work and discuss the approach we are going to take. So let's begin!