Machine Learning Solutions

Overview of this book

Machine learning (ML) helps you find hidden insights in your data without the need for explicit programming. This book is your key to solving any kind of ML problem you might come across in your job. You'll encounter a set of simple to complex problems while building ML models, and you'll not only resolve these problems but also learn how to build projects based on each problem, with a practical approach and easy-to-follow examples. The book covers a wide range of applications, from analytics and NLP to computer vision. Some of the applications you will be working on include stock price prediction, a recommendation engine, a chatbot, a facial expression recognition system, and many more. The problems we cover include identifying the right algorithm for your dataset and use case, creating and labeling datasets, getting enough clean data to carry out processing, identifying outliers, overfitting, hyperparameter tuning, and more. Here, you'll also learn to make more timely and accurate predictions. In addition, you'll deal with more advanced use cases, such as building a gaming bot and building an extractive summarization tool for medical documents, and you'll tackle the problems faced while building an ML model. By the end of this book, you'll be able to fine-tune your models as per your needs to deliver maximum productivity.

Best approach

As mentioned in the previous section, this iteration focuses on feature transformation and on a voting classifier that combines the AdaBoost and GradientBoosting classifiers. We hope that this approach will give us the best ROC-AUC score on both the validation dataset and the real testing dataset. Of the approaches we have tried, this is the one we expect to generate the best result. If you have any creative solutions of your own, you can try them as well. Now we will jump to the implementation part.

Implementing the best approach

Here, we will implement the following techniques:

Log transformation of features
Voting-based ensemble model

Let's implement feature transformation first.

Log transformation of features

We will apply a log transformation to our training dataset. The reason is that some attributes are heavily skewed, and others have values that are spread over a wide range. So, we will take the natural log of one plus each value in the input feature array, that is, log(1 + x). Adding one keeps the result defined for values of zero. You can refer to the code snippet shown in the following figure:

Figure 1.63: Code snippet for log(p+1) transformation of features.
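Since the figure itself is not reproduced here, the following is a minimal sketch of what a log(1 + x) transformation can look like with NumPy. The helper name log_transform and the small example array are assumptions made for illustration; they are not the book's code or data.

```python
import numpy as np


def log_transform(features):
    """Return log(1 + x) element-wise (hypothetical helper).

    Assumes every value is >= 0; np.log1p is undefined for values <= -1.
    """
    return np.log1p(np.asarray(features, dtype=float))


# Hypothetical, heavily skewed feature matrix (not the book's dataset).
X_example = np.array([
    [0.0, 10.0],
    [1.0, 1000.0],
    [3.0, 100000.0],
])

X_example_log = log_transform(X_example)
print(X_example_log)
```

np.log1p is used rather than np.log(X + 1) because it is more accurate for values close to zero; the two are otherwise equivalent.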

I have also checked the ROC-AUC score on the validation dataset; the transformation changes the score only slightly.

Voting-based ensemble ML model

In this section, we will use a voting-based ensemble classifier. The scikit-learn library already has a module available for this. We will train a voting-based ML model on both the untransformed and the transformed features and see which version scores better on the validation dataset. You can refer to the code snippet given in the following figure:

Figure 1.64: Code snippet for a voting based ensemble classifier

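Here, we set the weights parameter so that GradientBoosting gets a weight of 2 and AdaBoost gets a weight of 1. I have also set the voting parameter to soft, so the ensemble averages the class probabilities predicted by the two classifiers, using those weights, instead of counting their hard votes.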
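As a rough illustration of the configuration described above, the sketch below builds a soft-voting ensemble with scikit-learn's VotingClassifier and scores it with ROC-AUC on a held-out validation split. The synthetic data from make_classification, the split sizes, and the default hyperparameters of both classifiers are assumptions; the book's own dataset and tuned settings are not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    VotingClassifier,
)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit risk data (assumption, not the real dataset).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Soft voting: weighted average of predicted probabilities,
# with weight 2 for GradientBoosting and 1 for AdaBoost.
voting_clf = VotingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("ada", AdaBoostClassifier(random_state=0)),
    ],
    voting="soft",
    weights=[2, 1],
)
voting_clf.fit(X_train, y_train)

# ROC-AUC needs the probability of the positive class, not hard labels.
val_proba = voting_clf.predict_proba(X_val)[:, 1]
val_auc = roc_auc_score(y_val, val_proba)
print("Validation ROC-AUC:", val_auc)
```

To compare untransformed and transformed features, the same code can be run twice, once on the raw arrays and once on their log(1 + x) versions, keeping the split identical so the two scores are comparable.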

We are almost done with trying out our best approach using a voting mechanism. In the next section, we will run our ML model on a real testing dataset. So let's do some real testing!

Running ML models on real test data

Here, we will test the voting-based ML model on our testing dataset. In the first iteration, we do not apply the log transformation to the test dataset; in the second iteration, we do. In both cases, we generate the probability for the target class, because we want to know how likely a particular person is to default on their loan in the next 2 years. We will save the predicted probabilities in a CSV file.

You can see the code for performing testing in the following figure:

Figure 1.65: Code snippet for testing
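The two test iterations can be sketched as follows. This is an illustration under stated assumptions: the data is synthetic, the helper fit_and_predict, the column names Id and Probability, and the output file name are all hypothetical, and the model used on log-transformed test data is assumed to have been trained on log-transformed training data.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    VotingClassifier,
)

# Synthetic stand-in data (assumption); abs() keeps values >= 0 for log1p.
X, y = make_classification(n_samples=300, n_features=8, random_state=1)
X = np.abs(X)
X_train, y_train, X_test = X[:200], y[:200], X[200:]


def fit_and_predict(train_features, train_labels, test_features):
    """Fit the soft-voting ensemble and return P(class 1) for the test rows."""
    clf = VotingClassifier(
        estimators=[
            ("gb", GradientBoostingClassifier(random_state=0)),
            ("ada", AdaBoostClassifier(random_state=0)),
        ],
        voting="soft",
        weights=[2, 1],
    )
    clf.fit(train_features, train_labels)
    return clf.predict_proba(test_features)[:, 1]


# Iteration 1: untransformed features.
proba_raw = fit_and_predict(X_train, y_train, X_test)

# Iteration 2: log(1 + x) applied to both training and test features.
proba_log = fit_and_predict(np.log1p(X_train), y_train, np.log1p(X_test))

# Save the predicted default probabilities (hypothetical column names and path).
submission = pd.DataFrame(
    {"Id": np.arange(1, len(X_test) + 1), "Probability": proba_log}
)
out_path = os.path.join(tempfile.gettempdir(), "predictions_log.csv")
submission.to_csv(out_path, index=False)
```

Whatever transformation is applied to the training features must also be applied to the test features before calling predict_proba; otherwise the model sees inputs on a different scale from the one it was trained on.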

As you can see in Figure 1.64, we have achieved 86% accuracy here. This is a strong score by industry standards.
