Machine Learning Solutions

Overview of this book

Machine learning (ML) helps you find hidden insights in your data without the need for explicit programming. This book is your key to solving the kinds of ML problems you might come across in your job. You’ll encounter a set of simple to complex problems while building ML models, and you'll not only resolve these problems but also learn how to build projects based on each one, with a practical approach and easy-to-follow examples. The book covers a wide range of applications, from analytics and NLP to computer vision. Some of the applications you will work on include stock price prediction, a recommendation engine, a chatbot, a facial expression recognition system, and many more. The problems we cover include identifying the right algorithm for your dataset and use case, creating and labeling datasets, getting enough clean data to carry out processing, identifying outliers, overfitting, hyperparameter tuning, and more. You'll also learn to make more timely and accurate predictions. In addition, you'll deal with more advanced use cases, such as building a gaming bot and an extractive summarization tool for medical documents, and you'll tackle the problems faced while building an ML model. By the end of this book, you'll be able to fine-tune your models as per your needs to deliver maximum productivity.

Understanding datasets


To develop the chatbot, we use two datasets:

  • Cornell Movie-Dialogs dataset

  • bAbI dataset

Cornell Movie-Dialogs dataset

This dataset has been widely used for developing chatbots. You can download the Cornell Movie-Dialogs corpus from this link: https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html. This corpus contains a large metadata-rich collection of fictional conversations extracted from raw movie scripts.

The corpus has 220,579 conversational exchanges between 10,292 pairs of movie characters. It involves 9,035 characters from 617 movies and contains 304,713 utterances in total. The dataset also includes two types of metadata:

  • Movie-related metadata includes the following details:

    • Genre of the movie

    • Release year

    • IMDb rating

  • Character-related metadata includes the following details:

    • Gender of 3,774 characters

    • Total number of characters in movies
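
To get a feel for the numbers above, you can count utterances, characters, and movies yourself. The following is a minimal sketch, not part of the book's code. It assumes the utterance file in the corpus stores one utterance per line with five fields (line ID, character ID, movie ID, character name, and text) separated by the string ` +++$+++ `. The names `Utterance`, `parse_movie_line`, and `summarize` are hypothetical helpers, and `SAMPLE` is a handful of illustrative lines in that layout rather than the real file.

```python
from typing import Iterable, NamedTuple

# Assumed field delimiter in the corpus text files.
FIELD_SEP = " +++$+++ "


class Utterance(NamedTuple):
    line_id: str
    character_id: str
    movie_id: str
    character_name: str
    text: str


def parse_movie_line(raw: str) -> Utterance:
    """Split one raw line into its five fields (assumed layout)."""
    # maxsplit=4 keeps any separator-like text inside the utterance intact.
    parts = raw.rstrip("\n").split(FIELD_SEP, 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {raw!r}")
    return Utterance(*(p.strip() for p in parts))


def summarize(lines: Iterable[str]) -> dict:
    """Count utterances, distinct characters, and distinct movies."""
    utterances = [parse_movie_line(line) for line in lines if line.strip()]
    return {
        "utterances": len(utterances),
        "characters": len({u.character_id for u in utterances}),
        "movies": len({u.movie_id for u in utterances}),
    }


# Illustrative lines in the assumed layout (not read from the real file).
SAMPLE = [
    "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!",
    "L1044 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ They do to!",
    "L985 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ I hope so.",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run the same summary on the downloaded corpus, pass an open file handle to `summarize` instead of `SAMPLE`. The file is not guaranteed to be UTF-8, so you may need to give `open` an explicit encoding. If the layout assumption holds, the counts should be close to the figures quoted above.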

When you download this dataset, you...