Machine Learning Solutions

Overview of this book

Machine learning (ML) helps you find hidden insights in your data without the need for explicit programming. This book is your key to solving any kind of ML problem you might come across in your job. You'll encounter a range of problems, from simple to complex, while building ML models. You'll not only resolve these problems, but also learn how to build a project around each one, with a practical approach and easy-to-follow examples. The book covers a wide range of applications, from analytics and NLP to computer vision. Some of the applications you will work on include stock price prediction, a recommendation engine, a chatbot, a facial expression recognition system, and many more. The problem examples we cover include identifying the right algorithm for your dataset and use case, creating and labeling datasets, getting enough clean data to carry out processing, identifying outliers, overfitting, hyperparameter tuning, and more. You'll also learn to make more timely and accurate predictions. In addition, you'll deal with more advanced use cases, such as building a gaming bot and an extractive summarization tool for medical documents, and you'll tackle the problems faced while building an ML model. By the end of this book, you'll be able to fine-tune your models as per your needs to deliver maximum productivity.

Understanding the datasets


Finding an appropriate dataset is a challenging task in data science. Sometimes you find a dataset, but it is not in the format you need. The problem statement determines what type of dataset and data format we need. Activities such as locating data and reshaping it into a usable format are part of data wrangling.

Note

Data wrangling is the process of transforming and mapping data from one form into another. The goal of this transformation and mapping is to create an appropriate, valuable dataset that can be used to develop analytics products. Data wrangling is also referred to as data munging and is a crucial part of any data science application.
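To make the idea of transforming and mapping data more concrete, here is a minimal sketch that uses only the Python standard library. It assumes a tiny, made-up CSV snippet; the column names (`order_id`, `ordered_at`, `quantity`, `unit_price`, `customer_id`) and the `wrangle` helper are hypothetical and chosen purely for illustration, not taken from any dataset used in this book. The sketch maps raw text rows into typed records, drops rows that have no customer, and derives a new field.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows for illustration only; not taken from any real dataset.
RAW_CSV = """order_id,ordered_at,quantity,unit_price,customer_id
A100,2011-01-05 10:15,2,3.50,C1
A101,2011-01-05 11:40,1,12.00,
A102,2011-01-06 09:05,3,1.25,C2
"""


def wrangle(raw_text):
    """Map raw CSV text to a list of typed records, skipping unusable rows."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Drop rows without a customer, since they cannot be tied to a user.
        if not row["customer_id"]:
            continue
        quantity = int(row["quantity"])
        unit_price = float(row["unit_price"])
        records.append({
            "order_id": row["order_id"],
            # Transform the timestamp string into a datetime object.
            "ordered_at": datetime.strptime(row["ordered_at"], "%Y-%m-%d %H:%M"),
            "customer_id": row["customer_id"],
            "quantity": quantity,
            # Derived field: total value of this order line.
            "line_total": quantity * unit_price,
        })
    return records


clean = wrangle(RAW_CSV)
for record in clean:
    print(record["order_id"], record["customer_id"], record["line_total"])
```

In a real project, you would more likely reach for a dataframe library for this kind of work, but the steps stay the same: parse, filter, convert types, and derive the fields your problem statement needs.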

Generally, e-commerce datasets are proprietary, and it's rare to get access to the transactions of real users. Fortunately, the UCI Machine Learning Repository hosts a dataset named Online Retail. This dataset contains actual transactions from a UK-based online retailer.

Description of the dataset

This Online Retail dataset contains...