Hands-On Machine Learning with IBM Watson

By: James D. Miller

Overview of this book

IBM Cloud is a collection of cloud computing services for data analytics using machine learning and artificial intelligence (AI). This book is a complete guide to help you become well versed in machine learning on the IBM Cloud using Python. Hands-On Machine Learning with IBM Watson starts with supervised and unsupervised machine learning concepts and provides an overview of IBM Cloud and Watson Machine Learning. You'll gain insights into running various techniques, such as K-means clustering, K-nearest neighbor (KNN), and time series prediction, in IBM Cloud with real-world examples. The book will then help you create a Spark pipeline in Watson Studio. You will also be guided through deep learning and neural network principles on the IBM Cloud using TensorFlow. With the help of NLP techniques, you can then brush up on building a chatbot. In later chapters, you will work through three case studies: a facial expression classification platform, the automated classification of lithofacies, and a multi-biometric identity authentication platform. By the end of this book, you will be ready to build efficient machine learning solutions on the IBM Cloud and draw insights from the data at hand.

Table of Contents (15 chapters)

Section 1: Introduction and Foundation
Section 2: Tools and Ingredients for Machine Learning in IBM Cloud
Section 3: Real-Life Complete Case Studies

Preprocessing

What does preprocessing mean?

Beyond selecting the data that you want to use for a particular machine learning project, you also need to preprocess that data. This typically involves tasks such as formatting, cleaning, and sampling (or profiling). We won't delve far into the definitions of these tasks, and will assume that the reader grasps their meaning and purpose. In brief, formatting puts the data source into a form that can be easily understood and consumed within your project, cleaning is mostly concerned with removing unwanted data, and sampling reduces the overall size of the data for performance reasons.
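
To make the three tasks concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the tiny inline dataset, its column names, and the helper names format_rows, clean_rows, and sample_rows are all hypothetical and do not come from the book or from any IBM library.

```python
import csv
import io
import random

# Hypothetical raw data: stray whitespace, a missing age, inconsistent
# capitalization, and one exact duplicate row.
RAW_CSV = """id,age,city
1, 34 ,Boston
2,,Chicago
3,29,denver
3,29,denver
4,41,Austin
5,38,Seattle
"""


def format_rows(text):
    """Formatting: parse raw text into uniformly typed records."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        age = rec["age"].strip()
        rows.append({
            "id": int(rec["id"]),
            "age": int(age) if age else None,  # keep missing values explicit
            "city": rec["city"].strip().title(),
        })
    return rows


def clean_rows(rows):
    """Cleaning: drop records with missing values and exact duplicates."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(row.values())
        if None in key or key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


def sample_rows(rows, fraction, seed=0):
    """Sampling: keep a reproducible random subset to reduce data size."""
    k = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(rows, k)


formatted = format_rows(RAW_CSV)
cleaned = clean_rows(formatted)
sampled = sample_rows(cleaned, fraction=0.5)
print(len(formatted), len(cleaned), len(sampled))
```

Each step is kept as a separate function so that the intermediate results can be inspected on their own. On a real project, the rules for what counts as unwanted data and the sampling fraction would depend on the dataset and the model being trained.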

Although, being a developer at heart, I am eager to take on these tasks by crafting a script or by selecting a function from an open-source library, instead, let...