Hands-On Machine Learning with IBM Watson

By: James D. Miller

Overview of this book

IBM Cloud is a collection of cloud computing services for data analytics using machine learning and artificial intelligence (AI). This book is a complete guide to help you become well versed with machine learning on the IBM Cloud using Python. Hands-On Machine Learning with IBM Watson starts with supervised and unsupervised machine learning concepts, in addition to providing you with an overview of IBM Cloud and Watson Machine Learning. You'll gain insights into running various techniques, such as K-means clustering, K-nearest neighbor (KNN), and time series prediction in IBM Cloud with real-world examples. The book will then help you delve into creating a Spark pipeline in Watson Studio. You will also be guided through deep learning and neural network principles on the IBM Cloud using TensorFlow. With the help of NLP techniques, you can then brush up on building a chatbot. In later chapters, you will cover three powerful case studies, including the facial expression classification platform, the automated classification of lithofacies, and the multi-biometric identity authentication platform, helping you to become well versed with these methodologies. By the end of this book, you will be ready to build efficient machine learning solutions on the IBM Cloud and draw insights from the data at hand using real-world examples.
Table of Contents (15 chapters)

Section 1: Introduction and Foundation
Section 2: Tools and Ingredients for Machine Learning in IBM Cloud
Section 3: Real-Life Complete Case Studies

Data preparation

In this step, the data in a DataFrame object is split (using Spark's randomSplit method) into three sets: a training set (used to train the model), a testing set (used to evaluate the model and test its assumptions), and a prediction set (used for prediction). A record count is then printed for each set:

splitted_data = df_data.randomSplit([0.8, 0.18, 0.02], 24)
train_data = splitted_data[0]
test_data = splitted_data[1]
predict_data = splitted_data[2]

print("Number of training records: " + str(train_data.count()))
print("Number of testing records: " + str(test_data.count()))
print("Number of prediction records: " + str(predict_data.count()))

Executing the preceding commands within the notebook is shown in the following screenshot:
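It is worth noting that randomSplit assigns each row to a bucket independently at random, so the resulting set sizes only approximate the requested 80/18/2 weights; the second argument (24 above) is the random seed that makes the split reproducible. The following plain-Python sketch (an illustration of this behavior, not Spark's implementation; the function name random_split is our own) mimics how weighted random splitting works:

```python
import random

def random_split(rows, weights, seed):
    """Assign each row to one bucket with probability proportional to
    its weight. As with Spark's randomSplit, the resulting set sizes
    are approximate, not exact; the seed makes the split repeatable."""
    total = sum(weights)
    # Cumulative upper bounds, e.g. [0.8, 0.18, 0.02] -> [0.8, 0.98, 1.0]
    bounds, acc = [], 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)
    rng = random.Random(seed)
    buckets = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for i, bound in enumerate(bounds):
            if draw <= bound:
                buckets[i].append(row)
                break
    return buckets

rows = list(range(10_000))
train, test, predict = random_split(rows, [0.8, 0.18, 0.02], seed=24)
print(len(train), len(test), len(predict))  # roughly 8000 / 1800 / 200
```

Because the assignment is per-row, repeated runs with the same seed reproduce the same split, while different seeds yield different (but similarly proportioned) splits.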

...