AWS Certified Machine Learning Specialty: MLS-C01 Certification Guide

By: Somanath Nanda, Weslley Moura

Overview of this book

The AWS Certified Machine Learning Specialty exam tests your competency to perform machine learning (ML) on AWS infrastructure. This book covers the entire exam syllabus using practical examples to help you with your real-world machine learning projects on AWS. Starting with an introduction to machine learning on AWS, you'll learn the fundamentals of machine learning and explore important AWS services for artificial intelligence (AI). You'll then see how to prepare data for machine learning and discover a wide variety of techniques for data manipulation and transformation for different types of variables. The book also shows you how to handle missing data and outliers and takes you through various machine learning tasks such as classification, regression, clustering, forecasting, anomaly detection, text mining, and image processing, along with the specific ML algorithms you need to know to pass the exam. Finally, you'll explore model evaluation, optimization, and deployment and get to grips with deploying models in a production environment and monitoring them. By the end of this book, you'll have gained knowledge of the key challenges in machine learning and the solutions that AWS has released for each of them, along with the tools, methods, and techniques commonly used in each domain of AWS ML.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Section 1: Introduction to Machine Learning

Chapter 1: Machine Learning Fundamentals

Comparing AI, ML, and DL

Classifying supervised, unsupervised, and reinforcement learning

The CRISP-DM modeling life cycle

Data splitting

Modeling expectations

Introducing ML frameworks

ML in the cloud

Summary

Questions

Chapter 2: AWS Application Services for AI/ML

Technical requirements

Analyzing images and videos with Amazon Rekognition

Text to speech with Amazon Polly

Speech to text with Amazon Transcribe

Implementing natural language processing with Amazon Comprehend

Translating documents with Amazon Translate

Extracting text from documents with Amazon Textract

Creating chatbots on Amazon Lex

Summary

Section 2: Data Engineering and Exploratory Data Analysis

Chapter 3: Data Preparation and Transformation

Identifying types of features

Dealing with categorical features

Dealing with numerical features

Understanding data distributions

Handling missing values

Dealing with outliers

Dealing with unbalanced datasets

Dealing with text data

Summary

Questions

Chapter 4: Understanding and Visualizing Data

Visualizing relationships in your data

Visualizing comparisons in your data

Visualizing distributions in your data

Visualizing compositions in your data

Building key performance indicators

Introducing QuickSight

Summary

Questions

Chapter 5: AWS Services for Data Storing

Technical requirements

Storing data on Amazon S3

Controlling access to buckets and objects on Amazon S3

Protecting data on Amazon S3

Securing S3 objects at rest and in transit

Using other types of data stores

Relational Database Services (RDSes)

Managing failover in Amazon RDS

Taking automatic backup, RDS snapshots, and restore and read replicas

Writing to Amazon Aurora with multi-master capabilities

Storing columnar data on Amazon Redshift

Amazon DynamoDB for NoSQL database as a service

Summary

Chapter 6: AWS Services for Data Processing

Technical requirements

Creating ETL jobs on AWS Glue

Querying S3 data using Athena

Processing real-time data using Kinesis data streams

Storing and transforming real-time data using Kinesis Data Firehose

Different ways of ingesting data from on-premises into AWS

Processing stored data on AWS

Summary

Section 3: Data Modeling

Chapter 7: Applying Machine Learning Algorithms

Introducing this chapter

Storing the training data

A word about ensemble models

Supervised learning

Unsupervised learning

Textual analysis

Image processing

Summary

Questions

Chapter 8: Evaluating and Optimizing Models

Introducing model evaluation

Evaluating classification models

Evaluating regression models

Model optimization

Summary

Questions

Chapter 9: Amazon SageMaker Modeling

Technical requirements

Creating notebooks in Amazon SageMaker

Model tuning

Choosing instance types in Amazon SageMaker

Securing SageMaker notebooks

Creating alternative pipelines with Lambda Functions

Working with Step Functions

Summary

Classifying supervised, unsupervised, and reinforcement learning

ML is an extensive field of study, so it is important to define its subdivisions clearly. From a broad perspective, we can split ML algorithms into two main classes: supervised learning and unsupervised learning.

Introducing supervised learning

Supervised algorithms use a class or label (from the input data) as support to find and validate the optimal solution. Figure 1.2 shows a dataset used to classify a bank's transactions as fraudulent or not:

Figure 1.2 – Sample dataset for supervised learning

The first four columns are known as features, or independent variables, and they can be used by a supervised algorithm to find fraudulent patterns. For example, by combining those four features (day of the week, EST hour, transaction amount, and merchant type) and six observations (each row is technically one observation), you can infer that e-commerce transactions with a value greater than $5,000 and processed at night are potentially fraudulent cases.

Important note

In a real scenario, we should have more observations in order to have statistical support to make this type of inference.

The key point is that we were able to infer a potential fraudulent pattern just because we knew, a priori, what is fraud and what is not fraud. This information is present in the last column of Figure 1.2 and is commonly referred to as a target variable, label, response variable, or dependent variable. If the input dataset has a target variable, you should be able to apply supervised learning.
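The split between features and target variable can be sketched in a few lines of Python. This is only an illustration: the six rows below are hypothetical (they are not the values from Figure 1.2), the column names are made up for the example, and "night" is assumed to mean 22:00 to 06:00. The rule is the hand-written pattern described above, not a trained model.

```python
# Hypothetical observations; the actual values in Figure 1.2 are not reproduced here.
transactions = [
    {"day": "Mon", "hour": 23, "amount": 7200.0, "merchant": "e-commerce", "fraud": "yes"},
    {"day": "Tue", "hour": 14, "amount": 120.0, "merchant": "grocery", "fraud": "no"},
    {"day": "Wed", "hour": 2, "amount": 6100.0, "merchant": "e-commerce", "fraud": "yes"},
    {"day": "Thu", "hour": 10, "amount": 5400.0, "merchant": "e-commerce", "fraud": "no"},
    {"day": "Fri", "hour": 23, "amount": 80.0, "merchant": "e-commerce", "fraud": "no"},
    {"day": "Sat", "hour": 1, "amount": 9000.0, "merchant": "travel", "fraud": "no"},
]

FEATURES = ["day", "hour", "amount", "merchant"]  # independent variables
TARGET = "fraud"                                  # dependent variable (label)


def split_features_target(rows):
    """Separate each observation into its features (X) and its label (y)."""
    X = [{name: row[name] for name in FEATURES} for row in rows]
    y = [row[TARGET] for row in rows]
    return X, y


def rule_flags_fraud(row):
    """Hand-written rule: e-commerce, above $5,000, processed at night."""
    is_night = row["hour"] >= 22 or row["hour"] < 6  # assumed definition of night
    return row["merchant"] == "e-commerce" and row["amount"] > 5000 and is_night


X, y = split_features_target(transactions)
predictions = ["yes" if rule_flags_fraud(x) else "no" for x in X]

# The labels let us validate the rule, which is what makes the problem supervised.
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(f"Rule accuracy on the sample: {accuracy:.2f}")
```

Because the labels are available, the rule can be checked against them; without the last column there would be nothing to validate the pattern against.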

In supervised learning, the target variable might store different types of data. For instance, it could be a binary column (yes or no), a multi-class column (class A, B, or C), or even a numerical column (any real number, such as a transaction amount). The data type of the target variable determines which type of supervised learning your problem calls for. Figure 1.3 shows how to classify supervised learning into two main groups: classification and regression algorithms:

Figure 1.3 – Choosing the right type of supervised learning given the target variable

While classification algorithms predict a class (either binary or multiple classes), regression algorithms predict a real number (either continuous or discrete).

Understanding data types is important to make the right decisions on ML projects. We can split data types into two main categories: numerical and categorical data. Numerical data can then be split into continuous or discrete subclasses, while categorical data might refer to ordinal or nominal data:

Numerical/discrete data refers to individual and countable items (for example, the number of students in a classroom or the number of items in an online shopping cart).
Numerical/continuous data refers to measurements that can take any of an infinite number of values within a range, and it often carries decimal points (for example, temperature).
Categorical/nominal data refers to labeled variables with no quantitative value (for example, name or gender).
Categorical/ordinal data adds the sense of order to a labeled variable (for example, education level or employee title level).
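One practical consequence of the nominal versus ordinal distinction is how a categorical value can be turned into numbers. The sketch below is a minimal illustration under assumptions: the helper names, the education levels, and the color categories are hypothetical and chosen only for the example.

```python
# Hypothetical ordered levels for an ordinal variable.
EDUCATION_ORDER = ["high school", "bachelor", "master", "doctorate"]

# Hypothetical categories for a nominal variable (no inherent order).
COLORS = ["red", "green", "blue"]


def encode_ordinal(value, order):
    """Map an ordinal value to its rank so that comparisons stay meaningful."""
    return order.index(value)


def encode_nominal(value, categories):
    """One-hot encode a nominal value so that no artificial order is implied."""
    return [1 if value == category else 0 for category in categories]


# Ordinal data: a higher rank really does mean "more" of something.
print(encode_ordinal("master", EDUCATION_ORDER) > encode_ordinal("bachelor", EDUCATION_ORDER))

# Nominal data: each category gets its own indicator column.
print(encode_nominal("blue", COLORS))
```

Encoding a nominal variable as 0, 1, 2 would suggest an order that does not exist in the data, which is why the two subtypes are treated differently here.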

In other words, when choosing an algorithm for your project, you should ask yourself: do I have a target variable? Does it store categorical or numerical data? Answering these questions will put you in a better position to choose a potential algorithm that will solve your problem.
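These two questions can be written down as a small decision helper. This is a rough sketch, not a rule from the exam guide: the function name is hypothetical, and it assumes that a target made only of numbers is numerical, which would misread class codes such as 0 and 1.

```python
from numbers import Real


def suggest_learning_type(target_values):
    """Suggest a learning type from the target column (None means no target)."""
    # Question 1: do I have a target variable?
    if target_values is None:
        return "unsupervised"
    if len(target_values) == 0:
        raise ValueError("the target column is empty")

    # Question 2: does it store categorical or numerical data?
    # bool is a subclass of int in Python, so yes/no flags are excluded explicitly.
    # Assumption: numeric class codes (e.g. 0/1) are not handled by this sketch.
    is_numerical = all(
        isinstance(value, Real) and not isinstance(value, bool)
        for value in target_values
    )
    return "regression" if is_numerical else "classification"


print(suggest_learning_type(["yes", "no", "yes"]))   # categorical target
print(suggest_learning_type([120.5, 7200.0, 80.0]))  # numerical target
print(suggest_learning_type(None))                   # no target at all
```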

However, what if you don't have a target variable? In that case, we are facing unsupervised learning. Unsupervised problems do not provide labeled data; instead, they provide all the independent variables (or features) that will allow unsupervised algorithms to find patterns in the data. The most common type of unsupervised learning is clustering, which aims to group the observations of the dataset into different clusters, purely based on their features. Observations from the same cluster are expected to be similar to each other, but very different from observations from other clusters. Clustering will be covered in more detail in future chapters of this book.
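To make the idea of clustering concrete, here is a minimal sketch of k-means, one common clustering algorithm chosen here purely for illustration. The points are hypothetical, and the initialization (taking the first k points as starting centroids) is a simplifying assumption; real implementations choose starting centroids more carefully.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters using only their features (no labels)."""
    # Simplifying assumption: the first k points act as the initial centroids.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

    return assignments, centroids


# Hypothetical two-feature observations forming two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
print(centroids)
```

No target column is used anywhere: the grouping comes purely from the distances between feature values, so observations in the same cluster end up close to each other and far from the other cluster.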

Semi-supervised learning is also present in the ML literature. This type of algorithm is able to learn from partially labeled data (some observations contain a label and others do not).

Finally, reinforcement learning is the approach taken by another class of ML algorithms. It rewards the system for the good decisions that it makes autonomously; in other words, the system learns by experience.
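Learning by experience can be sketched with a tiny multi-armed bandit, a standard toy problem for reinforcement learning. Everything here is an assumption made for illustration: the function name, the epsilon-greedy strategy, and the reward values, which are fixed per action to keep the example deterministic (real environments usually return noisy rewards).

```python
import random


def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Learn which action pays best by trying actions and observing rewards."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    estimates = [0.0] * n_actions  # the agent's current belief about each action
    counts = [0] * n_actions       # how many times each action was tried

    def pull(action):
        reward = rewards[action]  # feedback from the (hypothetical) environment
        counts[action] += 1
        # Incremental mean: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]

    # Try every action once so that each estimate is based on experience.
    for action in range(n_actions):
        pull(action)

    for _ in range(steps - n_actions):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: pick a random action
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        pull(action)

    return estimates, counts


# The agent does not know these values in advance; it only sees rewards it earns.
estimates, counts = run_bandit(rewards=[0.2, 0.8, 0.5])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, counts)
```

Unlike supervised learning, no labeled dataset is provided: the system discovers the best action only through the rewards its own decisions produce.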

So far, we have discussed learning approaches and classes of algorithms at a very broad level. Now it is time to get specific and introduce the term model.
