Time Series Analysis on AWS

By: Michaël Hoarau

Overview of this book

As a business analyst or data scientist, you have to use many algorithms and approaches to prepare, process, and build ML-based applications that leverage time series data, but you often face common problems, such as not knowing which algorithm to choose or how to combine and interpret them. Amazon Web Services (AWS) provides numerous services to help you build applications fueled by artificial intelligence (AI) capabilities. This book helps you get to grips with three managed AWS AI/ML services so that you can deliver your desired business outcomes. The book begins with Amazon Forecast, where you’ll discover how to use time series forecasting, leveraging sophisticated statistical and machine learning algorithms to deliver business outcomes accurately. You’ll then learn to use Amazon Lookout for Equipment to build multivariate time series anomaly detection models geared toward industrial equipment, and understand how it provides valuable insights to support teams focused on predictive maintenance and predictive quality use cases. In the last chapters, you’ll explore Amazon Lookout for Metrics to automatically detect and diagnose outliers in your business and operational data. By the end of this AWS book, you’ll understand how to use the three AWS AI services effectively to perform time series analysis.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Share Your Thoughts

Section 1: Analyzing Time Series and Delivering Highly Accurate Forecasts with Amazon Forecast

Chapter 1: An Overview of Time Series Analysis

Technical requirements

What is a time series dataset?

Recognizing the different families of time series

Adding context to time series data

Learning about common time series challenges

Selecting an analysis approach

Typical time series use cases

Summary

Chapter 2: An Overview of Amazon Forecast

Technical requirements

What kinds of problems can we solve with forecasting?

What is Amazon Forecast?

How does Amazon Forecast work?

Choosing the right applications

Summary

Chapter 3: Creating a Project and Ingesting Your Data

Technical requirements

Understanding the components of a dataset group

Preparing a dataset for forecasting purposes

Creating an Amazon Forecast dataset group

Ingesting data in Amazon Forecast

Summary

Chapter 4: Training a Predictor with AutoML

Technical requirements

Using your datasets to train a predictor

How Amazon Forecast leverages automated machine learning

Understanding the predictor evaluation dashboard

Exporting and visualizing your predictor backtest results

Summary

Chapter 5: Customizing Your Predictor Training

Technical requirements

Choosing an algorithm and configuring the training parameters

Leveraging HPO

Reinforcing your backtesting strategy

Including holiday and weather data

Implementing featurization techniques

Customizing quantiles to suit your business needs

Summary

Chapter 6: Generating New Forecasts

Technical requirements

Generating a forecast

Using lookup to get your items forecast

Exporting and visualizing your forecasts

Generating explainability for your forecasts

Summary

Chapter 7: Improving and Scaling Your Forecast Strategy

Technical requirements

Deep diving into forecasting model metrics

Understanding your model accuracy

Model monitoring and drift detection

Serverless architecture orchestration

Summary

Section 2: Detecting Abnormal Behavior in Multivariate Time Series with Amazon Lookout for Equipment

Chapter 8: An Overview of Amazon Lookout for Equipment

Technical requirements

What is Amazon Lookout for Equipment?

What are the different approaches to tackle anomaly detection?

The challenges encountered with multivariate time series data

How does Amazon Lookout for Equipment work?

How do you choose the right applications?

Summary

Chapter 9: Creating a Dataset and Ingesting Your Data

Technical requirements

Preparing a dataset for anomaly detection purposes

Creating an Amazon Lookout for Equipment dataset

Generating a JSON schema

Creating a data ingestion job

Understanding common ingestion errors and workarounds

Summary

Chapter 10: Training and Evaluating a Model

Technical requirements

Using your dataset to train a model

Model organization best practices

Choosing a good data split between training and evaluation

Evaluating a trained model

Summary

Chapter 11: Scheduling Regular Inferences

Technical requirements

Using a trained model

Configuring a scheduler

Preparing a dataset for inference

Extracting the inference results

Summary

Chapter 12: Reducing Time to Insights for Anomaly Detections

Technical requirements

Improving your model's accuracy

Processing the model diagnostics

Monitoring your models

Orchestrating each step of the process with a serverless architecture

Summary

Section 3: Detecting Anomalies in Business Metrics with Amazon Lookout for Metrics

Chapter 13: An Overview of Amazon Lookout for Metrics

Technical requirements

Recognizing different types of anomalies

What is Amazon Lookout for Metrics?

How does Amazon Lookout for Metrics work?

Identifying suitable metrics for monitoring

Choosing between Lookout for Equipment and Lookout for Metrics

Summary

Chapter 14: Creating and Activating a Detector

Technical requirements

Preparing a dataset for anomaly detection purposes

Creating a detector

Adding a dataset and connecting a data source

Understanding the backtesting mode

Configuring alerts

Summary

Chapter 15: Viewing Anomalies and Providing Feedback

Technical requirements

Training a continuous detector

Reviewing anomalies from a trained detector

Interacting with a detector

Summary

Why subscribe?

Other Books You May Enjoy

Packt is searching for authors like you

Share Your Thoughts

Adding context to time series data

Simply put, there are three main ways an ML model can learn something new, as outlined here:

Supervised learning (SL): Models are trained using input data and labels (or targets). The labels play the role of an instructor giving directions to a student who is learning a new move. Training a model to approximate the relationship between input data and labels is a supervised approach.
Unsupervised learning (UL): This approach uses ML to uncover and extract the underlying relationships that may exist in a given dataset. In this case, we only operate on the input data and do not need to provide any labels or output data. We can, however, use labels to assess how well a given unsupervised model captures reality.
Reinforcement learning (RL): To train a model with RL, we build an environment that is able to send feedback to an agent. We then let the agent operate within this environment (using a set of actions) and adjust its behavior based on the feedback the environment returns in response to each action. There is no longer a fixed training dataset; instead, the environment sends an input sample (feedback) in reaction to each action from the agent.
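To make the distinction concrete, the following minimal sketch shows what each paradigm consumes, using only the Python standard library. The least-squares fit, the z-score rule, and the two-action environment (a multi-armed bandit, the simplest form of RL) are illustrative assumptions, not how any AWS service works internally:

```python
import random
import statistics

# Supervised: we are given inputs AND labels, and learn the mapping between them.
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept from (input, label) pairs."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Unsupervised: only inputs; we extract structure (here, unusual values) from them.
def zscore_outliers(values, threshold=2.0):
    """Flag values lying more than `threshold` population standard deviations from the mean."""
    mean, std = statistics.fmean(values), statistics.pstdev(values)
    return [abs(v - mean) > threshold * std for v in values]

# Reinforcement: no fixed dataset; an environment returns feedback for each action.
class ToyEnvironment:
    """Hypothetical two-action environment: action 1 is rewarded 80% of the time, action 0 20%."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def step(self, action):
        p_reward = 0.8 if action == 1 else 0.2
        return 1.0 if self.rng.random() < p_reward else 0.0

def train_agent(env, steps=500, epsilon=0.1, seed=1):
    """Epsilon-greedy agent that estimates each action's value from rewards alone."""
    rng = random.Random(seed)
    values, counts = [0.0, 0.0], [0, 0]
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)                        # explore
        else:
            action = max(range(2), key=lambda a: values[a])  # exploit the best estimate
        reward = env.step(action)
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]  # running mean
    return values

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # -> (2.0, 1.0)
print(zscore_outliers([10, 11, 9, 10, 50, 10, 11, 9, 10, 10]))  # only the 50 is flagged
print(train_agent(ToyEnvironment()))  # action 1 should end up with the higher value
```

Note that labels appear only in the supervised function; the unsupervised one sees raw values, and the agent never sees a dataset, only rewards returned by `step`.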

Whether you are dealing with univariate, multiple, or multivariate time series datasets, you might need to provide extra context: the location, the unique identification (ID) number of a batch, the components of the recipe used for a given batch, the sequence of actions performed by a pilot during an aircraft flight test, and so on. The same sequence of values in a univariate or multivariate time series can be interpreted differently in different contexts (for example, is the aircraft cruising or taking off? Is the plant producing a batch of shampoo or of shower gel?).

All this additional context can be provided in the form of labels, related time series, or metadata that will be used differently depending on the type of ML you leverage. Let's have a look at what these pieces of context can look like.

Labels

Labels can be used in SL settings, where ML models are trained using input data (our time series dataset) and output data (the labels). In a supervised approach, training a model is the process of learning an approximation of the relationship between the input data and the labels. Let's review a few examples of labels you can encounter along with your time series datasets, as follows:

The National Aeronautics and Space Administration (NASA) has provided the community with a widely used benchmark dataset that contains the remaining useful lifetime of turbofan engines, measured in cycles: each engine (identified by unit_number in the following table) has its health measured with multiple sensors, and readings are provided after each flight (or cycle). The multivariate dataset recorded for each engine can be labeled with the remaining useful lifetime (rul) known or estimated at the end of each cycle (this is the last column in the following table). Here, each individual timestamp is characterized by a label (the remaining lifetime, measured in cycles):

Figure 1.2 – NASA turbofan remaining useful lifetime

The ECG200 dataset is another widely used benchmark dataset for time series classification. The electrical activity recorded during human heartbeats can be labeled as Normal or Ischemia (myocardial infarction), as illustrated in the following screenshot. Here, each time series as a whole is characterized by a label:

Figure 1.3 – Heartbeat activity for 100 patients (ECG200 dataset)

Kaggle also offers a few time series datasets of interest. One of them contains sensor data from a water pump, along with the known time ranges during which the pump is broken or being repaired. In this case, labels are available as time ranges:

Figure 1.4 – Water pump sensor data showcasing healthy and broken time ranges

As you can see, labels can be used to characterize individual timestamps of a time series, portions of a time series, or even whole time series.
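The three label granularities can be sketched in plain Python as follows. This is a minimal illustration with hypothetical values: `add_rul` assumes run-to-failure trajectories, where the last recorded cycle of each engine is its failure point, and `ranges_to_flags` assumes inclusive start and end timestamps. Neither function comes from the datasets' own tooling:

```python
from datetime import datetime, timedelta

# 1) One label per timestamp: remaining useful life (RUL) for each engine cycle.
def add_rul(records):
    """records: list of (unit_number, cycle) pairs.
    Assumes each unit's trajectory ends at failure (run-to-failure data), so
    RUL = last observed cycle of the unit - current cycle."""
    last_cycle = {}
    for unit, cycle in records:
        last_cycle[unit] = max(cycle, last_cycle.get(unit, cycle))
    return [(unit, cycle, last_cycle[unit] - cycle) for unit, cycle in records]

# 2) One label per whole series (hypothetical values, not taken from ECG200).
heartbeats = {
    "series_001": {"values": [0.1, 0.5, 1.2, 0.4, -0.2], "label": "Normal"},
    "series_002": {"values": [0.0, 0.3, 0.9, 0.8, 0.1], "label": "Ischemia"},
}

# 3) Labels given as time ranges, expanded into one 0/1 flag per timestamp.
def ranges_to_flags(timestamps, ranges):
    """Return 1 for timestamps inside any inclusive [start, end] range, else 0."""
    return [int(any(start <= t <= end for start, end in ranges)) for t in timestamps]

print(add_rul([(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]))
# -> [(1, 1, 2), (1, 2, 1), (1, 3, 0), (2, 1, 1), (2, 2, 0)]

start = datetime(2018, 4, 1)  # hypothetical hourly readings
timestamps = [start + timedelta(hours=h) for h in range(6)]
broken = [(start + timedelta(hours=2), start + timedelta(hours=3))]
print(ranges_to_flags(timestamps, broken))
# -> [0, 0, 1, 1, 0, 0]
```

Expanding ranges into per-timestamp flags is what lets range labels be used alongside timestamp-level labels in the same training or evaluation code.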

Related time series

Related time series are additional variables that evolve in parallel to the time series that is the target of your analysis. Let's have a look at a few examples, as follows:

In the case of a manufacturing plant producing different batches of product, a critical signal to have is the unique batch ID, which can be matched with the starting and ending timestamps of the time series data.
The electricity consumption of multiple households from London can be matched with several pieces of weather data (temperature, wind speed, rainfall), as illustrated in the following screenshot:

Figure 1.5 – London household energy consumption versus outside temperature in the same period

In the water pump dataset, the data from the different sensors could be considered related time series for the pump health variable, which can take a value of either 0 (healthy pump) or 1 (broken pump).
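A minimal sketch of two of these situations, with hypothetical timestamps and values: joining a target series with a related series that shares its timestamps, and tagging each reading with the batch ID whose start and end timestamps contain it. The function names are illustrative, and batch windows are assumed to be sorted, non-overlapping, and end-exclusive:

```python
import bisect
from datetime import datetime

def join_on_timestamp(target, related):
    """Inner-join two series (dicts of timestamp -> value) on identical timestamps."""
    common = sorted(target.keys() & related.keys())
    return {t: (target[t], related[t]) for t in common}

def assign_batch_ids(timestamps, batches):
    """Tag each timestamp with the batch whose [start, end) window contains it.
    `batches` is a list of (start, end, batch_id), sorted by start and non-overlapping."""
    starts = [start for start, _, _ in batches]
    tagged = []
    for t in timestamps:
        i = bisect.bisect_right(starts, t) - 1  # last batch starting at or before t
        inside = i >= 0 and t < batches[i][1]
        tagged.append((t, batches[i][2] if inside else None))  # None between batches
    return tagged

# Hypothetical household consumption (target) and outside temperature (related)
t0, t1, t2, t3 = (datetime(2013, 1, 1, h) for h in range(4))
consumption = {t0: 0.21, t1: 0.35, t2: 0.30}  # kWh
temperature = {t0: 8.5, t2: 7.9, t3: 7.1}     # degrees Celsius
print(join_on_timestamp(consumption, temperature))  # only t0 and t2 are shared

# Hypothetical batch windows for a manufacturing line
batches = [
    (datetime(2021, 6, 1, 8), datetime(2021, 6, 1, 12), "B001"),
    (datetime(2021, 6, 1, 13), datetime(2021, 6, 1, 17), "B002"),
]
readings = [datetime(2021, 6, 1, h) for h in (9, 12, 14)]
print(assign_batch_ids(readings, batches))  # 12:00 falls between the two batches
```

The exact-match join keeps the sketch short. Real sensor and weather feeds rarely share identical timestamps, so in practice you would typically resample both series to a common frequency or use a nearest-preceding match such as pandas `merge_asof`.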

Metadata

When your dataset is multivariate or includes multiple time series, each of these can be associated with parameters that do not depend on time. Let's have a look at this in more detail here:

In the manufacturing plant example mentioned before, each batch of products could be different, and the metadata associated with each batch ID could be the recipe used to manufacture that particular batch.
For London household energy consumption, each time series is associated with a household, which could be further described by its size, the number of occupants, its type (house or flat), its construction date, its address, and so on. The following screenshot lists some of the metadata associated with a few households from this dataset: we can see, for instance, that 27 households fall into the ACORN-A category and live in a house with 2 bedrooms:

Figure 1.6 – London household metadata excerpt
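Because metadata does not depend on time, it is usually stored once per item and joined to the time series through a shared identifier. The following sketch uses hypothetical household IDs and a reduced set of attributes; it attaches metadata to each series and counts households per attribute combination, which is the kind of breakdown described above:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class HouseholdMetadata:
    """Static attributes that describe a household rather than any single timestamp."""
    household_id: str
    acorn_group: str
    bedrooms: int
    dwelling_type: str  # "house" or "flat"

# Hypothetical records: the real dataset's IDs, columns, and values may differ.
metadata = [
    HouseholdMetadata("H001", "ACORN-A", 2, "house"),
    HouseholdMetadata("H002", "ACORN-A", 2, "house"),
    HouseholdMetadata("H003", "ACORN-A", 3, "flat"),
    HouseholdMetadata("H004", "ACORN-B", 2, "flat"),
]
consumption = {  # time series keyed by the same household ID
    "H001": [0.21, 0.35, 0.30],
    "H002": [0.18, 0.22, 0.25],
    "H004": [0.40, 0.38, 0.41],
}

def attach_metadata(series_by_id, metadata):
    """Pair each time series with its (time-independent) metadata record."""
    index = {m.household_id: m for m in metadata}
    return {hid: (index[hid], values) for hid, values in series_by_id.items() if hid in index}

def count_by(metadata, *fields):
    """Count households sharing the same values for the given metadata fields."""
    return Counter(tuple(getattr(m, f) for f in fields) for m in metadata)

enriched = attach_metadata(consumption, metadata)
print(enriched["H001"][0].acorn_group)  # -> ACORN-A
print(count_by(metadata, "acorn_group", "dwelling_type", "bedrooms"))
```

Keeping metadata in a separate, per-item structure avoids repeating the same static values on every timestamp of every series.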

Now that you understand how time series can be further described with additional context such as labels, related time series, and metadata, let's dive into common challenges you can encounter when analyzing time series data.
