Time Series Analysis on AWS

By: Michaël Hoarau

Overview of this book

As a business analyst or data scientist, you have to use many algorithms and approaches to prepare, process, and build ML-based applications that leverage time series data, and you face common problems such as not knowing which algorithm to choose or how to combine and interpret the results. Amazon Web Services (AWS) provides numerous services to help you build applications fueled by artificial intelligence (AI) capabilities. This book helps you get to grips with three AWS AI/ML-managed services so that you can deliver your desired business outcomes. The book begins with Amazon Forecast, where you'll discover how to use time series forecasting, leveraging sophisticated statistical and machine learning algorithms to deliver accurate business outcomes. You'll then learn to use Amazon Lookout for Equipment to build multivariate time series anomaly detection models geared toward industrial equipment, and understand how it provides valuable insights for teams focused on predictive maintenance and predictive quality use cases. In the last chapters, you'll explore Amazon Lookout for Metrics to automatically detect and diagnose outliers in your business and operational data. By the end of this AWS book, you'll understand how to use these three AWS AI services effectively to perform time series analysis.
Table of Contents (20 chapters)

  • Section 1: Analyzing Time Series and Delivering Highly Accurate Forecasts with Amazon Forecast
  • Section 2: Detecting Abnormal Behavior in Multivariate Time Series with Amazon Lookout for Equipment
  • Section 3: Detecting Anomalies in Business Metrics with Amazon Lookout for Metrics

Typical time series use cases

Until this point, we have covered many different considerations about the types, challenges, and analysis approaches you have to deal with when processing time series data. But what can we do with time series? What kinds of insights can we derive from them? Recognizing the purpose of your analysis is a critical step in designing an appropriate approach for your data preparation activities, and in understanding how the insights derived from your analysis can be used by your end users. For instance, removing outliers from a time series can improve a forecasting analysis but renders any subsequent anomaly detection approach moot.

Typical use cases where time series datasets play an important role, if not the most important one, include the following:

  • Forecasting
  • Anomaly detection
  • Event forewarning (anomaly prediction)
  • Virtual sensors
  • Activity detection (pattern analysis)
  • Predictive quality
  • Setpoint optimization

In the next three parts of this book, we are going to focus on forecasting (with Amazon Forecast), multivariate event forewarning (using Amazon Lookout for Equipment to output detected anomalies that can then be analyzed over time to build anomaly forewarning notifications), and anomaly detection (with Amazon Lookout for Metrics).

The remainder of this chapter is dedicated to an overview of what else you can achieve using time series data. Although you can combine the AWS services presented in this book to achieve part of what is necessary to solve these problems, a good rule of thumb is that it won't be straightforward, and other approaches may yield a faster time to insight.

Virtual sensors

Also called soft sensors, these models are used to infer a physical measurement with an ML model instead of an actual device. Some environments are too harsh to install physical sensors in. In other cases, there is no reliable physical sensor to measure the characteristic you are interested in (or a suitable physical sensor does not exist at all). Last but not least, sometimes you need a real-time measurement to manage your process but can only obtain one measurement per day.

A virtual sensor uses a multivariate time series (all the other sensors available) to yield current or predicted values for the measurement you cannot get directly, at a useful granularity.
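As a minimal sketch of this idea (all sensor names, values, and relationships below are invented for illustration), a virtual sensor can be as simple as a regression model fitted on a period where the hard-to-get measurement was available, then used to infer it afterward:

```python
import numpy as np

# Hypothetical scenario: estimate a hard-to-measure fluid temperature from
# three other sensors with a linear model. All values are synthetic.
rng = np.random.default_rng(42)
n = 500
pressure = rng.normal(5.0, 0.5, n)
flow = rng.normal(120.0, 10.0, n)
vibration = rng.normal(0.2, 0.05, n)

# Hidden relationship for the "missing" sensor (unknown in practice)
temperature = 30 + 3.0 * pressure + 0.1 * flow + rng.normal(0, 0.5, n)

# Fit the virtual sensor on a period where lab measurements were available
X = np.column_stack([pressure, flow, vibration, np.ones(n)])
coef, *_ = np.linalg.lstsq(X[:400], temperature[:400], rcond=None)

# Infer the measurement on new data where no physical reading exists
estimated = X[400:] @ coef
rmse = np.sqrt(np.mean((estimated - temperature[400:]) ** 2))
print(f"holdout RMSE: {rmse:.2f} °C")
```

In practice, the regression would be a more expressive model trained on many more sensors, but the framing is the same: the other available time series are the inputs, and the missing measurement is the target.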

On AWS, such virtual sensor models can be built as custom models on Amazon SageMaker.

Activity detection

When you have segments of univariate or multivariate time series, you may want to perform pattern analysis to identify the exact actions that produced them. This can be useful for human activity recognition based on accelerometer data captured by your phone (sports mobile applications that can automatically tell whether you're cycling, walking, or running), or for understanding a user's intent by analyzing brainwave data.

Many DL architectures can tackle this motif discovery task (long short-term memory (LSTM) networks, convolutional neural networks (CNNs), or a combination of both); alternative approaches let you transform your time series into tabular data or images so that you can apply clustering and classification techniques. All of these can be built as custom models on Amazon SageMaker or by using some of the built-in scalable algorithms available in this service.
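To illustrate the transform-to-tabular route, here is a toy sketch (all signal shapes and parameters are invented): accelerometer-like windows are reduced to simple per-window features and classified with a nearest-centroid rule.

```python
import numpy as np

# Toy activity recognition: "walking" and "running" are simulated as
# sinusoids of different frequency/amplitude plus noise; a nearest-centroid
# classifier runs on simple per-window features. Everything is synthetic.
rng = np.random.default_rng(0)
fs, win = 50, 100  # 50 Hz sampling, 2-second windows

def make_windows(freq, amp, n_windows):
    t = np.arange(win) / fs
    return np.array([amp * np.sin(2 * np.pi * freq * t)
                     + rng.normal(0, 0.1, win) for _ in range(n_windows)])

def features(w):
    # Tabular representation: mean absolute value and std per window
    return np.column_stack([np.abs(w).mean(axis=1), w.std(axis=1)])

walk, run = make_windows(1.5, 0.5, 30), make_windows(3.0, 1.5, 30)
X = features(np.vstack([walk, run]))
y = np.array([0] * 30 + [1] * 30)

# "Training": one centroid per activity class
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

new = features(make_windows(3.0, 1.5, 5))      # unseen "running" windows
pred = np.argmin(np.linalg.norm(new[:, None] - centroids, axis=2), axis=1)
print(pred)
```

Real accelerometer data would need richer features (spectral energy, axis correlations) and a stronger classifier, but the windowing-then-classify structure carries over directly.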

If you choose to transform your time series into images, you can also leverage Amazon Rekognition Custom Labels for classification or Amazon Lookout for Vision to perform anomaly detection.

Once you have an existing database of activities, you can also leverage an indexing approach: building a symbolic representation of your time series allows you to use it as an embedding to query similar time series in a database, or even past segments of the same time series if you want to discover potential recurring motifs.
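One common way to build such a symbolic representation (an assumption on my part; the text does not prescribe a specific encoding) is a SAX-style pipeline: z-normalize the series, reduce it with piecewise aggregate approximation (PAA), then quantize each segment into a letter so that similar segments map to similar "words" you can index and query.

```python
import numpy as np

def to_symbols(series, n_segments=8, alphabet="abcd"):
    # Z-normalize, then average over equal-length segments (PAA)
    x = (series - series.mean()) / series.std()
    paa = x.reshape(n_segments, -1).mean(axis=1)
    # Breakpoints splitting a standard normal into 4 equiprobable bins
    breakpoints = np.array([-0.67, 0.0, 0.67])
    idx = np.searchsorted(breakpoints, paa)
    return "".join(alphabet[i] for i in idx)

t = np.linspace(0, 2 * np.pi, 64)
word_sine = to_symbols(np.sin(t))           # oscillating pattern
word_ramp = to_symbols(np.linspace(0, 1, 64))  # monotonic pattern
print(word_sine, word_ramp)
```

The resulting words can be stored in an ordinary text index, so querying for similar segments becomes a string lookup rather than a distance computation over raw samples.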

Predictive quality

Imagine that you have a process that ends up with a product or service with a quality that can vary depending on how well the process was executed. This is typically what can happen on a manufacturing production line where equipment sensors, process data, and other tabular characteristics can be measured and matched to the actual quality of the finished goods.

You can then use all these time series to build a predictive model that tries to predict if the current batch of products will achieve the appropriate quality grade or if it will have to be reworked or thrown away as waste.

Recurrent neural networks (RNNs) are traditionally used to address this use case. Depending on how you shape your available dataset, however, you might be able to use either Amazon Forecast (using the predicted quality or grade of the product as the main time series to predict and all the other available data as related time series) or Amazon Lookout for Equipment (by considering bad product quality as anomalies for which you want to get as much forewarning as possible).
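Whichever service or model you pick, the first step is usually to reshape per-batch process time series into a supervised dataset. The sketch below (all process values, features, and labels are synthetic and invented for illustration) aggregates each batch's sensor readings into tabular features and fits a simple linear quality predictor:

```python
import numpy as np

# Hypothetical setup: batches whose temperature drifts upward during the
# run tend to require rework. We aggregate each batch's time series into
# features and fit a linear predictor via least squares.
rng = np.random.default_rng(1)
n_batches, steps = 200, 60

X, y = [], []
for _ in range(n_batches):
    drift = rng.uniform(0, 0.05)                    # hidden process drift
    temp = 50 + drift * np.arange(steps) + rng.normal(0, 0.3, steps)
    # Per-batch aggregation of the raw time series
    X.append([temp.mean(), temp.std(), temp[-1] - temp[0]])
    y.append(1.0 if drift > 0.025 else 0.0)         # 1 = rework, 0 = good
X, y = np.array(X), np.array(y)

# Linear model with intercept; threshold the score at 0.5
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = (A @ w) > 0.5
accuracy = (pred == (y == 1)).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The same per-batch aggregation is what lets you hand the problem to a tabular or forecasting service instead of training an RNN from scratch.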

Setpoint optimization

In process industries (industries that transform raw materials into finished goods such as shampoo, an aluminum coil, or a piece of furniture), setpoints are the target value of a process variable. Imagine that you need to keep the temperature of a fluid at 50°C; then, your setpoint is 50°C. The actual value measured by the process might be different and the objective of process control systems is to ensure that the process value reaches and stays at the desired setpoint.
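The relationship between a setpoint and the process value can be made concrete with a toy feedback loop (the gain and process dynamics below are invented for illustration): a proportional controller repeatedly nudges the process value toward the 50°C setpoint.

```python
# Toy control loop for the 50 °C fluid example: a proportional controller
# reduces the error between the process value and its setpoint each step.
setpoint = 50.0
process_value = 20.0   # fluid starts at ambient temperature
gain = 0.2             # proportional gain (invented for this sketch)

for step in range(50):
    error = setpoint - process_value
    process_value += gain * error   # actuator nudges the process value

print(round(process_value, 2))  # converges close to the 50 °C setpoint
```

Setpoint optimization sits one level above this loop: the controller keeps the process at the setpoint, while the optimization decides what the setpoint should be.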

In such a situation, you can leverage the time series data of your process to optimize an outcome: for instance, the quantity of waste generated, the energy or water used, or the chance to produce a higher grade of product from a quality standpoint (see the previous predictive quality use case for more details). Based on the desired outcome, you can then use an ML approach to recommend setpoints that will ensure you reach your objectives.

Potential approaches to tackle this delicate optimization problem include the following:

  • Partial least squares (PLS) and sparse PLS
  • DL-based model predictive control (MPC), which combines neural-network-based controllers and RNNs to replicate the dynamic behavior of an MPC
  • Reinforcement learning (RL) with fault-tolerant control through quality learning (Q-learning)

The expected output of such models is the setpoint value for each parameter that controls the process. All these approaches can be built as custom models on Amazon SageMaker (which also provides an RL toolkit and environment in case you want to leverage Q-learning to solve this use case).
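To make the Q-learning option concrete, here is a deliberately tiny sketch (a single-state bandit form of the Q-update; the candidate setpoints, reward function, and hyperparameters are all invented): the agent learns which of five candidate setpoints maximizes a noisy process reward.

```python
import numpy as np

# Toy Q-learning for setpoint selection. The process secretly rewards
# setpoint index 3 the most; everything here is synthetic.
rng = np.random.default_rng(7)
n_setpoints = 5
q = np.zeros(n_setpoints)       # one Q-value per candidate setpoint
alpha, epsilon = 0.1, 0.2       # learning rate, exploration rate

def reward(action):
    # Hidden process response: best outcome at setpoint index 3
    return -abs(action - 3) + rng.normal(0, 0.1)

for episode in range(2000):
    if rng.random() < epsilon:
        action = int(rng.integers(n_setpoints))  # explore
    else:
        action = int(np.argmax(q))               # exploit
    # Single-state Q-update (no next-state term in this bandit setting)
    q[action] += alpha * (reward(action) - q[action])

print("recommended setpoint index:", int(np.argmax(q)))
```

A real setpoint optimization problem would have a state (current process conditions), multiple coupled setpoints, and a learned or simulated process model, but the explore/exploit-and-update loop is the same mechanism an RL toolkit industrializes for you.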