Time Series Analysis on AWS

By: Michaël Hoarau

Overview of this book

As a business analyst or data scientist, you have to use many algorithms and approaches to prepare, process, and build ML-based applications that leverage time series data, and you face common problems such as not knowing which algorithm to choose or how to combine and interpret the results. Amazon Web Services (AWS) provides numerous services to help you build applications fueled by artificial intelligence (AI) capabilities. This book helps you get to grips with three AWS AI/ML-managed services so that you can deliver your desired business outcomes. The book begins with Amazon Forecast, where you'll discover how to use time series forecasting, leveraging sophisticated statistical and machine learning algorithms to deliver accurate business outcomes. You'll then learn to use Amazon Lookout for Equipment to build multivariate time series anomaly detection models geared toward industrial equipment, and understand how it provides valuable insights for teams focused on predictive maintenance and predictive quality use cases. In the last chapters, you'll explore Amazon Lookout for Metrics to automatically detect and diagnose outliers in your business and operational data. By the end of this AWS book, you'll understand how to use these three AWS AI services effectively to perform time series analysis.
Table of Contents (20 chapters)

  • Section 1: Analyzing Time Series and Delivering Highly Accurate Forecasts with Amazon Forecast
  • Section 2: Detecting Abnormal Behavior in Multivariate Time Series with Amazon Lookout for Equipment
  • Section 3: Detecting Anomalies in Business Metrics with Amazon Lookout for Metrics

Typical time series use cases

Until this point, we have covered many different considerations about the types, challenges, and analysis approaches you have to deal with when processing time series data. But what can we do with time series? What kinds of insights can we derive from them? Recognizing the purpose of your analysis is a critical step in designing an appropriate approach for your data preparation activities, and in understanding how the insights derived from your analysis can be used by your end users. For instance, removing outliers from a time series can improve a forecasting analysis but renders any subsequent anomaly detection approach moot.

Typical use cases where time series datasets play an important role, if not the most important one, include the following:

  • Forecasting
  • Anomaly detection
  • Event forewarning (anomaly prediction)
  • Virtual sensors
  • Activity detection (pattern analysis)
  • Predictive quality
  • Setpoint optimization

In the next three parts of this book, we are going to focus on forecasting (with Amazon Forecast), multivariate event forewarning (using Amazon Lookout for Equipment to output detected anomalies that can then be analyzed over time to build anomaly forewarning notifications), and anomaly detection (with Amazon Lookout for Metrics).

The remainder of this chapter is dedicated to an overview of what else you can achieve using time series data. Although you can combine the AWS services presented in this book to achieve part of what is necessary to solve these problems, a good rule of thumb is that it won't be straightforward, and other approaches may yield a faster time to insight.

Virtual sensors

Also called soft sensors, these models are used to infer a physical measurement with an ML model instead of an actual device. Some environments are too harsh to install physical sensors in. In other cases, there is no reliable physical sensor to measure the characteristic you are interested in (or a suitable physical sensor does not exist at all). Last but not least, sometimes you need a real-time measurement to manage your process but can only obtain one measurement per day.

A virtual sensor uses a multivariate time series (all the other sensors available) to yield current or predicted values for the measurement you cannot get directly, at a useful granularity.
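As a minimal sketch of this idea (all sensor names, values, and relationships below are invented for illustration), a virtual sensor can be as simple as a regression model fitted on a period where the hard-to-get measurement was available, then used to infer it afterward:

```python
import numpy as np

# Hypothetical scenario: estimate a hard-to-measure fluid temperature from
# three other sensors with a linear model. All values are synthetic.
rng = np.random.default_rng(42)
n = 500
pressure = rng.normal(5.0, 0.5, n)
flow = rng.normal(120.0, 10.0, n)
vibration = rng.normal(0.2, 0.05, n)

# Hidden relationship for the "missing" sensor (unknown in practice)
temperature = 30 + 3.0 * pressure + 0.1 * flow + rng.normal(0, 0.5, n)

# Fit the virtual sensor on a period where lab measurements were available
X = np.column_stack([pressure, flow, vibration, np.ones(n)])
coef, *_ = np.linalg.lstsq(X[:400], temperature[:400], rcond=None)

# Infer the measurement on new data where no physical reading exists
estimated = X[400:] @ coef
rmse = np.sqrt(np.mean((estimated - temperature[400:]) ** 2))
print(f"holdout RMSE: {rmse:.2f} °C")
```

In practice, the regression would be a more expressive model trained on many more sensors, but the framing is the same: the other available time series are the inputs, and the missing measurement is the target.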

On AWS, such virtual sensor models can be built as custom models on Amazon SageMaker.

Activity detection

When you have segments of univariate or multivariate time series, you may want to perform pattern analysis to identify the exact actions that produced them. This can be useful for human activity recognition based on accelerometer data captured by your phone (sports mobile applications that can automatically tell whether you're cycling, walking, or running), or for understanding a user's intent by analyzing brainwave data.

Many DL architectures can tackle this motif discovery task (long short-term memory (LSTM) networks, convolutional neural networks (CNNs), or a combination of both); alternative approaches let you transform your time series into tabular data or images so that you can apply clustering and classification techniques. All of these can be built as custom models on Amazon SageMaker or by using some of the built-in scalable algorithms available in this service.
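To illustrate the transform-to-tabular route, here is a toy sketch (all signal shapes and parameters are invented): accelerometer-like windows are reduced to simple per-window features and classified with a nearest-centroid rule.

```python
import numpy as np

# Toy activity recognition: "walking" and "running" are simulated as
# sinusoids of different frequency/amplitude plus noise; a nearest-centroid
# classifier runs on simple per-window features. Everything is synthetic.
rng = np.random.default_rng(0)
fs, win = 50, 100  # 50 Hz sampling, 2-second windows

def make_windows(freq, amp, n_windows):
    t = np.arange(win) / fs
    return np.array([amp * np.sin(2 * np.pi * freq * t)
                     + rng.normal(0, 0.1, win) for _ in range(n_windows)])

def features(w):
    # Tabular representation: mean absolute value and std per window
    return np.column_stack([np.abs(w).mean(axis=1), w.std(axis=1)])

walk, run = make_windows(1.5, 0.5, 30), make_windows(3.0, 1.5, 30)
X = features(np.vstack([walk, run]))
y = np.array([0] * 30 + [1] * 30)

# "Training": one centroid per activity class
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

new = features(make_windows(3.0, 1.5, 5))      # unseen "running" windows
pred = np.argmin(np.linalg.norm(new[:, None] - centroids, axis=2), axis=1)
print(pred)
```

Real accelerometer data would need richer features (spectral energy, axis correlations) and a stronger classifier, but the windowing-then-classify structure carries over directly.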

If you choose to transform your time series into images, you can also leverage Amazon Rekognition Custom Labels for classification or Amazon Lookout for Vision to perform anomaly detection.

Once you have an existing database of activities, you can also leverage an indexing approach: building a symbolic representation of your time series allows you to use it as an embedding to query similar time series in a database, or even past segments of the same time series if you want to discover potential recurring motifs.
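One common way to build such a symbolic representation (an assumption on my part; the text does not prescribe a specific encoding) is a SAX-style pipeline: z-normalize the series, reduce it with piecewise aggregate approximation (PAA), then quantize each segment into a letter so that similar segments map to similar "words" you can index and query.

```python
import numpy as np

def to_symbols(series, n_segments=8, alphabet="abcd"):
    # Z-normalize, then average over equal-length segments (PAA)
    x = (series - series.mean()) / series.std()
    paa = x.reshape(n_segments, -1).mean(axis=1)
    # Breakpoints splitting a standard normal into 4 equiprobable bins
    breakpoints = np.array([-0.67, 0.0, 0.67])
    idx = np.searchsorted(breakpoints, paa)
    return "".join(alphabet[i] for i in idx)

t = np.linspace(0, 2 * np.pi, 64)
word_sine = to_symbols(np.sin(t))           # oscillating pattern
word_ramp = to_symbols(np.linspace(0, 1, 64))  # monotonic pattern
print(word_sine, word_ramp)
```

The resulting words can be stored in an ordinary text index, so querying for similar segments becomes a string lookup rather than a distance computation over raw samples.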

Predictive quality

Imagine that you have a process that ends up with a product or service with a quality that can vary depending on how well the process was executed. This is typically what can happen on a manufacturing production line where equipment sensors, process data, and other tabular characteristics can be measured and matched to the actual quality of the finished goods.

You can then use all these time series to build a predictive model that tries to predict if the current batch of products will achieve the appropriate quality grade or if it will have to be reworked or thrown away as waste.

Recurrent neural networks (RNNs) are traditionally used to address this use case. Depending on how you shape your available dataset, however, you might be able to use either Amazon Forecast (using the predicted quality or grade of the product as the main time series to predict and all the other available data as related time series) or Amazon Lookout for Equipment (by considering bad product quality as anomalies for which you want to get as much forewarning as possible).
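Whichever service or model you pick, the first step is usually to reshape per-batch process time series into a supervised dataset. The sketch below (all process values, features, and labels are synthetic and invented for illustration) aggregates each batch's sensor readings into tabular features and fits a simple linear quality predictor:

```python
import numpy as np

# Hypothetical setup: batches whose temperature drifts upward during the
# run tend to require rework. We aggregate each batch's time series into
# features and fit a linear predictor via least squares.
rng = np.random.default_rng(1)
n_batches, steps = 200, 60

X, y = [], []
for _ in range(n_batches):
    drift = rng.uniform(0, 0.05)                    # hidden process drift
    temp = 50 + drift * np.arange(steps) + rng.normal(0, 0.3, steps)
    # Per-batch aggregation of the raw time series
    X.append([temp.mean(), temp.std(), temp[-1] - temp[0]])
    y.append(1.0 if drift > 0.025 else 0.0)         # 1 = rework, 0 = good
X, y = np.array(X), np.array(y)

# Linear model with intercept; threshold the score at 0.5
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = (A @ w) > 0.5
accuracy = (pred == (y == 1)).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The same per-batch aggregation is what lets you hand the problem to a tabular or forecasting service instead of training an RNN from scratch.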

Setpoint optimization

In process industries (industries that transform raw materials into finished goods such as shampoo, an aluminum coil, or a piece of furniture), setpoints are the target value of a process variable. Imagine that you need to keep the temperature of a fluid at 50°C; then, your setpoint is 50°C. The actual value measured by the process might be different and the objective of process control systems is to ensure that the process value reaches and stays at the desired setpoint.
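The relationship between a setpoint and the process value can be made concrete with a toy feedback loop (the gain and process dynamics below are invented for illustration): a proportional controller repeatedly nudges the process value toward the 50°C setpoint.

```python
# Toy control loop for the 50 °C fluid example: a proportional controller
# reduces the error between the process value and its setpoint each step.
setpoint = 50.0
process_value = 20.0   # fluid starts at ambient temperature
gain = 0.2             # proportional gain (invented for this sketch)

for step in range(50):
    error = setpoint - process_value
    process_value += gain * error   # actuator nudges the process value

print(round(process_value, 2))  # converges close to the 50 °C setpoint
```

Setpoint optimization sits one level above this loop: the controller keeps the process at the setpoint, while the optimization decides what the setpoint should be.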

In such a situation, you can leverage the time series data of your process to optimize an outcome: for instance, the quantity of waste generated, the energy or water used, or the chance to produce a higher grade of product from a quality standpoint (see the previous predictive quality use case for more details). Based on the desired outcome, you can then use an ML approach to recommend setpoints that will ensure you reach your objectives.

Potential approaches to tackle this delicate optimization problem include the following:

  • Partial least squares (PLS) and sparse PLS
  • DL-based model predictive control (MPC), which combines neural-network-based controllers and RNNs to replicate the dynamic behavior of an MPC
  • Reinforcement learning (RL) with fault-tolerant control through quality learning (Q-learning)

The expected output of such models is the setpoint value for each parameter that controls the process. All these approaches can be built as custom models on Amazon SageMaker (which also provides an RL toolkit and environment in case you want to leverage Q-learning to solve this use case).
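To make the Q-learning option concrete, here is a deliberately tiny sketch (a single-state bandit form of the Q-update; the candidate setpoints, reward function, and hyperparameters are all invented): the agent learns which of five candidate setpoints maximizes a noisy process reward.

```python
import numpy as np

# Toy Q-learning for setpoint selection. The process secretly rewards
# setpoint index 3 the most; everything here is synthetic.
rng = np.random.default_rng(7)
n_setpoints = 5
q = np.zeros(n_setpoints)       # one Q-value per candidate setpoint
alpha, epsilon = 0.1, 0.2       # learning rate, exploration rate

def reward(action):
    # Hidden process response: best outcome at setpoint index 3
    return -abs(action - 3) + rng.normal(0, 0.1)

for episode in range(2000):
    if rng.random() < epsilon:
        action = int(rng.integers(n_setpoints))  # explore
    else:
        action = int(np.argmax(q))               # exploit
    # Single-state Q-update (no next-state term in this bandit setting)
    q[action] += alpha * (reward(action) - q[action])

print("recommended setpoint index:", int(np.argmax(q)))
```

A real setpoint optimization problem would have a state (current process conditions), multiple coupled setpoints, and a learned or simulated process model, but the explore/exploit-and-update loop is the same mechanism an RL toolkit industrializes for you.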