Time Series Analysis on AWS

By: Michaël Hoarau

Overview of this book

As a business analyst or data scientist, you have to use many algorithms and approaches to prepare and process time series data and to build ML-based applications with it, but you face common problems, such as not knowing which algorithm to choose or how to combine algorithms and interpret their results. Amazon Web Services (AWS) provides numerous services to help you build applications fueled by artificial intelligence (AI) capabilities. This book helps you get to grips with three managed AWS AI/ML services so that you can deliver your desired business outcomes. The book begins with Amazon Forecast, where you’ll discover how to use time series forecasting, leveraging sophisticated statistical and machine learning algorithms to deliver accurate forecasts. You’ll then learn to use Amazon Lookout for Equipment to build multivariate time series anomaly detection models geared toward industrial equipment and understand how it provides valuable insights to support teams focused on predictive maintenance and predictive quality use cases. In the last chapters, you’ll explore Amazon Lookout for Metrics and use it to automatically detect and diagnose outliers in your business and operational data. By the end of this AWS book, you’ll understand how to use the three AWS AI services effectively to perform time series analysis.

Table of Contents (20 chapters)

  • Section 1: Analyzing Time Series and Delivering Highly Accurate Forecasts with Amazon Forecast
  • Section 2: Detecting Abnormal Behavior in Multivariate Time Series with Amazon Lookout for Equipment
  • Section 3: Detecting Anomalies in Business Metrics with Amazon Lookout for Metrics

What is a time series dataset?

From a mathematical point of view, a time series is a discrete sequence of data points that are listed chronologically. Let's take each of the key terms of this definition in turn:

  • Sequence: A time series is a sequence of data points. Each data point can take a single value (usually a binary, an integer, or a real value).
  • Discrete: Although the phenomena measured by a system can be continuous (the outside temperature is a continuous variable, as time can, of course, range over the entire real number line), time series data is generated by systems that capture data points at a certain interval (a temperature sensor can, for instance, take a new temperature reading every 5 minutes). The measurement interval is ideally regular (the data points are equally spaced in time), but in practice it is often irregular.
  • Chronologically: The sequence of data points is naturally time-ordered. If you perform an analysis of your time series data, you will generally observe that data points that are close together in time are more closely related than observations that are further apart. This natural ordering means that a time series model usually tries to learn from past values of a given sequence rather than from future values (this is obvious when you try to forecast future values of a time series but is still very important for other use cases).
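
The three properties above can be checked on a small example. The following is a minimal sketch in plain Python, assuming a handful of invented temperature readings (the timestamps, values, and variable names are illustrative and not taken from the book). It verifies that the sequence is time-ordered and measures the gaps between consecutive readings to see whether the sampling interval is regular.

```python
from datetime import datetime

# Hypothetical temperature readings as (timestamp, degrees Celsius) pairs.
# The values are invented for illustration only.
readings = [
    (datetime(2021, 6, 1, 10, 0), 21.0),
    (datetime(2021, 6, 1, 10, 5), 21.4),
    (datetime(2021, 6, 1, 10, 11), 21.9),  # this reading arrives one minute late
    (datetime(2021, 6, 1, 10, 15), 22.3),
]

timestamps = [timestamp for timestamp, _ in readings]

# Chronological: every timestamp is strictly later than the previous one.
is_chronological = all(
    earlier < later for earlier, later in zip(timestamps, timestamps[1:])
)

# Discrete: the series only exists at sampled instants, so we can measure
# the gap between consecutive data points.
intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
gaps_in_minutes = [int(gap.total_seconds() // 60) for gap in intervals]

# Regular sampling means that all gaps are identical.
is_regular = len(set(intervals)) == 1

print(is_chronological)  # True
print(gaps_in_minutes)   # [5, 6, 4]
print(is_regular)        # False
```

Here, the sensor nominally reports every 5 minutes, but one late reading is enough to make the series irregular, which is why a check like this is often a sensible first step before any modeling.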

Machine learning (ML) and artificial intelligence (AI) have largely been successful at leveraging new technologies including dedicated processing units (such as graphics processing units, or GPUs), fast network channels, and immense storage capacities. Paradigm shifts such as cloud computing have played a big role in democratizing access to such technologies, allowing tremendous leaps in novel architectures that are immediately leveraged in production workloads.

In this landscape, time series appear to have benefited a lot less from these advances. What makes time series data so specific? We might consider a time series dataset to be a mere tabular dataset that happens to be time-indexed, but this would be a big mistake. Introducing time as a variable in your dataset means that you are now dealing with the notion of temporality: patterns can emerge based on the timestamp of each row of your dataset. Let's now have a look at the different families of time series you may encounter.
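
Before moving on, the difference between a time series and a plain table of values can be made concrete. The sketch below is an illustration under stated assumptions, not a method from the book: it builds a synthetic hourly signal with a 24-hour cycle, then compares the lag-1 autocorrelation (how strongly each value relates to the next one) of the ordered series with that of the same values in shuffled order. The helper `lag1_autocorrelation` and the synthetic data are hypothetical names and values introduced for this example.

```python
import math
import random


def lag1_autocorrelation(values):
    """Return the correlation between each value and the one that follows it."""
    mean = sum(values) / len(values)
    numerator = sum(
        (current - mean) * (following - mean)
        for current, following in zip(values, values[1:])
    )
    denominator = sum((value - mean) ** 2 for value in values)
    return numerator / denominator


# Synthetic hourly signal with a 24-hour cycle over 10 days (illustrative data).
ordered = [math.sin(2 * math.pi * hour / 24) for hour in range(240)]

# The same values with the time ordering destroyed, as if the rows of a
# table had been shuffled. A fixed seed keeps the example repeatable.
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)

ordered_score = lag1_autocorrelation(ordered)
shuffled_score = lag1_autocorrelation(shuffled)

print(f"ordered:  {ordered_score:.2f}")
print(f"shuffled: {shuffled_score:.2f}")
```

Both lists contain exactly the same values, so any summary that ignores row order (mean, minimum, maximum, histogram) is identical for the two. The ordered series has a lag-1 autocorrelation close to 1, while the shuffled one should be close to 0: the pattern lives in the ordering, which is what treating the data as a mere table would throw away.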