Machine Learning for Time-Series with Python - Second Edition

By : Ben Auffarth

Overview of this book

The Python time-series ecosystem is huge and can be challenging to navigate, since there are so many new libraries and models. Machine Learning for Time Series, Second Edition, aims to deepen your understanding of time series by providing a comprehensive overview of popular Python time-series packages and helping you build better predictive systems. This fully updated second edition starts by re-introducing the basics of time series and then helps you get to grips with traditional autoregressive models as well as modern non-parametric models. By working through practical examples and the theory behind them, you will gain a deeper understanding of how to load time-series datasets from any source and how to apply a variety of models, such as deep learning recurrent neural networks, causal convolutional network models, and gradient boosting with feature engineering. This book will also help you choose the right model for the right problem by explaining the theory behind several useful models. New updates include a chapter on forecasting and extracting signals on financial markets and case studies with relevant examples from operations management, digital marketing, and healthcare. By the end of this book, you should feel at home with analyzing time series and applying machine learning methods to them.
Table of Contents (3 chapters)

What is a time series?

A time series is a sequence of data points, typically measured at successive times, spaced at uniform time intervals. Time series mostly come in discrete time, where the time difference between consecutive points is the same.

Many disciplines, such as finance, public administration, energy, retail, and healthcare, are dominated by time series data. Large areas of micro- and macroeconomics rely on applied statistics with an emphasis on time series analyses and modeling.

The following are examples of time series data:

  • Daily closing values of a stock index
  • Number of weekly infections of a disease
  • Weekly series of train accidents
  • Rainfall per day
  • Sensor data such as temperature measurements per hour
  • Population growth per year
  • Quarterly earnings of a company over a number of years
  • The number of views or visitors to a website over time

These are only a few examples. Loosely speaking, any data that deals with changes over time is a time series. Since this is a book about time series data, it’s worth defining more formally what is considered a time series.

Definition: Time series are datasets where observations are arranged in chronological order.

This is a very broad definition. Alternatively, we could have said that a time series is a sequence of data points taken sequentially over time, or that a time series is the result of a stochastic process.

Formally, we can define a time series in two ways. The first one is as a mapping from the time domain to the domain of real numbers:

$$x: T \to \mathbb{R}^k$$

where $T \subseteq \mathbb{R}$ and $k \in \mathbb{N}$.

Another way to define a time series is as a stochastic process:

$$\{X_t,\ t \in T\}$$

Here, $X(t)$ or $X_t$ denotes the value of the random variable $X$ at time point $t$.

If $T$ is a set of real numbers, it's a continuous-time stochastic process. If $T$ is a set of integers, we call it a stochastic process in discrete time. The convention in the latter case is to write $\{X_n\}$.

Since time is the primary index of the dataset, by implication, time series datasets describe how the world changes over time. They often deal with the question of how the past influences the present or future.

The increase in monitoring and data collection brings with it the need for both statistical and machine learning techniques applied to time series to predict and characterize the behavior of complex systems or components within a system. An important part of working with time series is the question of how the future can be predicted based on the past. This is called forecasting.

Some methods allow adding business cycles or other descriptors as additional features. These additional features are called exogenous features: they are time-dependent, explanatory variables. We'll go through examples of feature generation in Chapter 4, Machine Learning for Time Series.
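To make the stochastic-process view and the idea of forecasting more concrete, here is a minimal sketch, not taken from the book, that draws one realization of a discrete-time process (a Gaussian random walk) and builds the simplest possible forecast from it. The function name `simulate_random_walk`, the start date, and the choice of a random walk are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd


def simulate_random_walk(n_steps: int, seed: int = 0) -> pd.Series:
    """Draw one realization of a random walk X_n = X_{n-1} + e_n.

    The increments e_n are independent standard normal draws, so every
    call with a different seed gives a different realization of the
    same discrete-time stochastic process.
    """
    rng = np.random.default_rng(seed)
    increments = rng.normal(loc=0.0, scale=1.0, size=n_steps)
    values = np.cumsum(increments)
    # Time is the primary index: one observation per day (arbitrary start date).
    index = pd.date_range("2020-01-01", periods=n_steps, freq="D")
    return pd.Series(values, index=index, name="x")


walk = simulate_random_walk(100)

# A naive forecast: predict that each value equals the previous observation.
# The first entry is missing because there is no past value to draw on.
naive_forecast = walk.shift(1)
```

The naive forecast is a common baseline rather than a recommended model: anything more elaborate should at least predict better than simply repeating the last observed value.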

Characteristics of time series

There are many different types of time series data, such as financial data, economic data, weather data, and social media data. Each type of data has its own challenges and requires its own methods for time series analysis and forecasting.

Financial data is often non-stationary, meaning that the statistical properties of the data change over time. This makes it difficult to build models that can accurately predict future values. Economic data is often seasonal, meaning that there is a regular pattern of ups and downs over the course of a year. This can be difficult to model, as the seasonality may be caused by factors that are not easily observed or measured. Weather data is often chaotic, meaning that small differences in initial conditions grow quickly, so forecasts lose accuracy rapidly as the horizon lengthens. Social media data is often volatile, meaning that it can change rapidly and unexpectedly, which again makes accurate prediction difficult.

One of the most common examples of time series data is stock prices. Stock prices are constantly changing, and they can be used to describe how a company is doing over time. Another example is weather data, which describes changes in the weather over time and can be used to predict future weather patterns.

A time series can be univariate, where there's one value against time, for example temperatures over time in London, United Kingdom. Such a time series can be represented using a line chart, where the x-axis is the time and the y-axis is the value of the time series. Time series can be multivariate as well. In this case, more than one variable varies with time.

Here's an extract of a time series dataset as an example, exported from Google Trends, on searches for Python, R, and Julia:

Figure 1.1: Extract of a time series dataset
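A table of the kind shown in Figure 1.1 could be held in a pandas DataFrame with a monthly index and one column per search term. The following is only a sketch: the numbers are made up for illustration and are not the actual Google Trends values, and the variable names are assumptions.

```python
import pandas as pd

# Hypothetical values for illustration only; NOT real Google Trends data.
index = pd.period_range("2020-01", periods=4, freq="M", name="Month")
trends = pd.DataFrame(
    {
        "Python": [80, 82, 85, 90],
        "R": [40, 41, 39, 42],
        "Julia": [5, 6, 5, 7],
    },
    index=index,
)

# Selecting a single column gives a one-dimensional series over the same index.
python_only = trends["Python"]
```

Here `trends` has three value columns that share one date index, while `python_only` holds a single value per month.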

This is a multivariate time series, with columns for Python, R, and Julia. The first column is the index, a date column, and its period is the month. In cases where we have only a single variable, we speak of a univariate series. This dataset would be univariate if we had only one programming language instead of three.

For many applications, it is useful to split a time series into its component parts, such as trend, seasonality, and noise.

A trend is the general direction in which something is developing or changing, such as a long-term increase or decrease in a sequence. An example of where a trend can be observed would be global warming, the process by which the temperatures on our planet have been rising over the last half-century.

Here's a plot of global surface temperature changes since 1880 from the GISS Surface Temperature Analysis dataset released by NASA:

Figure 1.2: GISS surface temperature analysis from 1880 to 2019

As you can see in Figure 1.2, temperature changes varied around 0 until the mid-20th century; since then, however, there's been a clearly visible trend of an overall rise in the yearly temperature.

Seasonality is a variation that occurs at specific regular intervals of a year or less. Seasonality can occur on different time spans, such as daily, weekly, monthly, or yearly. An example of weekly seasonality would be sales of ice cream picking up each weekend. Also, depending on where you live, ice cream might only be sold in spring and summer. This is a yearly variation.

Other than seasonal changes and trends, there is variability that's not of a fixed frequency or that rises and falls in a way that's not based on seasonal frequency. Some of these we might be able to explain based on the knowledge we have.

As an example of cyclic variability that's irregular, bank holidays can fall on different calendar days each year, and promotional campaigns could depend on business decisions, such as the introduction of a new product.

In electroencephalography (EEG), the electrical activity of the brain is recorded through electrodes placed on the scalp. Its signal typically shows strong oscillations (also referred to as brain waves) at a variety of frequency ranges. Here's a graph of an EEG signal (from the EEG Eye State dataset uploaded by Oliver Roesler from DHBW, Germany):

Figure 1.7: EEG signal

Such changes at the scale of seconds or milliseconds would not be called seasonal effects. Other changes that take place over time periods longer than a year would not be called seasonal either.

The task of identifying, quantifying, and decomposing these and other characteristics is called time series analysis. Exploratory time series analysis is often the first step before any feature transformation and machine learning.
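As a rough illustration of splitting a series into trend, seasonality, and noise, the sketch below applies a classical additive decomposition to a synthetic monthly series using only NumPy and pandas. The data, the function name `decompose_additive`, and the additive model (value = trend + seasonal + residual) are assumptions for this example; in practice, a library routine would more likely be used.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data: linear trend + yearly seasonality + Gaussian noise.
rng = np.random.default_rng(42)
index = pd.date_range("2015-01-01", periods=72, freq="MS")
t = np.arange(72)
series = pd.Series(
    0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 72),
    index=index,
)


def decompose_additive(y: pd.Series, period: int) -> pd.DataFrame:
    """Split y into trend, seasonal, and residual parts that sum back to y."""
    half = period // 2
    if period % 2 == 0:
        # Even period: average two adjacent moving averages, then re-center.
        trend = y.rolling(period).mean().rolling(2).mean().shift(-half)
    else:
        trend = y.rolling(period, center=True).mean()
    detrended = y - trend
    # Seasonal part: average detrended value at each position in the cycle,
    # shifted so that the seasonal effects sum to zero over one period.
    position = np.arange(len(y)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=y.index)
    residual = y - trend - seasonal
    return pd.DataFrame(
        {"trend": trend, "seasonal": seasonal, "residual": residual}
    )


components = decompose_additive(series, period=12)
```

The moving average leaves the trend undefined for half a period at each end of the series, so the residual is missing there too. Plotting the three columns of `components` is one simple way to start an exploratory analysis.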