Forecasting Time Series Data with Prophet - Second Edition

By: Greg Rafferty

Overview of this book

Forecasting Time Series Data with Prophet will help you to implement Prophet's cutting-edge forecasting techniques to model future data with high accuracy using only a few lines of code. This second edition has been fully revised with every update to the Prophet package since the first edition was published two years ago. An entirely new chapter is also included, diving into the mathematical equations behind Prophet's models. Additionally, the book contains new sections on forecasting during shocks such as COVID, creating custom trend modes from scratch, and a discussion of recent developments in the open-source forecasting community. You'll cover advanced features such as visualizing forecasts, adding holidays and trend changepoints, and handling outliers. You'll use the Fourier series to model seasonality, learn how to choose between an additive and multiplicative model, and understand when to modify each model parameter. Later, you'll see how to optimize more complicated models with hyperparameter tuning and by adding additional regressors to the model. Finally, you'll learn how to run diagnostics to evaluate the performance of your models in production. By the end of this book, you'll be able to take a raw time series dataset and build advanced and accurate forecasting models with concise, understandable, and repeatable code.

ARIMA

In 1970, the mathematicians George Box and Gwilym Jenkins published Time Series Analysis: Forecasting and Control, which described what is now known as the Box-Jenkins model. This methodology took the idea of the moving average (MA) further with the development of ARIMA. As a term, ARIMA is often used interchangeably with Box-Jenkins, although technically, Box-Jenkins refers to a method of parameter optimization for an ARIMA model.

ARIMA is an acronym that refers to three concepts: Autoregressive (AR), Integrated (I), and MA. We already understand the MA part. AR means that the model uses the dependent relationship between a data point and a certain number of lagged data points. That is, the model predicts upcoming values based on previous values. This is similar to predicting that it will be warm tomorrow because it’s been warm all week so far.
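To make the AR idea concrete, here is a minimal sketch that fits the simplest autoregressive model, with a single lag, by ordinary least squares. The function names (fit_ar1, forecast_next) and the temperature values are invented for illustration and are not taken from any library or from the book's datasets.

```python
def fit_ar1(series):
    """Least-squares estimate of c and phi in y[t] = c + phi * y[t-1] + noise."""
    lagged = series[:-1]   # y[t-1]
    current = series[1:]   # y[t]
    n = len(lagged)
    mean_lagged = sum(lagged) / n
    mean_current = sum(current) / n
    covariance = sum(
        (a - mean_lagged) * (b - mean_current) for a, b in zip(lagged, current)
    )
    variance = sum((a - mean_lagged) ** 2 for a in lagged)
    phi = covariance / variance
    c = mean_current - phi * mean_lagged
    return c, phi


def forecast_next(series, c, phi):
    """One-step-ahead forecast: the next value depends only on the latest one."""
    return c + phi * series[-1]


# Hypothetical daily temperatures for one week (illustrative values only).
temps = [21.0, 22.5, 23.0, 22.0, 23.5, 24.0, 23.0]
c, phi = fit_ar1(temps)
tomorrow = forecast_next(temps, c, phi)
print(f"c={c:.2f}, phi={phi:.2f}, forecast={tomorrow:.2f}")
```

A model with more lags works the same way, with one coefficient per lagged value.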

The integrated part means that instead of using any raw data point, the difference between that data point and a previous data point is used. Essentially, this means that we convert a series of values into a series of changes in values. Intuitively, this suggests that tomorrow will be more or less the same temperature as today because the temperature all week hasn’t varied too much.
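As a small illustration of differencing, the sketch below converts a series of values into a series of changes and then rebuilds the original from those changes. The helper names and the sample temperatures are assumptions made for this example.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def undo_difference(first_value, changes):
    """Rebuild a series from its first value and its first differences."""
    rebuilt = [first_value]
    for change in changes:
        rebuilt.append(rebuilt[-1] + change)
    return rebuilt


temps = [20, 21, 23, 22, 24]            # illustrative values only
changes = difference(temps)             # [1, 2, -1, 2]
restored = undo_difference(temps[0], changes)
print(changes, restored)
```

Each round of differencing shortens the series by one point, which is why the first value has to be kept in order to reverse the operation.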

Each of the AR, I, and MA components of an ARIMA model is explicitly specified as a parameter in the model. Traditionally, p is used for the number of lag observations to use, also known as the lag order. The number of times that a raw observation is differenced, or the degree of differencing, is known as d, and q represents the size of the MA window, that is, the number of lagged forecast errors the model uses. Thus arises the standard notation for an ARIMA model of ARIMA(p, d, q), where p, d, and q are all non-negative integers.
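One way to see what p, d, and q each control is to simulate a small ARIMA(1, 1, 1) process by hand. This is only a sketch under simplifying assumptions (no constant term, standard normal noise); the function name and the coefficient values are chosen for illustration and do not come from a forecasting library.

```python
import random


def simulate_arima_111(n, phi, theta, start=0.0, seed=0):
    """Simulate n points from an ARIMA(1, 1, 1) process with no constant term.

    Returns (levels, changes): the changes follow an ARMA(1, 1) process and
    the levels are their running sum, beginning at `start`.
    """
    rng = random.Random(seed)
    changes = []
    prev_change = 0.0
    prev_noise = 0.0
    for _ in range(n - 1):
        noise = rng.gauss(0.0, 1.0)
        # p = 1: one lagged change; q = 1: one lagged noise (error) term.
        change = phi * prev_change + theta * prev_noise + noise
        changes.append(change)
        prev_change, prev_noise = change, noise
    # d = 1: sum the changes once to move from changes back to levels.
    levels = [start]
    for change in changes:
        levels.append(levels[-1] + change)
    return levels, changes


levels, changes = simulate_arima_111(n=50, phi=0.6, theta=0.3)
print(len(levels), len(changes))
```

Fitting an ARIMA model runs this logic in reverse: the series is differenced d times, and the p AR and q MA coefficients are then estimated from the differenced data.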

A problem with ARIMA models is that they do not support seasonality, or data with repeating cycles, such as temperature rising in the day and falling at night or rising in summer and falling in winter. Seasonal ARIMA (SARIMA) was developed to overcome this drawback. Similar to the ARIMA notation, the notation for a SARIMA model is SARIMA(p, d, q)(P, D, Q)m, with P being the seasonal AR order, D the seasonal difference order, Q the seasonal MA order, and m the number of time steps for a single seasonal period.
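The seasonal difference order D works like d, except that each value is compared with the value one full season earlier rather than with its immediate predecessor. The following sketch assumes a toy series with m = 4 readings per cycle; the function name and the data are invented for illustration.

```python
def seasonal_difference(series, m):
    """Subtract from each value the value observed m time steps earlier."""
    return [series[t] - series[t - m] for t in range(m, len(series))]


# Two cycles of four readings each: a repeating pattern plus a rise of 1 per cycle.
readings = [10, 15, 20, 12, 11, 16, 21, 13]
deseasonalized = seasonal_difference(readings, m=4)  # [1, 1, 1, 1]
print(deseasonalized)
```

The repeating within-cycle pattern cancels out, leaving only the cycle-over-cycle change for the non-seasonal part of the model to handle.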

You may also come across other variations of ARIMA models, including Vector ARIMA (VARIMA) for cases with multiple time series as vectors; Fractional ARIMA (FARIMA) or Autoregressive Fractionally Integrated Moving Average (ARFIMA), both of which include a fractional differencing degree, allowing for long memory in the sense that observations far apart in time can have non-negligible dependencies; and SARIMAX, a seasonal ARIMA model where the X stands for exogenous or additional variables added to the model, such as adding a rain forecast to a temperature model.
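The long-memory effect of a fractional differencing degree can be seen in the weights that the differencing operator places on past observations. The sketch below computes those weights from the binomial expansion of (1 - B)^d, where B shifts a series back by one step; the function name is hypothetical and d = 0.4 is an arbitrary illustrative choice.

```python
def frac_diff_weights(d, n_weights):
    """First n_weights coefficients of (1 - B)^d, where B is the lag operator."""
    weights = [1.0]
    for k in range(1, n_weights):
        # Recurrence from the binomial series: w_k = w_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


integer_case = frac_diff_weights(1, 5)       # weights vanish after the first lag
fractional_case = frac_diff_weights(0.4, 5)  # weights shrink but never reach zero
print(integer_case)
print(fractional_case)
```

With d = 1 only the current and previous observations receive a non-zero weight, which is ordinary differencing. With a fractional d, every past observation keeps a small non-zero weight, which is how distant observations continue to influence the model.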

ARIMA typically produces very good results, but the downside is its complexity. Tuning and optimizing ARIMA models is often computationally expensive, and successful results can depend on the skill and experience of the forecaster. It is not a scalable process and is better suited to ad hoc analyses by skilled practitioners.