Book Image

Hands-On Time Series Analysis with R

By : Rami Krispin
Book Image

Hands-On Time Series Analysis with R

By: Rami Krispin

Overview of this book

Time-series analysis is the art of extracting meaningful insights from, and revealing patterns in, time-series data using statistical and data visualization approaches. These insights and patterns can then be utilized to explore past events and forecast future values in the series. This book explores the basics of time-series analysis with R and lays the foundation you need to build forecasting models. You will learn how to preprocess raw time-series data and clean and manipulate data with packages such as stats, lubridate, xts, and zoo. You will analyze data using both descriptive statistics and rich data visualization tools in R including the TSstudio, plotly, and ggplot2 packages. The book then delves into traditional forecasting models such as time-series linear regression, exponential smoothing (Holt, Holt-Winter, and more) and Auto-Regressive Integrated Moving Average (ARIMA) models with the stats and forecast packages. You'll also work on advanced time-series regression models with machine learning algorithms such as random forest and Gradient Boosting Machine using the h2o package. By the end of this book, you will have developed the skills necessary for exploring your data, identifying patterns, and building a forecasting model using various traditional and machine learning methods.
Table of Contents (14 chapters)

Time series analysis

Time series analysis is the process of extracting meaningful insights from time series data with the use of data visualization tools, statistical applications, and mathematical models. Those insights can be used to learn and explore past events and to forecast future events. The analysis process can be divided into the following steps:

  1. Data collection: This step includes extracting data from different data sources, such as flat files (such as CSV, TXT, and XLMS), databases (for example, SQL Server, and Teradata), or other internet sources (such as academic resources and the Bureau of Statistics datasets). Later on in this chapter, we will learn how to load data to R from different sources.
  2. Data preparation: In most cases, raw data is unstructured and may require cleaning, transformation, aggregation, and reformatting. In Chapter 2, Working with Date and Time Objects; Chapter 3, The Time Series Object; and Chapter 4, Working with zoo and xts Objects, we will focus on the core data preparation methods of time series data with R.
  3. Descriptive analysis: This is used in summary statistics and data visualization tools to extract insights from the data, such as patterns, distributions, cycles, and relationships with other drivers to learn more about past events. In Chapter 5, Decomposition of Time Series Data; Chapter 6, Seasonality Analysis; and Chapter 7, Correlation Analysis, we will focus on descriptive analysis methods of time series data.
  4. Predictive analysis: We use this to apply statistical methods in order to forecast future events. Chapter 8, Forecasting Strategies; Chapter 9, Forecasting with Linear Regression; Chapter 10, Forecasting with Exponential Smoothing Models; Chapter 11, Forecasting with ARIMA Models; and Chapter 12, Forecasting with Machine Learning Models, we will focus on traditional forecasting approaches (such as linear regression, exponential smoothing, and ARIMA models), as well as advanced forecasting approaches with machine learning models.

It may be surprising but, in reality, the first two steps may take most of the process time and effort, which is mainly due to data challenges and complexity. For instance, companies tend to restructure their business units (BU) and IT systems every couple of years, and therefore it is hard to identify and track the historical contribution (production, revenues, unit sales, and so on) of a specific BU before the changes.

In other cases, additional effort is required to clean the raw data and handle missing values and outliers. This sadly leaves less time for the analysis itself. Fortunately, R has a variety of wonderful applications for data preparations, visualizations, and time series modeling. This helps to reduce the time that's spent on the preparation steps and lets you allocate more time to the analysis itself. Throughout the rest of this chapter, we will provide background information on R and its applications for time series analysis.

Learning with real-life examples

Throughout the learning journey in this book, we will use real-life examples of time series data in order to apply the methods and techniques of the analysis. All of the datasets that we will use are available in the TSstudio and UKgrid packages (unless stated otherwise).

The first time series data we will look at is the monthly natural gas consumption in the US. This data is collected by the US Energy Information Administration (EIA) and measures the monthly natural gas consumption from January 2000 until November 2018. The unit of measurement is billions of cubic feet (not seasonally adjusted). The following graph shows the monthly natural gas consumption in the US:

The following series describe the total vehicle sales in the US from January 1976 until January 2019. The units of this series are in thousands of units (not seasonally adjusted). The data is sourced from the US Bureau of Economic Analysis. The following graph shows the total monthly vehicle sales in the US:

Another monthly series that we will use is the monthly US unemployment rate, which represents the number of unemployed as a percentage of the labor force. The series started in January 1948 and ended in January 2019. The data is sourced from the US Bureau of Labor Statistics. The following graph shows the monthly unemployment rate in the US:

Last but not the least, we will use the national demand for electricity in the UK (as measured on the grid systems) between 2011 and 2018, since it provides an example of high-frequency time series data with half-hourly intervals. The data source is the UK National Grid website, and the information is shown in the following graph:

Let's start by installing R.