Book Image

Time Series Analysis with Python Cookbook

By : Tarek A. Atwan
Book Image

Time Series Analysis with Python Cookbook

By: Tarek A. Atwan

Overview of this book

Time series data is everywhere, available at a high frequency and volume. It is complex and can contain noise, irregularities, and multiple patterns, making it crucial to be well-versed with the techniques covered in this book for data preparation, analysis, and forecasting. This book covers practical techniques for working with time series data, starting with ingesting time series data from various sources and formats, whether in private cloud storage, relational databases, non-relational databases, or specialized time series databases such as InfluxDB. Next, you’ll learn strategies for handling missing data, dealing with time zones and custom business days, and detecting anomalies using intuitive statistical methods, followed by more advanced unsupervised ML models. The book will also explore forecasting using classical statistical models such as Holt-Winters, SARIMA, and VAR. The recipes will present practical techniques for handling non-stationary data, using power transforms, ACF and PACF plots, and decomposing time series data with multiple seasonal patterns. Later, you’ll work with ML and DL models using TensorFlow and PyTorch. Finally, you’ll learn how to evaluate, compare, optimize models, and more using the recipes covered in the book.
Table of Contents (18 chapters)

Reading Time Series Data from Files

In this chapter, we will use pandas, a popular Python library with a rich set of I/O tools, data wrangling, and date/time functionality to streamline working with time series data. In addition, you will explore several reader functions available in pandas to ingest data from different file types, such as Comma-Separated Value (CSV), Excel, and SAS. You will explore reading from files, whether they are stored locally on your drive or remotely on the cloud, such as an AWS S3 bucket.

Time series data is complex and can be in different shapes and formats. Conveniently, the pandas reader functions offer a vast number of arguments (parameters) to help handle such variety in the data.

The pandas library provides two fundamental data structures, Series and DataFrame, implemented as classes. The DataFrame class is a distinct data structure for working with tabular data (think rows and columns in a spreadsheet). The main difference between the two data structures is that a Series is one-dimensional (single column), and a DataFrame is two-dimensional (multiple columns). The relationship between the two is that you get a Series when you slice out a column from a DataFrame. You can think of a DataFrame as a side-by-side concatenation of two or more Series objects.

A particular feature of the Series and DataFrames data structures is that they both have a labeled axis called index. A specific type of index that you will often see with time series data is the DatetimeIndex which you will explore further in this chapter. Generally, the index makes slicing and dicing operations very intuitive. For example, to make a DataFrame ready for time series analysis, you will learn how to create DataFrames with an index of type DatetimeIndex.

We will cover the following recipes on how to ingest data into a pandas DataFrame:

  • Reading data from CSVs and other delimited files
  • Reading data from an Excel file
  • Reading data from URLs
  • Reading data from a SAS dataset

    Why DatetimeIndex?

    A pandas DataFrame with an index of type DatetimeIndex unlocks a large set of features and useful functions needed when working with time series data. You can think of it as adding a layer of intelligence or awareness to pandas to treat the DataFrame as a time series DataFrame.