Time Series Analysis with Python Cookbook

By: Tarek A. Atwan

Overview of this book

Time series data is everywhere, available at a high frequency and volume. It is complex and can contain noise, irregularities, and multiple patterns, making it crucial to be well versed in the techniques covered in this book for data preparation, analysis, and forecasting. This book covers practical techniques for working with time series data, starting with ingesting time series data from various sources and formats, whether in private cloud storage, relational databases, non-relational databases, or specialized time series databases such as InfluxDB. Next, you’ll learn strategies for handling missing data, dealing with time zones and custom business days, and detecting anomalies using intuitive statistical methods, followed by more advanced unsupervised ML models. The book will also explore forecasting using classical statistical models such as Holt-Winters, SARIMA, and VAR. The recipes will present practical techniques for handling non-stationary data, using power transforms, ACF and PACF plots, and decomposing time series data with multiple seasonal patterns. Later, you’ll work with ML and DL models using TensorFlow and PyTorch. Finally, you’ll learn how to evaluate, compare, and optimize models using the recipes covered in the book.
Table of Contents (18 chapters)

Performing data quality checks

Missing data are values not captured or observed in the dataset. Values can be missing for a particular feature (column), or an entire observation (row). When ingesting the data using pandas, missing values will show up as either NaN, NaT, or NA.
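
As a minimal sketch of how these three markers appear in practice, the following builds a small, made-up DataFrame (the column names date, clicks, and price are hypothetical and not taken from the recipe's datasets) and counts missing values with isna(). NaN appears in float columns, NaT in datetime columns, and NA in pandas nullable types such as Int64.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value per column, all in the same row.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", None, "2021-01-03"]),  # None becomes NaT
    "clicks": [100.0, np.nan, 250.0],                            # float column uses NaN
    "price": pd.array([1, None, 3], dtype="Int64"),              # nullable integer uses NA
})

# isna() treats NaN, NaT, and NA uniformly as missing.
missing_per_column = df.isna().sum()

# Number of rows that contain at least one missing value.
rows_with_missing = int(df.isna().any(axis=1).sum())

print(df)
print(missing_per_column)
print(rows_with_missing)
```

Because isna() handles all three markers the same way, a single call is usually enough for a first pass over a freshly ingested dataset.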

Sometimes, missing observations are replaced with other values in the source system; for example, this can be a numeric filler such as 99999 or 0, or a string such as missing or N/A. When missing values are represented by 0, you need to be cautious and investigate further to determine whether those zeros are legitimate values or indicate missing data.
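
One way to handle such fillers, sketched here under assumptions, is to convert known sentinel values to proper missing markers after loading, while only counting zeros so they can be investigated separately. The inline CSV, the column names clicks and status, and the choice of 99999 and missing as sentinels are illustrative and not taken from the recipe's datasets.

```python
import io
import pandas as pd

# Hypothetical raw data using a numeric filler (99999) and a string filler ("missing").
raw = io.StringIO(
    "date,clicks,status\n"
    "2021-01-01,120,ok\n"
    "2021-01-02,99999,missing\n"
    "2021-01-03,0,ok\n"
)
df = pd.read_csv(raw, parse_dates=["date"])

cleaned = df.copy()
# mask() replaces values with a missing marker wherever the condition is True.
cleaned["clicks"] = cleaned["clicks"].mask(cleaned["clicks"] == 99999)
cleaned["status"] = cleaned["status"].mask(cleaned["status"].isin(["missing"]))

# Zeros are left untouched: they may be legitimate, so count them for review.
zero_count = int((cleaned["clicks"] == 0).sum())

print(cleaned)
print(cleaned.isna().sum())
print(zero_count)
```

Note that read_csv already interprets some strings, including N/A, as missing by default; the na_values parameter can extend that list at load time when the fillers are known in advance.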

In this recipe, you will explore how to identify the presence of missing data.

Getting ready

You can download the Jupyter notebooks and requisite datasets from the GitHub repository. Please refer to the Technical requirements section of this chapter.

You will be using two datasets from the Ch7 folder: clicks_missing_multiple.csv and...