Time Series Analysis with Python Cookbook

By: Tarek A. Atwan

Overview of this book

Time series data is everywhere, available at high frequency and in high volume. It is complex and can contain noise, irregularities, and multiple patterns, so it is crucial to be well-versed in the techniques covered in this book for data preparation, analysis, and forecasting. This book covers practical techniques for working with time series data, starting with ingesting it from various sources and formats, whether in private cloud storage, relational databases, non-relational databases, or specialized time series databases such as InfluxDB. Next, you’ll learn strategies for handling missing data, dealing with time zones and custom business days, and detecting anomalies using intuitive statistical methods, followed by more advanced unsupervised ML models. The book also explores forecasting using classical statistical models such as Holt-Winters, SARIMA, and VAR. The recipes present practical techniques for handling non-stationary data, applying power transforms, reading ACF and PACF plots, and decomposing time series data with multiple seasonal patterns. Later, you’ll work with ML and DL models using TensorFlow and PyTorch. Finally, you’ll learn how to evaluate, compare, and optimize models, and more, using the recipes covered in the book.
Table of Contents (18 chapters)

Understanding missing data

Data can be missing for a variety of reasons: an unexpected power outage, a device that was accidentally unplugged, a sensor that suddenly became defective, a survey respondent who declined to answer a question, or data that was intentionally removed for privacy and compliance reasons. In other words, missing data is inevitable.

Missing data is very common, yet it is sometimes not given enough attention when it comes to formulating a strategy for handling it. One approach for handling rows with missing data is to drop those observations (delete the rows). However, this may not be a good strategy if you have limited data in the first place, for example, if collecting the data is a complex and expensive process. Additionally, the drawback of deleting records prematurely is that you will not know whether the missing data was due to censoring (an observation is only partially collected) or due to bias (for example, high...
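To make the cost of dropping rows concrete, here is a minimal pandas sketch. It assumes a small, hypothetical DataFrame of daily sensor readings (the `temperature` and `humidity` columns and their values are made up for illustration, and `any_missing` is a hypothetical helper column). The sketch counts missing values per column, records which rows were incomplete, and then performs listwise deletion with `DataFrame.dropna`.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sensor readings; NaN marks values that were never recorded
idx = pd.date_range("2024-01-01", periods=8, freq="D")
df = pd.DataFrame(
    {
        "temperature": [21.0, 21.4, np.nan, np.nan, 22.1, 22.3, np.nan, 22.8],
        "humidity": [40.0, 41.0, 41.5, np.nan, 43.0, np.nan, 44.0, 44.2],
    },
    index=idx,
)

# Count missing values per column before deciding on a strategy
missing_per_column = df.isna().sum()
print(missing_per_column)

# Hypothetical helper column: flag rows with any missing value so the
# pattern of gaps can still be inspected after deletion
df["any_missing"] = df[["temperature", "humidity"]].isna().any(axis=1)

# Listwise deletion: drop every row with at least one missing reading
complete_rows = df.dropna(subset=["temperature", "humidity"])
print(f"Kept {len(complete_rows)} of {len(df)} rows")
```

In this toy example, neither column is missing more than three of eight values, yet dropping incomplete rows discards half of the dataset. Keeping a flag such as `any_missing` on the original data preserves a record of where the gaps occurred, which is lost once the rows are deleted and which you may need when investigating why the data is missing.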