Time Series Analysis with Python Cookbook

By: Tarek A. Atwan

Overview of this book

Time series data is everywhere, available at high frequency and in high volume. It is complex and can contain noise, irregularities, and multiple patterns, which makes it crucial to be well-versed in the data preparation, analysis, and forecasting techniques covered in this book. The book covers practical techniques for working with time series data, starting with ingesting data from various sources and formats, whether it lives in private cloud storage, relational databases, non-relational databases, or specialized time series databases such as InfluxDB. Next, you’ll learn strategies for handling missing data, dealing with time zones and custom business days, and detecting anomalies using intuitive statistical methods, followed by more advanced unsupervised machine learning (ML) models. The book also explores forecasting with classical statistical models such as Holt-Winters, SARIMA, and VAR. The recipes present practical techniques for handling non-stationary data, using power transforms, ACF and PACF plots, and decomposing time series data with multiple seasonal patterns. Later, you’ll work with ML and deep learning (DL) models using TensorFlow and PyTorch. Finally, you’ll learn how to evaluate, compare, and optimize models, among other tasks.
Table of Contents (18 chapters)

To get the most out of this book

You should be comfortable coding in Python, with some familiarity with Matplotlib, NumPy, and pandas; a working knowledge of the language will help you follow the key concepts covered in this book. The book covers a wide variety of libraries, and the first chapter shows you how to create different virtual environments for Python development. Installing one of Anaconda, Miniconda, or Miniforge is recommended but not required. Throughout the chapters, you will see instructions that use either pip or Conda.
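
As a rough illustration of what an isolated environment is, the sketch below uses Python's standard-library venv module. This is an assumption on our part and not the book's own procedure, which may rely on Conda instead. The helper name create_env is hypothetical, and with_pip=False is chosen only so that the sketch runs offline; for a real project you would pass with_pip=True and then install pandas, NumPy, and Matplotlib into the environment with pip.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir):
    """Create an isolated virtual environment at env_dir (hypothetical helper).

    with_pip=False skips bootstrapping pip, so no network access is needed;
    use with_pip=True for a real project environment.
    clear=True deletes any existing contents of env_dir before creating it.
    """
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    builder.create(env_dir)
    return Path(env_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = create_env(Path(tmp) / "ts-cookbook")
        # Every virtual environment has a pyvenv.cfg file at its root.
        print((env / "pyvenv.cfg").exists())
```

To use such an environment, activate it with source <env>/bin/activate on macOS or Linux, or <env>\Scripts\activate on Windows.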

Alternatively, you can use Google Colab, which requires only a browser.

Software/hardware covered in the book | Operating system requirements
Python 3.8/3.9+ | Windows, macOS, or Linux
JupyterLab or the Jupyter Notebook | Windows, macOS, or Linux
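
To confirm that a local setup meets these requirements, a quick check such as the following can help. It is a minimal sketch: the 3.8 threshold comes from the table, the library list (NumPy, pandas, Matplotlib) comes from the prerequisites above, and the check_environment name is hypothetical. importlib.util.find_spec reports whether a top-level package can be imported without actually importing it.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)
CORE_LIBRARIES = ["numpy", "pandas", "matplotlib"]


def check_environment(min_python=MIN_PYTHON, libraries=CORE_LIBRARIES):
    """Return a dict mapping each requirement to True (met) or False (not met)."""
    version_key = f"python>={min_python[0]}.{min_python[1]}"
    report = {version_key: sys.version_info[:2] >= tuple(min_python)}
    for name in libraries:
        # find_spec returns None when the package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'OK' if ok else 'missing'}")
```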

In Chapter 3, Reading Time Series Data from Databases, and Chapter 5, Persisting Time Series Data to Databases, you will work with different databases, including PostgreSQL, MySQL, InfluxDB, and MongoDB. If you do not have access to these databases, you can install them locally or use Docker to pull the appropriate images from Docker Hub (https://hub.docker.com) with the docker pull command; for example, docker pull influxdb downloads the InfluxDB image. You can download Docker from the official page: https://docs.docker.com/get-docker/.
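
If you plan to run all four databases locally, the pulls can be scripted. The helper below is hypothetical and not part of the book's code. It assumes the Docker Hub official image names postgres, mysql, influxdb, and mongo, and it defaults to a dry run that only prints each docker pull command; pass dry_run=False to execute the commands through subprocess once Docker is installed.

```python
import shutil
import subprocess

# Assumed Docker Hub official image names for the databases listed above.
DATABASE_IMAGES = ["postgres", "mysql", "influxdb", "mongo"]


def pull_commands(images):
    """Build one `docker pull <image>` argument list per image."""
    return [["docker", "pull", image] for image in images]


def pull_images(images, dry_run=True):
    """Print each command; run it only when dry_run is False and docker is on PATH."""
    commands = pull_commands(images)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            if shutil.which("docker") is None:
                raise RuntimeError("docker executable not found on PATH")
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    pull_images(DATABASE_IMAGES, dry_run=True)
```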

Alternatively, you can explore hosted services such as Aiven (https://aiven.io), which offers a 30-day trial and supports PostgreSQL, MySQL, and InfluxDB. For the recipes that use Amazon Redshift and Snowflake, you will need a subscription. You can sign up for the AWS Free Tier here: https://aws.amazon.com/free. You can sign up for a 30-day Snowflake trial here: https://signup.snowflake.com.

Similarly, in Chapter 2, Reading Time Series Data from Files, and Chapter 4, Persisting Time Series Data to Files, you will learn how to read data from and write data to Amazon S3 buckets. This requires an AWS subscription, and the usage should be covered by the free tier. For a list of all services included in the free tier, visit the official page: https://aws.amazon.com/free.

If you are using the digital version of this book, we advise you to type the code yourself or access it from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid potential errors related to copying and pasting code.

To get the most value out of this book, keep experimenting with the recipes using different time series data. Throughout the recipes, you will see a recurring theme in which multiple time series datasets are used. This is deliberate, so that you can observe how the results vary across different data. You are encouraged to continue that practice on your own.

If you are looking for datasets beyond those provided in the GitHub repository, you can check out the following links: