Time Series Analysis with Python Cookbook - Second Edition
Time series analysis often begins with data preparation, a crucial and time-consuming stage. To effectively analyze time series, you first need to access the data, often stored in various file formats or database systems.
Time series data is complex and can appear in various shapes and formats. For instance, it may come as regular timestamped records, such as hourly temperature readings, or irregular events, such as transaction logs with varying time intervals. Data might also include multiple time series combined into a single dataset, such as sales and inventory levels for several stores.
In this chapter, we will use pandas, a popular Python library whose rich I/O tools, data wrangling features, and date/time handling capabilities streamline working with time series data. You will explore several of the reader functions available in pandas to ingest data from different file types, including comma-separated values (CSV), Excel, and Parquet. You will also ingest data from files stored locally or remotely in the cloud, such as in an Amazon S3 bucket.
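As a minimal sketch of what a reader function does, the snippet below loads a small CSV into a DataFrame with `pd.read_csv`. The column names and values are invented for illustration, and an in-memory `io.StringIO` buffer stands in for a real file path or URL so that the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical hourly temperature readings, invented for illustration.
csv_text = """timestamp,temperature
2024-01-01 00:00:00,3.1
2024-01-01 01:00:00,2.8
2024-01-01 02:00:00,2.5
"""

# io.StringIO stands in for a local path or a remote URL.
df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="timestamp",  # use the timestamp column as the index
    parse_dates=True,       # parse the index values as dates
)

print(type(df.index).__name__)
print(df.shape)
```

The other readers mentioned here, `pd.read_excel` and `pd.read_parquet`, are called in a similar way, although each relies on an additional engine package being installed.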
The pandas library provides two fundamental data structures for working with time series data: Series and DataFrame. A Series is one-dimensional (a single column of values), while a DataFrame is two-dimensional and holds tabular data (think rows and columns in a spreadsheet). The two are closely related: you get a Series when you select a single column from a DataFrame, and you can think of a DataFrame as a side-by-side concatenation of one or more Series objects that share the same index.
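The relationship between the two structures can be illustrated with a short sketch. The store figures below are hypothetical; the point is only that selecting one column yields a Series, and that concatenating Series side by side yields a DataFrame again.

```python
import pandas as pd

# Hypothetical daily figures for a single store.
df = pd.DataFrame(
    {"sales": [120, 135, 128], "inventory": [40, 35, 31]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

# Selecting one column returns a one-dimensional Series.
sales = df["sales"]
print(type(sales).__name__, sales.ndim)

# The DataFrame itself is two-dimensional.
print(type(df).__name__, df.ndim)

# Concatenating the columns side by side rebuilds a DataFrame.
rebuilt = pd.concat([df["sales"], df["inventory"]], axis=1)
print(rebuilt.shape)
```

Both the Series and the DataFrame carry the same date index, which is what lets pandas align the columns when they are combined.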
A particular feature of the Series and DataFrame data structures is that they both have a labeled axis called an index. A specific type of index that you will often see with time series data is DatetimeIndex, which you will explore further in this chapter. Generally, the index makes slicing and dicing operations very intuitive. For this reason, you will learn how to create DataFrames with an index of the DatetimeIndex type, which makes a DataFrame ready for time series analysis.
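One common way to obtain a DatetimeIndex, shown here as a sketch rather than the only approach, is to convert a column of date strings with `pd.to_datetime` and then promote it to the index with `set_index`. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical sales records with dates stored as plain strings.
df = pd.DataFrame(
    {
        "date": ["2024-01-30", "2024-01-31", "2024-02-01", "2024-02-02"],
        "sales": [120, 135, 128, 142],
    }
)

# Convert the strings to datetime values, then use them as the index.
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

print(type(df.index).__name__)

# With a DatetimeIndex, a partial date string selects a whole month.
feb = df.loc["2024-02"]
print(len(feb))
```

Selecting with `df.loc["2024-02"]` returns only the February rows, which is the kind of intuitive slicing the index enables.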
The recipes in this chapter will guide you through key techniques to load time series data into pandas DataFrames.
There’s an exciting GitHub-only bonus, Chapter 0: Getting Started with Time Series Analysis, available here: https://github.com/PacktPublishing/Time-Series-Analysis-with-Python-Cookbook-Second-Edition
Don’t miss the GitHub-exclusive recipe, Chapter 1-1: Working with large data files, available here: https://github.com/PacktPublishing/Time-Series-Analysis-with-Python-Cookbook-Second-Edition
Why DatetimeIndex?
A pandas DataFrame whose index is a DatetimeIndex unlocks a large set of features needed when working with time series data, such as date-based slicing, resampling to a different frequency, and direct access to date components. You can think of it as making pandas time-aware, so that it treats the DataFrame as a time series rather than as a generic table.
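A few of these features can be sketched on a small, made-up dataset. The `units` column and the date range are assumptions chosen for illustration; the calls themselves (`resample`, label-based slicing with `loc`, and `day_name`) are standard pandas operations on a DatetimeIndex.

```python
import pandas as pd

# Hypothetical daily data: 14 days starting on Monday, 2024-01-01.
idx = pd.date_range("2024-01-01", periods=14, freq="D")
df = pd.DataFrame({"units": list(range(1, 15))}, index=idx)

# Resample daily values into weekly totals (weeks ending on Sunday).
weekly = df.resample("W").sum()
print(weekly["units"].tolist())  # [28, 77]

# Slice by date strings; both endpoints are included.
window = df.loc["2024-01-08":"2024-01-10"]
print(len(window))  # 3

# Read date components straight from the index.
weekdays = df.index.day_name()
print(weekdays[0])  # Monday
```

None of these operations would be available if the dates were kept as ordinary strings in a regular column.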