Time Series Indexing

By: Mihalis Tsoukalos

Overview of this book

Time series are everywhere, ranging from financial data and system metrics to weather stations and medical records. Being able to access, search, and compare time series data quickly is essential, and this comprehensive guide enables you to do just that by helping you explore SAX representation and the most effective time series index, iSAX. The book begins by teaching you about the implementation of SAX representation in Python as well as the iSAX index, along with the required theory sourced from academic research papers. The chapters are filled with figures and plots to help you follow the presented topics and understand key concepts easily. But what makes this book really great is that it contains the right amount of knowledge about time series indexing using the right amount of theory and practice so that you can work with time series and develop time series indexes successfully. Additionally, the presented code can be easily ported to any other modern programming language, such as Swift, Java, C, C++, Ruby, Kotlin, Go, Rust, and JavaScript. By the end of this book, you'll have learned how to harness the power of iSAX and SAX representation to efficiently index and analyze time series data and will be equipped to develop your own time series indexes and effectively work with time series data.
Table of Contents (11 chapters)

The tsfresh Python package

This is a bonus section that is not directly related to the subject of the book, but it is helpful nonetheless. It is about a handy Python package called tsfresh, which can give you a good overview of your time series from a statistical perspective. We are not going to present all the capabilities of tsfresh, just a few that you can easily use to get information about your time series data. At this point, you might need to install tsfresh on your machine (for example, with pip install tsfresh); keep in mind that the tsfresh package has lots of package dependencies.

So, we are going to compute the following properties of a dataset – in this case, a time series:

  • Mean value: The mean value of a dataset is the sum of all the values divided by the number of values.
  • Standard deviation: The standard deviation of a dataset measures the amount of variation in it. It is the square root of the average squared difference between each value and the mean, but in practice we compute it using a function from a Python package.
  • Skewness: The skewness of a dataset is a measure of its asymmetry. The value of skewness can be positive (a longer tail on the right), negative (a longer tail on the left), zero (for example, for a perfectly symmetric dataset), or undefined.
  • Kurtosis: The kurtosis of a dataset is a measure of its tailedness. In more mathematical terms, kurtosis measures the heaviness of the tail of a distribution compared to a normal distribution. The kurtosis reported by tsfresh is the excess kurtosis, so a normal distribution has a kurtosis of 0, and negative values indicate lighter tails than a normal distribution.
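To make the four definitions above concrete, here is a minimal NumPy sketch of them. It is not tsfresh's code, and the function names are our own hypothetical helpers. It assumes, based on our reading of the tsfresh source (which delegates to NumPy and pandas), that tsfresh uses the population standard deviation (dividing by n), the adjusted Fisher-Pearson skewness coefficient G1, and the adjusted excess kurtosis G2; treat these formula choices as assumptions.

```python
import math

import numpy as np


def central_moment(x, k):
    """Hypothetical helper: k-th central moment, the average of (x - mean)**k."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - np.mean(x)) ** k)


def mean_value(x):
    # Sum of all the values divided by the number of values
    return np.sum(x) / len(x)


def standard_deviation(x):
    # Population standard deviation: square root of the average squared
    # difference from the mean (divides by n, not n - 1)
    return math.sqrt(central_moment(x, 2))


def skewness(x):
    # Adjusted Fisher-Pearson coefficient G1 (assumed to match tsfresh);
    # undefined for fewer than 3 values
    n = len(x)
    if n < 3:
        return float("nan")
    m2 = central_moment(x, 2)
    if m2 == 0:
        return 0.0  # constant series: no asymmetry
    g1 = central_moment(x, 3) / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def kurtosis(x):
    # Adjusted excess kurtosis G2 (0 for a normal distribution, assumed to
    # match tsfresh); undefined for fewer than 4 values
    n = len(x)
    if n < 4:
        return float("nan")
    m2 = central_moment(x, 2)
    if m2 == 0:
        return 0.0
    g2 = central_moment(x, 4) / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


if __name__ == "__main__":
    ts = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(mean_value(ts), standard_deviation(ts), skewness(ts), kurtosis(ts))
```

For the series 1, 2, 3, 4, 5, this gives a mean of 3, a standard deviation of about 1.414 (the square root of 2), a skewness of 0 because the values are symmetric around the mean, and a kurtosis of about -1.2, which indicates lighter tails than a normal distribution.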

These quantities make much more sense once you plot your data (this is left as an exercise for you); otherwise, they are just numbers. Now that we know some basic statistical terms, let us present a Python script that calculates all of them for a time series.

The Python code for using_tsfresh.py is as follows:

#!/usr/bin/env python3
import sys

import pandas as pd
from tsfresh.feature_extraction import feature_calculators as fc


def main():
    if len(sys.argv) != 2:
        print("Usage:", sys.argv[0], "TS")
        sys.exit(1)

    TS1 = sys.argv[1]
    # The time series is a gzip-compressed file with one value per line;
    # we assume there is no header row, so the first value is not lost
    ts1Temp = pd.read_csv(TS1, header=None, compression='gzip')
    # Turn the single-column DataFrame into a 1-D NumPy array
    ta = ts1Temp.to_numpy()
    ta = ta.reshape(len(ta))

    # Mean value
    meanValue = fc.mean(ta)
    print("Mean value:\t\t", meanValue)
    # Standard deviation
    stdDev = fc.standard_deviation(ta)
    print("Standard deviation:\t", stdDev)
    # Skewness
    skewness = fc.skewness(ta)
    print("Skewness:\t\t", skewness)
    # Kurtosis
    kurtosis = fc.kurtosis(ta)
    print("Kurtosis:\t\t", kurtosis)


if __name__ == '__main__':
    main()

The output of using_tsfresh.py when processing ts1.gz should look similar to the following:

$ ./using_tsfresh.py ts1.gz
Mean value:  15.706410001204729
Standard deviation:  8.325017802111901
Skewness:     0.008971113265160474
Kurtosis:    -1.2750042973761417
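Because tsfresh has so many dependencies, you might want to check these numbers with NumPy and pandas alone. The following sketch assumes, from our reading of the tsfresh source, that the four calculators used above wrap np.mean(), np.std(), pandas.Series.skew(), and pandas.Series.kurtosis(); a different tsfresh release might behave differently. The describe_series() and describe_file() helpers are hypothetical names, and we assume that the compressed file holds one value per line with no header row.

```python
import numpy as np
import pandas as pd


def describe_series(values):
    """Hypothetical helper: the four statistics for a 1-D sequence of numbers."""
    s = pd.Series(values, dtype=float)
    arr = s.to_numpy()
    return {
        "Mean value": float(np.mean(arr)),
        # np.std() defaults to ddof=0, the population standard deviation
        "Standard deviation": float(np.std(arr)),
        # pandas computes the adjusted sample skewness...
        "Skewness": float(s.skew()),
        # ...and the adjusted excess kurtosis (0 for a normal distribution)
        "Kurtosis": float(s.kurtosis()),
    }


def describe_file(path):
    """Hypothetical helper: statistics for a gzip-compressed, single-column file."""
    # Assumption: one value per line and no header row
    ts = pd.read_csv(path, header=None, compression="gzip").iloc[:, 0]
    return describe_series(ts)


if __name__ == "__main__":
    # Small in-memory example instead of a real time series file
    for name, value in describe_series([1, 2, 3, 4, 5]).items():
        print(f"{name}:\t{value}")
```

If our assumption about tsfresh holds, calling describe_file("ts1.gz") should return the same four numbers that using_tsfresh.py prints for that file, as long as both read the file in the same way.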

The tsfresh package can do many more things; the four feature calculators used here are just the tip of the iceberg of its capabilities.

The next section is about creating a histogram of a time series.