Book Image

Time Series Indexing

By : Mihalis Tsoukalos
Book Image

Time Series Indexing

By: Mihalis Tsoukalos

Overview of this book

Time series are everywhere, ranging from financial data and system metrics to weather stations and medical records. Being able to access, search, and compare time series data quickly is essential, and this comprehensive guide enables you to do just that by helping you explore SAX representation and the most effective time series index, iSAX. The book begins by teaching you about the implementation of SAX representation in Python as well as the iSAX index, along with the required theory sourced from academic research papers. The chapters are filled with figures and plots to help you follow the presented topics and understand key concepts easily. But what makes this book really great is that it contains the right amount of knowledge about time series indexing using the right amount of theory and practice so that you can work with time series and develop time series indexes successfully. Additionally, the presented code can be easily ported to any other modern programming language, such as Swift, Java, C, C++, Ruby, Kotlin, Go, Rust, and JavaScript. By the end of this book, you'll have learned how to harness the power of iSAX and SAX representation to efficiently index and analyze time series data and will be equipped to develop your own time series indexes and effectively work with time series data.
Table of Contents (11 chapters)

Developing a Python package

In this section, we describe the process of developing a Python package that calculates the SAX representation of a subsequence. Apart from this being a good programming exercise, the package is going to be enriched in the chapters that follow when we create the iSAX index.

We will begin by explaining the basics of Python packages.

The basics of Python packages

I am not a Python expert, and the presented information is far from complete. However, it covers the required knowledge regarding Python packages.

In all but the latest Python versions, we used to need a file named __init__.py inside the directory of every Python package. Its purpose is to perform initialization actions and imports, as well as define variables. Although this is not the case with most recent Python versions, our packages will still have a __init__.py file in them. The good thing is that it is allowed to be empty if you have nothing to put into it. There is a link at the end of the chapter to the official Python documentation regarding packages, regular packages, and namespace packages, where the use of __init__.py is explained in more detail.

The next subsection discusses the details of the Python package that we are going to develop.

The SAX Python package

The code of the sax Python package is included in a directory named sax. The contents of the sax directory are presented with the help of the tree(1) command, which you might need to install on your own:

$ tree sax
sax
├── SAXalphabet
├── __init__.py
├── __pycache__
│   __init__.cpython-310.pyc
│   sax.cpython-310.pyc
│   tools.cpython-310.pyc
│   variables.cpython-310.pyc
├── sax.py
├── tools.py
└── variables.py
2 directories, 9 files

The __pycache__ directory is automatically generated by Python once you begin using the Python package and contains precompiled bytecode Python code. You can completely ignore that directory.

Let us begin by showing the contents of sax.py, which is going to be presented in multiple code chunks.

First, we have the import section and the implementation of the normalize() function, which normalizes a NumPy array:

import numpy as np
from scipy.stats import norm
from sax import tools
import sys
sys.path.insert(0,'..')
def normalize(x):
     eps = 1e-6
     mu = np.mean(x)
     std = np.std(x)
     if std < eps:
           return np.zeros(shape=x.shape)
     else:
           return (x-mu)/std

After that, we have the implementation of the createPAA() function, which returns the SAX representation of a time series, given the cardinality and the segments:

def createPAA(ts, cardinality, segments):
     SAXword = ""
     ts_norm = normalize(ts)
     segment_size = len(ts_norm) // segments
     mValue = 0
     for I in range(segments):
           ts_segment = ts_norm[segment_size * i :(i+1) * segment_size]
           mValue = meanValue(ts_segment)
           index = getIndex(mValue, cardinality)
           SAXword += str(index) +""""

Python uses the double slash // operator to perform floor division. What the // operator does is divide the first number by the second number before rounding the result down to the nearest integer – this is used for the segment_size variable.

The rest of the code is about specifying the correct index numbers when working with the given time series (or subsequence). Hence, the for loop is used to process the entire time series (or subsequence) based on the segments value.

Next, we have the implementation of a function that computes the mean value of a NumPy array:

def meanValue(ts_segment):
     sum = 0
     for i in range(len(ts_segment)):
           sum += ts_segment[i]
     mean_value = sum / len(ts_segment)
     return mean_value

Finally, we have the function that returns the SAX value of a SAX word, given its mean value and its cardinality. Remember that we calculate the mean value of each SAX word separately in the createPAA() function:

def getIndex(mValue, cardinality):
     index = 0
     # With cardinality we get cardinality + 1
     bPoints = tools.breakpoints(cardinality-1)
     while mValue < float(bPoints[index]):
           if index == len(bPoints)–- 1:
                 # This means that index should be advanced
                 # before breaking out of the while loop
                 index += 1
                 break
           else:
                 index += 1
     digits = tools.power_of_two(cardinality)
     # Inverse the result
     inverse_s = ""
     for i in binary_index:
           if i == '0':
                 inverse_s += '1'
           else:
                 inverse_s += '0'
     return inverse_s

The previous code computes the SAX value of a SAX word using its mean value. It iteratively visits the breakpoints, from the lowest value to the biggest, up to the point that the mean value exceeds the current breakpoint. This way, we find the index of the SAX word (mean value) in the list of breakpoints.

Now, let us discuss a tricky point, which has to do with the last statements that reverse the SAX word. This mainly has to do with whether we begin counting from the top or the bottom of the different areas that the breakpoints create. All ways are equivalent – we just decided to go that way. This is because a previous implementation of SAX used that order, and we wanted to make sure that we created the same results for testing reasons. If you want to alter that functionality, you just have to remove the last for loop.

As you saw at the beginning of this section, the sax package is composed of three Python files, not just the one that we just presented. So, we will present the remaining two files.

First, we will present the contents of variables.py:

# This file includes all variables for the sax package
maximumCardinality = 32
# Where to find the breakpoints file
# In this case, in the current directory
breakpointsFile =""SAXalphabe""
# Sliding window size
slidingWindowSize = 16
# Segments
segments = 0
# Breakpoints in breakpointsFile
elements ="""
# Floating point precision
precision = 5

You might wonder what the main reason is for having such a file. The answer is that we need to have a place to keep our global parameters and options, and having a separate file for that is a perfect solution. This will make much more sense when the code becomes longer and more complex.

Second, we present the code in tools.py:

import os
import numpy as np
import sys
from sax import variables
breakpointsFile = variables.breakpointsFile
maxCard = variables.maximumCardinality

Here, we reference two variables from the variable.py file, which are variables.breakpointsFile and variables.maximumCardinality:

def power_of_two(n):
     power = 1
     while n/2 != 1:
           # Not a power of 2
           if n % 2 == 1:
                 return -1
           n = n / 2
           power += 1
     return power

This is a helper function that we use when we want to make sure that a value is a power of 2:

def load_sax_alphabet():
     path = os.path.dirname(__file__)
     file_variable = open(path +"""" + breakpointsFile)
     variables.elements = file_variable.readlines()
def breakpoints(cardinality):
     if variables.elements ==""":
           load_sax_alphabet()
     myLine = variables.elements[cardinality–- 1].rstrip()
     elements = myLine.split'''')
     elements.reverse()
     return elements

The load_sax_alphabet() function loads the contents of the file with the definitions of breakpoints and assigns them to the variables.elements variable. The breakpoints() function returns the breakpoint values when given the cardinality.

As you can see, the code of the entire package is relatively short, which is a good thing.

In this section, we developed a Python package to compute SAX representations. In the next section, we are going to begin working with the sax package.