Deep Learning and XAI Techniques for Anomaly Detection

By Cher Simon

Overview of this book

Despite promising advances, the opaque nature of deep learning models makes them difficult to interpret, which hinders their practical deployment and regulatory compliance. Deep Learning and XAI Techniques for Anomaly Detection shows you state-of-the-art methods that will help you understand and address these challenges. By leveraging the Explainable AI (XAI) and deep learning techniques described in this book, you’ll discover how to extract business-critical insights while ensuring fair and ethical analysis. This practical guide provides tools and best practices for achieving transparency and interpretability with deep learning models, ultimately establishing trust in your anomaly detection applications. Throughout the chapters, you’ll gain the XAI and anomaly detection knowledge needed to embark on a series of real-world projects. Whether you are building computer vision, natural language processing, or time series models, you’ll learn how to quantify and assess their explainability. By the end of this deep learning book, you’ll be able to build a variety of deep learning XAI models and perform validation to assess their explainability.
Table of Contents (15 chapters)

Part 1 – Introduction to Explainable Deep Learning Anomaly Detection
Part 2 – Building an Explainable Deep Learning Anomaly Detector
Part 3 – Evaluating an Explainable Deep Learning Anomaly Detector

Discovering real-world use cases

Anomaly detection plays a crucial role in extracting valuable insights for risk management. Over the years, anomaly detection applications have diversified across various domains, including medical diagnosis, fraud discovery, quality control analysis, predictive maintenance, security scanning, and threat intelligence. In this section, let’s look at some practical industry use cases of anomaly detection, including the following:

  • Detecting fraud
  • Predicting industrial maintenance
  • Diagnosing medical conditions
  • Monitoring cybersecurity threats
  • Reducing environmental impact
  • Recommending financial strategies

Detecting fraud

The continued growth of the global economy and rising business demand for real-time, ubiquitous digital payment methods expand the surface for fraud exposure, leaving electronic commerce systems vulnerable to organized crime. Fraud prevention mechanisms that protect technological systems from known fraud risks cannot cover every possible fraudulent scenario. Fraud detection systems therefore provide an additional layer of protection by detecting suspicious and malicious activities.

Discovery sampling is an auditing technique that approves or rejects a sampled audit population depending on whether the observed error rate stays below a defined tolerable threshold. Manual fraud audit techniques based on discovery sampling require domain knowledge across multiple disciplines and are time-consuming. Leveraging machine learning (ML) in fraud detection systems has proven to produce higher model accuracy and to detect novel anomaly classes.
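
As a rough illustration of the reasoning behind discovery sampling (our own example, not from the book): if a fraction p of a population is erroneous, the probability of catching at least one error in a random sample of n items is 1 - (1 - p)^n, which tells you the minimum sample size for a desired confidence level:

    # Illustrative discovery-sampling calculation: the smallest sample
    # size n such that P(finding at least one error) >= confidence,
    # given an assumed population error rate p.
    import math

    def min_sample_size(p: float, confidence: float = 0.95) -> int:
        return math.ceil(math.log(1 - confidence) / math.log(1 - p))

    print(min_sample_size(0.01))  # ~299 records for a 1% error rate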

Fraud detection systems leverage behavioral profiling methods to prevent fraud by modeling individual behavioral patterns and monitoring deviations from the norms, such as daily banking activities, spending velocity, foreign countries transacted with, and beneficiaries, based on historical transactions. Nevertheless, an individual’s spending habits are influenced by changes in income, lifestyle, and other external factors. Such unpredicted changes can introduce concept drift in the underlying model. Hence, a fraud detection model and an individual’s transaction profile must be recursively and dynamically updated by correlating input data changes and various parameters to enable adaptive behavioral profiling.
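
A minimal sketch of this idea (hypothetical column names, not the book’s implementation): maintain a rolling per-customer profile and flag transactions that deviate strongly from it, so the profile adapts as spending habits change:

    # Hypothetical adaptive behavioral profiling: flag a transaction when
    # its amount deviates strongly from the customer's rolling spending
    # profile. Column names are illustrative only.
    import pandas as pd

    def flag_deviations(tx: pd.DataFrame, window: int = 50, z: float = 3.0) -> pd.Series:
        grouped = tx.groupby('customer_id')['amount']
        # A rolling mean/std per customer approximates an adaptive profile.
        mean = grouped.transform(lambda s: s.rolling(window, min_periods=5).mean())
        std = grouped.transform(lambda s: s.rolling(window, min_periods=5).std())
        return (tx['amount'] - mean).abs() > z * std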

Let’s review a fraud detection example using an anonymized multivariate credit card transactions dataset from https://www.kaggle.com/datasets/whenamancodes/fraud-detection and the AutoEncoder detector provided by PyOD (https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.auto_encoder).

AutoEncoder is an unsupervised deep learning algorithm that reconstructs high-dimensional input data from a compressed latent representation. It helps detect abnormalities in the data by calculating reconstruction errors: anomalous inputs deviate from the patterns the model learned to compress, so they are reconstructed poorly and yield higher errors.

Figure 1.11 shows a high-level AutoEncoder architecture that consists of three components:

  • Encoder – Translates high-dimensional input data into a low-dimensional latent representation
  • Code – Learns the latent-space representation of the input data
  • Decoder – Reconstructs the input data based on the code, the encoder’s output
Figure 1.11 – The AutoEncoder architecture
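
To make these components concrete, here is a minimal, self-contained Keras sketch of the encoder-code-decoder structure (an illustrative toy on random data, not PyOD’s internal implementation):

    # Minimal illustrative autoencoder: a 30-feature input is compressed
    # to a small code and reconstructed; the per-sample reconstruction
    # error then serves as the anomaly score.
    import numpy as np
    import tensorflow as tf

    inputs = tf.keras.Input(shape=(30,))
    encoded = tf.keras.layers.Dense(16, activation='relu')(inputs)   # encoder
    code = tf.keras.layers.Dense(4, activation='relu')(encoded)      # code
    decoded = tf.keras.layers.Dense(16, activation='relu')(code)     # decoder
    outputs = tf.keras.layers.Dense(30, activation='linear')(decoded)

    autoencoder = tf.keras.Model(inputs, outputs)
    autoencoder.compile(optimizer='adam', loss='mse')

    X_toy = np.random.rand(1000, 30).astype('float32')
    autoencoder.fit(X_toy, X_toy, epochs=5, batch_size=64, verbose=0)
    errors = np.mean((X_toy - autoencoder.predict(X_toy, verbose=0)) ** 2, axis=1)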

A sample notebook, chapter1_pyod_autoencoder.ipynb, can be found in the book's GitHub repo.

You can also experiment with this example on Amazon SageMaker Studio Lab (https://aws.amazon.com/sagemaker/studio-lab/), a free notebook development environment that provides up to 12 hours of CPU or 4 hours of GPU per user session and 15 GiB of storage at no cost. Alternatively, you can try this in your preferred IDE. Let’s get started:

  1. First, install the required packages using the provided requirements.txt file:
    import sys
    !{sys.executable} -m pip install -r requirements.txt
  2. Load the essential libraries:
    %matplotlib inline
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import os
    from platform import python_version
    import tensorflow as tf
    from pyod.models.auto_encoder import AutoEncoder
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
    print(f'TensorFlow version: {tf.__version__}')
    print(f'Python version: {python_version()}')
  3. Load and preview the anonymized credit card transactions dataset:
    df = pd.read_csv('creditcard.csv')
    df.head()

The result will be as follows:

Figure 1.12 – Preview anonymized credit card transactions dataset

  4. Assign model features and the target label to variables:
    model_features = df.columns.drop('Class')
    X = df[model_features]
    y = df['Class']
  5. View the frequency distribution of the target labels. You should have 284,315 non-fraudulent transactions for class 0 and 492 fraudulent transactions for class 1:
    y.value_counts()
  6. Set the contamination rate, that is, the expected proportion of outliers in the training dataset. The default contamination value is 0.1; here, we are setting contamination to the maximum value, 0.5. PyOD uses this setting to calculate the outlier threshold. Also fix the number of epochs for training:
    contamination = 0.5
    epochs = 30
  7. Set the number of neurons per hidden layer and initialize AutoEncoder for training:
    hn = [64, 30, 30, 64]
    clf = AutoEncoder(epochs=epochs, contamination=contamination, hidden_neurons=hn)
    clf.fit(X)

Figure 1.13 shows a model summary for AutoEncoder in this example:

Figure 1.13 – The AutoEncoder model summary

  8. Obtain predictions on outliers:
    outliers = clf.predict(X)
  9. Filter outliers from the model’s predictions. The anomaly variable contains the indices of the identified outliers:
    anomaly = np.where(outliers==1)
    anomaly
  10. View the output of a particular instance. You should see an output of 1, indicating that this transaction is predicted as fraudulent. Validate the result against the ground truth:
    sample = X.iloc[[4920]]
    clf.predict(sample, return_confidence=False)
    y.iloc[4920]  # ground-truth label for comparison

The result displayed is shown in Figure 1.14:

Figure 1.14 – Prediction versus ground truth

  11. Evaluate the confidence of the model’s prediction, that is, how consistently the model would make the same prediction if the training data were slightly perturbed:
    clf.predict_confidence(sample)
  12. Retrieve the binary labels of the training data, where 0 means inlier and 1 means outlier:
    y_pred = clf.labels_
  13. Access the decision_scores_ attribute to obtain anomaly scores. Higher values represent a higher severity of abnormality:
    y_scores = clf.decision_scores_
  14. Figure 1.15 shows the anomaly scores from decision_scores_, plotted together with the threshold that PyOD calculated from the contamination rate, using the following code. The red horizontal line represents the threshold in use:
    plt.rcParams["figure.figsize"] = (15,8)
    plt.plot(y_scores);
    plt.axhline(y=clf.threshold_, c='r', ls='dotted', label='threshold');
    plt.xlabel('Instances')
    plt.ylabel('Decision Scores')
    plt.title('Anomaly Scores with Auto-Calculated Threshold');
    plt.savefig('auto_decision_scores.png', bbox_inches='tight')
    plt.show()

Figure 1.15 – Auto-calculated anomaly scores
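
As a side check (our own addition, assuming PyOD’s documented behavior), threshold_ corresponds to the (1 - contamination) percentile of the training decision scores, which you can verify manually:

    # PyOD derives the threshold from the contamination rate: scores above
    # the (1 - contamination) percentile of the training scores are outliers.
    manual_threshold = np.percentile(y_scores, 100 * (1 - contamination))
    print(manual_threshold, clf.threshold_)  # the two values should match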

  15. Figure 1.16 shows a manually modified threshold, plotted using the following code. The red horizontal line represents the new threshold:
    threshold = 50
    plt.rcParams["figure.figsize"] = (15,8)
    plt.plot(y_scores, color="green");
    plt.axhline(y=threshold, c='r', ls='dotted', label='threshold');
    plt.xlabel('Instances')
    plt.ylabel('Anomaly Scores')
    plt.title('Anomaly Scores with Modified Threshold');
    plt.savefig('modified_threshold.png', bbox_inches='tight')
    plt.show()

Figure 1.16 – Modified threshold
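
If you adopt such a manual threshold, you can re-derive the binary labels yourself (a small addition of ours, not part of the original walkthrough):

    # Re-label using the manual threshold instead of clf.threshold_:
    # scores above the cut-off become outliers (1), the rest inliers (0).
    y_pred_manual = (y_scores > threshold).astype(int)
    print(y_pred_manual.sum(), 'outliers at threshold', threshold)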

  16. We will use the following code to plot the error loss history:
    plt.rcParams["figure.figsize"] = (15,8)
    pd.DataFrame.from_dict(clf.history_).plot(title='Error Loss');
    plt.savefig('error_loss.png', bbox_inches='tight')
    plt.show()

Figure 1.17 shows the error loss history:

Figure 1.17 – Error loss history

  17. Visualize anomaly scores and outliers by comparing Time and Amount with a scatter plot:
    sns.scatterplot(x="Time", y="Amount", hue=y_scores, data=df, palette="RdBu_r", size=y_scores);
    plt.xlabel('Time (seconds elapsed from first transaction)')
    plt.ylabel('Amount')
    plt.legend(title='Anomaly Scores')
    plt.savefig('pca_anomaly_score.png', bbox_inches='tight')
    plt.show()

The result is shown in Figure 1.18:

Figure 1.18 – Anomaly scores and outliers

You completed a walk-through of a fraud detection example using AutoEncoder. The following section discusses a few more real-world anomaly detection examples.

Predicting industrial maintenance

The rise of Industry 4.0 has transformed manufacturing technologies, with a focus on interconnectivity between machines and industrial equipment using the Internet of Things (IoT). Real-time data produced by interconnected devices presents enormous opportunities for predictive analytics in structural health checks and anomaly detection.

Inadequate machine maintenance is the primary cause of unplanned downtime in manufacturing. Improving equipment availability and performance is critical in preventing unplanned downtime, avoiding unnecessary maintenance costs, and increasing productivity in industrial workloads.

Although equipment health can deteriorate over time due to regular use, early discovery of abnormal symptoms helps optimize performance and uptime over a machine’s life expectancy and ensure business continuity.

Predictive maintenance techniques have evolved from reactive approaches to ML-driven ones. Anomaly detection for predictive maintenance is challenging due to a lack of domain knowledge in defining anomaly classes and the absence of past anomalous behaviors in the available data. Many existing manufacturing processes can only detect a subset of anomalies, leaving the remaining anomalies undetected until the equipment enters a nonfunctional state. Anomaly detection in predictive maintenance aims to predict the onset of equipment failure so that prompt maintenance can be performed to avoid unnecessary downtime.

You can try implementing a predictive maintenance problem using PyOD with this dataset: https://www.kaggle.com/datasets/shivamb/machine-predictive-maintenance-classification.
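
As a starting point, here is a hedged sketch of how such an experiment might look (the file name and column names are assumptions about that dataset; verify them against the actual download):

    # Hypothetical starting point for the Kaggle predictive-maintenance
    # dataset; file and column names are assumptions to verify.
    import pandas as pd
    from pyod.models.auto_encoder import AutoEncoder

    df = pd.read_csv('predictive_maintenance.csv')
    features = df.select_dtypes(include='number').drop(columns=['Target'], errors='ignore')

    clf = AutoEncoder(epochs=30, contamination=0.05)
    clf.fit(features)
    df['anomaly'] = clf.labels_  # 0 = inlier, 1 = outlier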

Diagnosing medical conditions

Physiological data collected through medical diagnosis applications, such as magnetic resonance imaging (MRI), and wearable devices, such as glucose monitors, enables healthcare professionals to use anomaly detection approaches to highlight abnormal readings that may be precursors of potential health risks to patients. Besides medical diagnosis, anomaly detection helps healthcare providers predict recovery rates and escalate medical risks by forecasting physiological signals, such as heart rate and blood pressure.

Detection and prediction accuracy is critical in medical anomaly detection because decisions are time-sensitive and can involve life-and-death situations. Besides the common challenges of class imbalance and scarcity of anomaly samples, medical anomaly detection faces the challenge of distinguishing patient- and demographic-specific characteristics.

Deep learning techniques have gained popularity in medical anomaly detection due to their feature learning and non-linearity modeling capabilities. However, current deep medical anomaly detection methods mainly correlate patients’ symptoms with a known disease category based on annotated data. Medical experts will be skeptical of trusting decisions made by black-box models without quantifiable causal estimation or explanation. Hence, the role of explainable artificial intelligence (XAI) is crucial in providing end users with visibility into how a deep learning model derives a prediction, which leads to informed decision-making.

Monitoring cybersecurity threats

Detecting zero-day attacks or unforeseen threats is highly desirable in security applications. Therefore, unsupervised deep learning techniques using unlabeled datasets are widely applied in security-related anomaly detection applications such as intrusion detection systems (IDSs), web attack detection, video surveillance, and advanced persistent threat (APT) detection.

Two categories of IDSs are host-based and network-based. Host-based IDSs detect collective anomalies, such as malicious applications, policy violations, and unauthorized access, by analyzing sequential call traces at the operating system level. Network-based IDSs analyze high-dimensional network data to identify external attacks attempting unauthorized network access.

Web applications have become an appealing target for cybercriminals as data becomes ubiquitous. Existing signature-based techniques using static rules no longer provide sufficient web attack protection because the quality of the rulesets depends on the known attacks in the signature dataset. Anomaly-based web attack detection methods distinguish anomalous web requests by measuring how far the attributes of a request deviate from established profiles of normal requests.
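
As a loose illustration of request profiling (a toy heuristic of ours, not a production technique), one can score incoming requests against simple statistics learned from normal traffic:

    # Toy anomaly-based web request profiling: score a URL by how unusual
    # its length and special-character ratio are relative to a profile
    # built from normal requests. Purely illustrative.
    import numpy as np

    def build_profile(normal_urls):
        lengths = np.array([len(u) for u in normal_urls])
        specials = np.array([sum(not c.isalnum() for c in u) / len(u) for u in normal_urls])
        return {'len_mu': lengths.mean(), 'len_sd': lengths.std() + 1e-9,
                'sp_mu': specials.mean(), 'sp_sd': specials.std() + 1e-9}

    def score_request(url, p):
        z_len = abs(len(url) - p['len_mu']) / p['len_sd']
        sp = sum(not c.isalnum() for c in url) / len(url)
        z_sp = abs(sp - p['sp_mu']) / p['sp_sd']
        return max(z_len, z_sp)  # higher = more anomalous

    profile = build_profile(['/home', '/products?id=42', '/cart/checkout'])
    print(score_request("/search?q=' OR 1=1 --", profile))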

Reducing environmental impact

Widespread climate change driven by human activities has contributed to a rise in the earth’s temperature of 0.14 degrees Fahrenheit (0.08 degrees Celsius) per decade since 1880, and 2020 was marked as the second-warmest year on record, according to the National Oceanic and Atmospheric Administration (NOAA). Irreversible consequences of climate change can lead to further impacts, such as intense droughts, water shortages, catastrophic wildfires, and severe flooding. Detecting abnormal weather patterns and climatic events, such as the frequency of heat and cold waves, cyclones, and floods, provides a scientific understanding of the behaviors and relationships of climatological variables.

With almost 80% of the world’s energy produced from fossil fuels, it is crucial to develop green energy sources and to reduce total energy consumption by identifying wasted energy with anomaly detection approaches applied to smart sensor data. For example, buildings contribute 40% of global energy consumption and 33% of greenhouse gas emissions. Thus, reducing building energy consumption is a significant step toward achieving net-zero carbon emissions by 2050.

Recommending financial strategies

Identifying anomalies in financial data, such as stock market indices, is instrumental for informed decision-making and competitive advantage. Financial data is characterized by volume, velocity, and variety. For example, the New York Stock Exchange (NYSE) generates over one terabyte of stock market data daily, reflecting continuous market changes at low latency. Market participants need a mechanism to identify anomalies in financial data that could otherwise cause misinterpretation of market behavior, leading to poor trading decisions.
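
As a simple illustration (a toy heuristic on synthetic data of our own, not a trading-grade method), a rolling z-score can surface unusual moves in a price series:

    # Toy illustration: flag daily returns whose rolling z-score exceeds 3,
    # using synthetic prices; real market data would need far more care.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))
    returns = prices.pct_change()

    mu = returns.rolling(60).mean()
    sd = returns.rolling(60).std()
    anomalies = returns[(returns - mu).abs() > 3 * sd]
    print(anomalies.head())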

Now that we have covered some commercial and environmental use cases for anomaly detection, you are ready to explore various deep learning approaches and their appropriate use for detecting anomalies in the following section.