Machine Learning with the Elastic Stack - Second Edition

By: Rich Collier, Camilla Montonen, Bahaaldine Azarmi

Overview of this book

The Elastic Stack, previously known as the ELK stack, is a log analysis solution that helps users ingest, process, and analyze search data effectively. With the addition of machine learning, a key commercial feature, the Elastic Stack makes this process even more efficient. This updated second edition of Machine Learning with the Elastic Stack provides a comprehensive overview of the Elastic Stack's machine learning features for time series data analysis as well as for classification, regression, and outlier detection. The book starts by explaining machine learning concepts in an intuitive way. You'll then perform time series analysis on different types of data, such as log files, network flows, application metrics, and financial data. As you progress through the chapters, you'll deploy machine learning within the Elastic Stack for logging, security, and metrics. Finally, you'll discover how data frame analysis opens up a whole new set of use cases that machine learning can help you with. By the end of this book, you'll have hands-on machine learning and Elastic Stack experience, along with the knowledge you need to incorporate machine learning into your distributed search and data analysis platform.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Section 1 – Getting Started with Machine Learning with Elastic Stack

Chapter 1: Machine Learning for IT

Overcoming the historical challenges in IT

Dealing with the plethora of data

The advent of automated anomaly detection

Unsupervised versus supervised ML

Using unsupervised ML for anomaly detection

Applying supervised ML to data frame analytics

Summary

Chapter 2: Enabling and Operationalization

Technical requirements

Enabling Elastic ML features

Understanding operationalization

Summary

Section 2 – Time Series Analysis – Anomaly Detection and Forecasting

Chapter 3: Anomaly Detection

Technical requirements

Elastic ML job types

Dissecting the detector

Detecting changes in metric values

Understanding the advanced detector functions

Splitting analysis along categorical features

Understanding temporal versus population analysis

Categorization analysis of unstructured messages

Managing Elastic ML via the API

Summary

Chapter 4: Forecasting

Technical requirements

Contrasting forecasting with prophesying

Forecasting use cases

Forecasting theory of operation

Single time series forecasting

Looking at forecast results

Multiple time series forecasting

Summary

Chapter 5: Interpreting Results

Technical requirements

Viewing the Elastic ML results index

Anomaly scores

Results index schema details

Multi-bucket anomalies

Forecast results

Results API

Custom dashboards and Canvas workpads

Summary

Chapter 6: Alerting on ML Analysis

Technical requirements

Understanding alerting concepts

Building alerts from the ML UI

Creating an alert with a watch

Summary

Chapter 7: AIOps and Root Cause Analysis

Technical requirements

Demystifying the term "AIOps"

Understanding the importance and limitations of KPIs

Moving beyond KPIs

Organizing data for better analysis

Leveraging the contextual information

Bringing it all together for RCA

Summary

Chapter 8: Anomaly Detection in Other Elastic Stack Apps

Technical requirements

Anomaly detection in Elastic APM

Anomaly detection in the Logs app

Anomaly detection in the Uptime app

Anomaly detection in the Elastic Security app

Summary

Section 3 – Data Frame Analysis

Chapter 9: Introducing Data Frame Analytics

Technical requirements

Learning how to use transforms

Using Painless for advanced transform configurations

Working with Python and Elasticsearch

Summary

Dealing with the plethora of data

IT departments have invested in monitoring tools for decades, and it is not uncommon to have a dozen or more tools actively collecting and archiving data that can be measured in terabytes, or even petabytes, per day. The data can range from rudimentary infrastructure- and network-level data to deep diagnostic data and/or system and application log files.

Business-level key performance indicators (KPIs) could also be tracked, sometimes including data about the end user's experience. The sheer depth and breadth of data available is, in some ways, more comprehensive than it has ever been. To detect emerging problems or threats hidden in that data, there have traditionally been three main approaches to distilling the data into informational insights:

Filter/search: Some tools allow the user to define searches to help trim down the data into a more manageable set. While extremely useful, this capability is most often used in an ad hoc fashion once a problem is suspected. Even then, success with this approach usually hinges on the user knowing what they are looking for and on their level of experience, both in having lived through similar past situations and in the search technology itself.
Visualizations: Dashboards, charts, and widgets are also extremely useful for understanding what the data has been doing and where it is trending. However, visualizations are passive: someone has to be watching them for meaningful deviations to be detected. Once the number of metrics being collected and plotted surpasses the number of eyeballs available to watch them (or even the screen real estate to display them), visual-only analysis becomes less and less useful.
Thresholds/rules: To make monitoring proactive without requiring that data be physically watched, many tools allow the user to define rules or conditions that are triggered by known conditions or known dependencies between items. However, it is unlikely that you can realistically define all appropriate operating ranges or model all of the actual dependencies in today's complex and distributed applications. Plus, the amount and velocity of change in the application or environment could quickly render any static rule set useless. Analysts find themselves chasing down many false positive alerts, setting up a "boy who cried wolf" paradigm that leads to resentment of the tools generating the alerts and skepticism about the value that alerting could provide.
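
The limitation of static thresholds described above can be illustrated with a small sketch. It is not taken from the book: the synthetic hourly metric, the limit of 80, and the helper name static_threshold_alerts are all hypothetical, chosen only to show how a fixed rule can fire on routine daily peaks while missing a reading that is unusual for its time of day.

```python
import math


def static_threshold_alerts(values, limit):
    """Return the indices of samples whose value exceeds a fixed limit."""
    return [i for i, v in enumerate(values) if v > limit]


# Hypothetical metric: 48 hourly samples with a daily cycle
# (baseline 50, swinging +/- 40). Purely synthetic data.
series = [50 + 40 * math.sin(2 * math.pi * (h % 24) / 24) for h in range(48)]

# Inject one unusual reading at an hour when the metric is normally
# near its daily low (about 10). A value of 70 is far outside the
# usual behavior for that hour, yet still below the static limit.
ANOMALY_HOUR = 42
series[ANOMALY_HOUR] = 70.0

# Assumed rule: alert whenever the metric goes above 80.
alerts = static_threshold_alerts(series, limit=80)

# Every alert falls on an ordinary daily peak (hours 4 to 8 of each day),
# and the injected anomaly at hour 42 is not flagged at all.
print("alert hours:", alerts)
print("anomaly flagged:", ANOMALY_HOUR in alerts)
```

Lowering the limit far enough to catch the reading at hour 42 would flag even more of each normal daily peak, which is the false positive trade-off the paragraph describes.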

Ultimately, there needed to be a different approach: one that wasn't necessarily a complete repudiation of past techniques, but that could bring a level of automation and empirical augmentation to the evaluation of data in a meaningful way. Let's face it, humans are imperfect: we have hidden biases and a limited capacity for remembering information, and we are easily distracted and fatigued. Algorithms, if used correctly, can easily make up for these shortcomings.
