Book Image

Machine Learning with the Elastic Stack - Second Edition

By : Rich Collier, Camilla Montonen, Bahaaldine Azarmi
5 (1)
Book Image

Machine Learning with the Elastic Stack - Second Edition

5 (1)
By: Rich Collier, Camilla Montonen, Bahaaldine Azarmi

Overview of this book

Elastic Stack, previously known as the ELK stack, is a log analysis solution that helps users ingest, process, and analyze search data effectively. With the addition of machine learning, a key commercial feature, the Elastic Stack makes this process even more efficient. This updated second edition of Machine Learning with the Elastic Stack provides a comprehensive overview of Elastic Stack's machine learning features for both time series data analysis as well as for classification, regression, and outlier detection. The book starts by explaining machine learning concepts in an intuitive way. You'll then perform time series analysis on different types of data, such as log files, network flows, application metrics, and financial data. As you progress through the chapters, you'll deploy machine learning within Elastic Stack for logging, security, and metrics. Finally, you'll discover how data frame analysis opens up a whole new set of use cases that machine learning can help you with. By the end of this Elastic Stack book, you'll have hands-on machine learning and Elastic Stack experience, along with the knowledge you need to incorporate machine learning in your distributed search and data analysis platform.
Table of Contents (19 chapters)
1
Section 1 – Getting Started with Machine Learning with Elastic Stack
4
Section 2 – Time Series Analysis – Anomaly Detection and Forecasting
11
Section 3 – Data Frame Analysis

Unsupervised versus supervised ML

While there are many subtypes of ML, two very prominent ones (and the two that are relevant to Elastic ML) are unsupervised and supervised.

In unsupervised ML, there is no outside guidance or direction from humans. In other words, the algorithms must learn (and model) the patterns of the data purely on their own. In general, the biggest challenge here is to have the algorithms accurately surface detected deviations of the input data's normal patterns to provide meaningful insight for the user. If the algorithm is not able to do this, then it is not useful and is unsuitable for use. Therefore, the algorithms must be quite robust and able to account for all of the intricacies of the way that the input data is likely to behave.

In supervised ML, input data (often multivariate data) is used to help model the desired outcome. The key difference from unsupervised ML is that the human decides, a priori, what variables to use as the input and also provides "ground-truth" examples of the expected target variable. Algorithms then assess how the input variables interact and influence the known output target. To accurately get the desired output (prediction, for example), the algorithm must have "the right kind of data" not only that indeed expresses the situation, but also so that there is enough diversity of the input data in order to effectively learn the relationship between the input data and the output target.

As such, both cases require good input data, good algorithmic approaches, and a good mechanism to allow the ML to both learn the behavior of the data and apply that learning to assess subsequent observations of that data. Let's dig a little deeper into the specifics of how Elastic ML leverages unsupervised and supervised learning.