Book Image

Mastering Kibana 6.x

Book Image

Mastering Kibana 6.x

Overview of this book

Kibana is one of the popular tools among data enthusiasts for slicing and dicing large datasets and uncovering Business Intelligence (BI) with the help of its rich and powerful visualizations. To begin with, Mastering Kibana 6.x quickly introduces you to the features of Kibana 6.x, before teaching you how to create smart dashboards in no time. You will explore metric analytics and graph exploration, followed by understanding how to quickly customize Kibana dashboards. In addition to this, you will learn advanced analytics such as maps, hits, and list analytics. All this will help you enhance your skills in running and comparing multiple queries and filters, influencing your data visualization skills at scale. With Kibana’s Timelion feature, you can analyze time series data with histograms and stats analytics. By the end of this book, you will have created a speedy machine learning job using X-Pack capabilities.
Table of Contents (21 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
Index

What is ELK Stack?


ELK Stack is a stack with three different open source software—Elasticsearch, Logstash, and Kibana. Elasticsearch is a search engine that is developed on top of Apache Lucene. Logstash is basically used for data pipelining where we can get data from any data source as an input, transform it if required, and send it to any destination as an output. In general, we use Logstash to push the data into Elasticsearch. Kibana is a dashboard or visualization tool, which can be configured with Elasticsearch to generate charts, graphs, and dashboards using our data:

We can use ELK Stack for different use cases, the most common being log analysis. Other than that, we can use it for business intelligence, application security and compliance, web analytics, fraud management, and so on.

In the following subsections, we are going to be looking at ELK Stack's components.

Elasticsearch

Elasticsearch is a full text search engine that can be used as a NoSQL database and as an analytics engine. It is easy to scale, schemaless, and near real time, and provides a restful interface for different operations. It isschemaless,and it uses inverted indexes for data storage. There are different language clients available for Elasticsearch, as follows:

  • Java
  • PHP
  • Perl
  • Python
  • .NET
  • Ruby
  • JavaScript
  • Groovy

The basic components of Elasticsearch are as follows:

  • Cluster
  • Node
  • Index
  • Type
  • Document
  • Shard

Logstash

Logstash is basically used for data pipelining, through which we can take input from different sources and output to different data sources. Using Logstash, we can clean the data through filter options and mutate the input data before sending it to the output source. Logstash has different adapters to handle different applications, such as for MySQL or any other relational database connection. We have a JDBC input plugin through which we can connect to MySQL server, run queries, and take the table data as the input in Logstash. For Elasticsearch, there is a connector in Logstash that gives us the option to seamlessly transfer data from Logstash to Elasticsearch.

To run Logstash, we need to install Logstash and edit the configuration file logstash.conf, which consists of an input, output, and filter sections. We need to tell Logstash where it should get the input from through the input block, what it should do with the input through the filter block, and where it should send the output through the output block. In the following example, I am reading an Apache Access Log and sending the output to Elasticsearch:

input {
     file {
         path => "/var/log/apache2/access.log"
         }
 }

 filter {
     grok {
         match => { message => "%{COMBINEDAPACHELOG}" }
         }
 }

output {
   elasticsearch {
        hosts => "http://127.0.0.1:9200"
        index => "logs_apache"
        document_type => "logs"
        }
}

The input block is showing a file key that is set to /var/log/apache2/access.log. This means that we are getting the file input and path of the file, /var/log/apache2/access.log, which is Apache's log file. The filter block is showing the grok filter, which converts unstructured data into structured data by parsing it.

There are different patterns that we can apply for the Logstash filter. Here, we are parsing the Apache logs, but we can filter different things, such as email, IP addresses, and dates.

Kibana

Kibana is a dashboarding open source software from ELK Stack, and it is a very good tool for creating different visualizations, charts, maps, and histograms, and by integrating different visualizations together, we can create dashboards. It is part of ELK Stack; hence it is quite easy to read the Elasticsearch data. This does not require any programming skills. Kibana has a beautiful UI for creating different types of visualizations, including charts, histograms, and dashboards.

It provides us with different inbuilt dashboards with multiple visualizations when we use Beats, as it automatically creates multiple visualizations that we can customize to create a useful dashboard, such as for CPU usage and memory usage.

Beats

Beats are basically data shippers that are grouped to do single-purpose jobs. For example, Metricbeat is used to collect metrics for memory usage, CPU usage, and disk space, whereas Filebeat is used to send file data such as logs. They can be installed as agents on different servers to send data from different sources to a central Logstash or Elasticsearch cluster. They are written in Go; they work on a cross-platform environment; and they are lightweight in design. Before Beats, it was very difficult to get data from different machines as there was no single-purpose data shipper, and we had to do some tweaking to get the desired data from servers.

For example, if I am running a web application on the Apache web server and want to run it smoothly, then there are two things that need to be monitored—first, all of the errors from the application, and second, the server's performance, such as memory usage, CPU usage, and disk space. So, in order to collect this information, we need to install the following two Beats on our machine:

  • Filebeat: This is used to collect log data from Apache web server in an incremental way. Filebeat will run on the server and will periodically check for any change in the Apache log. When there is any change in the Apache log file, it will send the log to Logstash. Logstash will receive the data file and execute the filter to find the errors. After filtering the data, it saves the data into Elasticsearch.
  • Metricbeat: This is used to collect metrics for memory usage, CPU usage, disk space, and so on. Metricbeat collects the server metrics, such as memory usage, CPU usage, and disk space, and saves the data into Elasticsearch. Metrics data sends a predefined set of data, and there is no need to modify anything; that is why it sends data directly to Elasticsearch instead of sending it to Logstash first.

To visualize this data, we can use Kibana to create meaningful dashboards through which we can get complete control of our data.