Learning Kibana 7 - Second Edition

By: Anurag Srivastava, Bahaaldine Azarmi

Overview of this book

Kibana is a window into the Elastic Stack that enables the visual exploration and real-time analysis of your data in Elasticsearch. This book will help you understand how you can use Kibana 7 for rich analytics and data visualization.

If you're new to the tool or want to get to grips with the latest features introduced in Kibana 7, this book is the perfect beginner's guide. You'll learn how to set up and configure the Elastic Stack and understand where Kibana sits within the architecture. As you advance, you'll learn how to ingest data from different sources using Beats or Logstash into Elasticsearch, followed by exploring and visualizing data in Kibana. Whether working with time-series data to create complex graphs using Timelion or embedding visualizations created in Kibana into your web applications, this book covers it all. It also covers topics that every Elastic developer needs to be aware of, such as installing and configuring application performance monitoring (APM) servers and agents. Finally, you'll also learn how to create effective machine learning jobs in Kibana to find anomalies in your data.

By the end of this book, you'll have a solid understanding of Kibana, and be able to create your own visual analytics solutions from scratch.
Table of Contents (16 chapters)

Section 1: Understanding Kibana 7
Section 2: Exploring the Data
Section 3: Tools for Playing with Your Data
Section 4: Advanced Kibana Options

Understanding your data for analysis in Kibana

Here, we will discuss the different aspects of data analysis: data shipping, data ingestion, data storage, and data visualization. Each is an important part of data analysis and visualization, and we need to understand them in detail. The objective is to avoid confusion and to build an architecture that serves each of the following aspects.

Data shipping

A data-shipping architecture should support the transport of any sort of data or event, whether structured or unstructured. The primary goal of data shipping is to send data from remote machines to a centralized location in order to make it available for further exploration. For data shipping, we generally deploy lightweight agents that sit on the same server from which we want to get the data. These shippers fetch the data and keep sending it to the centralized server. For data shipping, we need to consider the following:

  • The agents should be lightweight. They should not compete for resources with the process that generates the actual data; the goal is to minimize the performance impact and leave as small a footprint as possible on the host.
  • There are many data-shipping technologies out there; some are tied to a specific technology, while others are based on an extensible framework that can adapt to virtually any data source.
  • Shipping data is not only about sending data over the wire; it's also about security, and about making sure that the data reaches the proper destination through an end-to-end secured pipeline.
  • Another aspect of data shipping is the management of data loads. Data should be shipped at a rate that the end destination is able to ingest; this feature is called back pressure management.

Reliable data shipping is essential for data visualization. As an example, consider data flowing from financial trading machines: failing to detect a security breach simply because you are losing data could be critical. The short sketch that follows illustrates the back pressure idea.
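
To make back pressure concrete, here is a minimal sketch of a shipper loop in Python. It is illustrative only: the endpoint URL, batch size, and retry delay are assumptions, not part of any real shipper. It tails a file, sends events in batches, and backs off when the destination signals overload (HTTP 429), rather than dropping data:

    import json
    import time
    import urllib.error
    import urllib.request

    # Hypothetical centralized ingestion endpoint; adjust for your setup.
    ENDPOINT = "http://ingest.example.com:8080/bulk"
    BATCH_SIZE = 100

    def ship(batch):
        """Send one batch; return False if the destination asks us to back off."""
        req = urllib.request.Request(
            ENDPOINT,
            data=json.dumps(batch).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        try:
            with urllib.request.urlopen(req) as resp:
                return 200 <= resp.status < 300
        except urllib.error.HTTPError as err:
            if err.code == 429:  # Too Many Requests: the back pressure signal
                return False
            raise

    def tail_and_ship(path):
        """Read events from a file and ship them without losing any."""
        batch = []
        with open(path) as source:
            for line in source:
                batch.append({"message": line.rstrip("\n")})
                if len(batch) >= BATCH_SIZE:
                    while not ship(batch):  # back off instead of dropping
                        time.sleep(1.0)
                    batch = []
        if batch:  # flush the final partial batch
            while not ship(batch):
                time.sleep(1.0)

Real shippers such as Beats implement this far more robustly, with persistent queues and acknowledgements, but the principle is the same: the sender adapts its rate to what the destination can ingest.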

Data ingestion

The role of the ingestion layer is to receive data, supporting as wide a range of commonly used transport protocols and data formats as possible, while providing the capabilities to extract and transform this data before finally storing it.

Processing data in this way is essentially a matter of extracting, transforming, and loading (ETL) it; such a process is often called an ingestion pipeline, and it receives data from the shipping layer and pushes it to a storage layer. It comes with the following features:

  • Generally, the ingestion layer has a pluggable architecture, in which a set of plugins eases integration with the various sources of data and destinations. Some of the plugins are made for receiving data from shippers, but data is not always received from shippers: it can also come directly from a data source such as a file, a network socket, or even a database. This can be ambiguous in some cases: should I use a shipper or a pipeline to ingest data from a file? That will, of course, depend on the use case and also on the expected SLAs.
  • The ingestion layer should be used to prepare the data by, for example, parsing it, formatting it, correlating it with other data sources, and normalizing and enriching it before storage. This has many advantages, the most important being that it can improve the quality of the data, providing better insights for visualization. Another advantage is that it can remove processing overhead later on, by precomputing a value or looking up a reference at ingest time. The drawback is that you may need to ingest the data again if it was not properly formatted or enriched for visualization. Fortunately, there are also ways to process the data after it has been ingested. A minimal parsing-and-enrichment sketch follows this list.
  • Ingesting and transforming data consumes compute resources. It is essential that we consider this, usually in terms of maximum data throughput per unit of time, and plan for ingestion by distributing the load over multiple ingestion instances. This is a very important aspect of real-time, or, to be precise, near real-time visualization. Spreading ingestion across multiple instances can accelerate the storage of the data and, therefore, make it available faster for visualization.
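
To illustrate the preparation step described above, here is a minimal, hypothetical transform stage in Python. The log format, the regular expression, and the lookup table are all assumptions made for the example: it parses a raw line, normalizes the timestamp and status code, and enriches the event with a reference lookup before it is stored:

    import re
    from datetime import datetime, timezone

    # Assumed log format: "2023-04-01T12:00:00 GET /index.html 200"
    LINE_RE = re.compile(r"(?P<ts>\S+) (?P<method>\S+) (?P<path>\S+) (?P<status>\d+)")

    # Illustrative enrichment table, e.g., mapping request paths to services.
    SERVICE_BY_PATH = {"/index.html": "web-frontend"}

    def transform(line):
        """Parse, normalize, and enrich one raw event before storage."""
        match = LINE_RE.match(line)
        if match is None:
            return None  # route unparsable lines to a dead-letter queue
        event = match.groupdict()
        # Normalize: ISO timestamp to epoch milliseconds, status to an integer.
        ts = datetime.fromisoformat(event.pop("ts")).replace(tzinfo=timezone.utc)
        event["@timestamp"] = int(ts.timestamp() * 1000)
        event["status"] = int(event["status"])
        # Enrich: precompute a reference lookup at ingest time.
        event["service"] = SERVICE_BY_PATH.get(event["path"], "unknown")
        return event

    print(transform("2023-04-01T12:00:00 GET /index.html 200"))

In the Elastic Stack, this role is typically played by Logstash filters or Elasticsearch ingest pipelines rather than custom code, but the shape of the work is the same.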

Storing data at scale

Storage is undoubtedly the centerpiece of a data-driven architecture. It provides the essential, long-term retention of your data, as well as the core functionality to search, analyze, and discover insights in it. It is the heart of the process, and what it offers will depend on the nature of the technology. Here are some aspects that the storage layer usually brings:

  • Scalability is the main aspect: the storage must accommodate volumes of data that can grow from gigabytes to terabytes to petabytes. The scaling is horizontal, which means that, as demand and volume grow, you should be able to increase the capacity of the storage seamlessly by adding more machines (a short sketch follows this list).
  • Most of the time, a non-relational, highly distributed data store is used, namely, a NoSQL data store, which allows fast data access and analysis on high volumes and a variety of data types. Data is partitioned and spread over a set of machines in order to balance the load while reading or writing data.
  • For data visualization, it's essential that the storage exposes an API for analysis on top of the data. Leaving the visualization layer to do the statistical analysis, such as grouping data over a given dimension (aggregation), wouldn't scale.
  • The nature of the API can depend on the expectations of the visualization layer, but most of the time it's about aggregations. The visualization layer should only render the result of the heavy lifting done at the storage level.
  • A data-driven architecture can serve data to many different applications and users, at different levels of SLAs. High availability becomes the norm in such architectures, and, like scalability, it should be part of the nature of the solution.
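
As a small illustration of the scalability and high-availability points above, the following sketch creates an index with the official Elasticsearch Python client (8.x style; older clients take a body argument instead). The URL and index name are assumptions. Primary shards partition the data across the machines of the cluster, while replicas keep it available if a node fails:

    from elasticsearch import Elasticsearch

    # Assumed local cluster; adjust the URL for your deployment.
    es = Elasticsearch("http://localhost:9200")

    # Three primary shards spread reads and writes across the cluster;
    # one replica of each shard preserves availability if a node is lost.
    es.indices.create(
        index="trades",  # illustrative index name
        settings={
            "number_of_shards": 3,
            "number_of_replicas": 1,
        },
    )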

Visualizing data

The visualization layer is the window on the data. It provides a set of tools to create live graphs and charts that bring the data to life, allowing you to build rich, insightful dashboards that answer questions such as: What is happening now? Is my business healthy? What is the mood of the market?

The visualization layer in a data-driven architecture is the layer where we expect the majority of the data consumption to happen; it is mostly focused on surfacing KPIs on top of the stored data. It comes with the following essential features:

  • It should be lightweight and only render the result of the processing done in the storage layer (see the sketch just after this list)
  • It allows the user to discover the data and get quick out-of-the-box insights on the data
  • It offers a visual way to ask unexpected questions of the data, rather than having to implement the proper request to do so
  • In modern data architectures that must address the needs of accessing KPIs as fast as possible, the visualization layer should render the data in near real time
  • The visualization framework should be extensible and allow users to customize the existing assets or to add new features depending on the need
  • The user should be able to share the dashboards outside of the visualization application
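
As a sketch of the first point in the list above, here the storage layer is asked for a terms aggregation and the "visualization" merely renders the returned buckets as a crude text bar chart. The index and field names are assumptions, and the snippet again assumes the 8.x Elasticsearch Python client:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local cluster

    # The storage layer does the heavy lifting: group documents by service
    # and count them; size=0 means no raw documents are returned.
    resp = es.search(
        index="trades",  # illustrative index name
        size=0,
        aggs={"by_service": {"terms": {"field": "service.keyword"}}},
    )

    # The visualization layer only renders the precomputed result.
    for bucket in resp["aggregations"]["by_service"]["buckets"]:
        print(f"{bucket['key']:<20} {'#' * bucket['doc_count']}")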

As you can see, it's not only a matter of visualization: you need some foundations to reach these objectives. This is how we'll address the use of Kibana in this book: we'll focus on use cases and see the best way to leverage Kibana's features, depending on the use case and context.

Kibana's main differentiator from other visualization tools is that it comes alongside a full stack, the Elastic Stack, with seamless integration with every layer of the stack, which eases the deployment of such an architecture. There are many other technologies out there; we'll now explore what they are good at and what their limits are.