Learning Kibana 7 - Second Edition

By : Anurag Srivastava, Bahaaldine Azarmi

Overview of this book

Kibana is a window into the Elastic Stack that enables the visual exploration and real-time analysis of your data in Elasticsearch. This book will help you understand how you can use Kibana 7 for rich analytics and data visualization.

If you’re new to the tool or want to get to grips with the latest features introduced in Kibana 7, this book is the perfect beginner's guide. You’ll learn how to set up and configure the Elastic Stack and understand where Kibana sits within the architecture. As you advance, you’ll learn how to ingest data from different sources using Beats or Logstash into Elasticsearch, followed by exploring and visualizing data in Kibana. Whether working with time-series data to create complex graphs using Timelion or embedding visualizations created in Kibana into your web applications, this book covers it all. It also covers topics that every Elastic developer needs to be aware of, such as installing and configuring application performance monitoring (APM) servers and agents. Finally, you’ll also learn how to create effective machine learning jobs in Kibana to find anomalies in your data.

By the end of this book, you’ll have a solid understanding of Kibana, and be able to create your own visual analytics solutions from scratch.

Table of Contents (16 chapters)

Section 1: Understanding Kibana 7
Section 2: Exploring the Data
Section 3: Tools for Playing with Your Data
Section 4: Advanced Kibana Options

Technology limitations

In this section, we will analyze why some technologies have limitations and cannot support an end-to-end solution when we try to meet the expectations of a data-driven architecture. In these situations, we either combine a set of tools to meet the requirement, or we compromise on the requirement according to the features that the technology offers. So, let's now discuss some of the available technologies.

Relational databases

Relational databases are popular and important tools for storing data in a data-driven architecture; for example, we can save application monitoring logs in a database such as MySQL and later use them to monitor the application. But when it comes to data visualization, a relational database falls short on all the essential features we mentioned earlier:

  • A Relational Database Management System (RDBMS) only manages fixed schemas and is not designed to deal with dynamic data models and unstructured data. Any structural change to the data requires updating the schema/tables, which is expensive.
  • An RDBMS doesn't allow real-time data access at scale. It wouldn't be realistic, for example, to create an index for each column, for each table, or for each schema in an RDBMS; however, essentially, that is what would be required for real-time access.
  • Scaling an RDBMS is not easy; it can be a complex and heavy process to put in place, and it would not keep up with a data explosion.
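
The fixed-schema limitation can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for any RDBMS; the `logs` table, the event fields, and the `insert_event` helper are hypothetical names chosen for this illustration, not something taken from the book.

```python
import json
import sqlite3

# Hypothetical log events; the second one carries a field the table does not have.
events = [
    {"ts": "2019-07-01T10:00:00", "level": "INFO", "message": "started"},
    {"ts": "2019-07-01T10:00:05", "level": "ERROR", "message": "timeout",
     "latency_ms": 5000},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, level TEXT, message TEXT)")


def insert_event(connection, event):
    """Insert one event, using its keys as column names (trusted input only)."""
    columns = ", ".join(event)
    placeholders = ", ".join("?" for _ in event)
    connection.execute(
        f"INSERT INTO logs ({columns}) VALUES ({placeholders})",
        tuple(event.values()),
    )


insert_event(conn, events[0])

# The new field is rejected because the schema is fixed.
try:
    insert_event(conn, events[1])
    schema_error = None
except sqlite3.OperationalError as exc:
    schema_error = str(exc)

# The structure must be changed before the new shape of data can be stored.
conn.execute("ALTER TABLE logs ADD COLUMN latency_ms INTEGER")
insert_event(conn, events[1])
row_count = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

# A schemaless document model accepts both shapes without any migration.
documents = [json.dumps(event) for event in events]
```

On a small in-memory table the `ALTER TABLE` step is trivial; the cost the text refers to comes from running the same kind of migration on large production tables every time the data model changes.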

RDBMSes are better used as a source of data before ingestion time, to correlate or enrich the ingested data and so give the visualized data finer granularity. Visualization is about providing users with the flexibility to create multiple views of the data, and enabling them to explore and ask their own questions without predefining a schema or constructing a view in the storage layer.

Hadoop

The Hadoop ecosystem is rich in projects, and it's often hard to pick or understand which project will fit our requirements. If we step back, we can consider the following aspects that Hadoop fulfills:

  • It suits massive-scale data architectures and helps to store and process any kind of data, at any volume
  • It has out-of-the-box batch and streaming technologies that help to process the data as it comes in to create an iterative view on top of the raw data, or enable longer processing for larger-scale views
  • The underlying architecture is designed to make the integration of processing engines easy, so you can plug in and process your data with many different frameworks
  • It's made to implement the data lake paradigm, where you can essentially drop in data in order to process it

But what about visualization? There are many initiatives out there, but none of them can go against the real nature of Hadoop, which is not suited to real-time data visualization at scale:

  • The Hadoop Distributed File System (HDFS) is a sequential read-and-write filesystem that is not suited to random access.
  • Even the interactive ad hoc query or the existing real-time API doesn't scale in terms of integration with the visualization application. Most of the time, users have to export their data outside of Hadoop in order to visualize it; some visualization tools claim to have a transparent integration with HDFS, whereas, under the hood, the data is exported and loaded into memory in batches, which makes the user experience heavy and slow.
  • Data visualization is all about APIs and easy access to the data, which Hadoop is not good at, as it always requires implementation from the user.

Hadoop is good for processing data, and is often used in conjunction with other real-time technology, such as Elastic, to build Lambda architectures, as shown in the following diagram:

In this architecture, you can see that Hadoop aggregates incoming data either in a long processing zone or a near real-time zone. Finally, the results are indexed in Elasticsearch in order to be visualized in Kibana. Essentially, this means that one technology is not meant to replace the other, but that you can leverage the best of both.
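
The flow described above can be sketched in a few lines of plain Python. This is only an illustration of the Lambda pattern under stated assumptions: the events, the `BATCH_CUTOFF` value, and the function names are hypothetical, and no Hadoop or Elasticsearch API is called. The batch view stands in for the long processing zone, the speed view for the near real-time zone, and the merged result for what would be indexed for visualization.

```python
from collections import Counter

# Hypothetical events: (timestamp in seconds, HTTP status code).
events = [
    (100, "200"), (160, "500"), (220, "200"),  # covered by the last batch run
    (310, "200"), (320, "404"),                # arrived after the last batch run
]
BATCH_CUTOFF = 300  # assumed time of the last completed batch run


def batch_view(all_events, cutoff):
    """Long processing zone: recompute counts over all data before the cutoff."""
    return Counter(status for ts, status in all_events if ts < cutoff)


def speed_view(all_events, cutoff):
    """Near real-time zone: count only the events that arrived since the cutoff."""
    return Counter(status for ts, status in all_events if ts >= cutoff)


def serving_view(batch, speed):
    """Merge both views into the result that would be indexed for querying."""
    return batch + speed


merged = serving_view(
    batch_view(events, BATCH_CUTOFF),
    speed_view(events, BATCH_CUTOFF),
)
print(dict(merged))
```

The point of the split is that the batch view can be slow and exhaustive while the speed view stays small and fresh; the merged counts always cover every event exactly once.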

NoSQL

There are many different, very performant, and massively scalable NoSQL technologies out there, such as key-value stores, document stores, and columnar stores; however, most of them do not serve analytic APIs or come with an out-of-the-box visualization application.

In most cases, the data held in these technologies is ingested into an indexing engine, such as Elasticsearch, to provide analytics capabilities for visualization or search purposes.
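
To make this concrete, here is a rough sketch of why an index is needed on top of a key-value store. The `kv_store` contents and the helper functions are hypothetical, and the in-memory index only imitates, in a very simplified way, what an indexing engine provides: the store itself answers lookups by key, while the index answers questions by field value.

```python
from collections import defaultdict

# Hypothetical key-value store: fast lookups by key, but no analytic API.
kv_store = {
    "order:1": {"country": "FR", "amount": 20},
    "order:2": {"country": "US", "amount": 35},
    "order:3": {"country": "FR", "amount": 15},
}


def build_index(store, field):
    """Map each value of `field` to the set of keys whose document holds it."""
    index = defaultdict(set)
    for key, document in store.items():
        if field in document:
            index[document[field]].add(key)
    return index


def count_by_term(index):
    """Number of documents per indexed value."""
    return {term: len(keys) for term, keys in index.items()}


def sum_by_term(store, index, metric):
    """Sum a numeric field per indexed value."""
    return {
        term: sum(store[key][metric] for key in keys)
        for term, keys in index.items()
    }


country_index = build_index(kv_store, "country")
orders_per_country = count_by_term(country_index)
revenue_per_country = sum_by_term(kv_store, country_index, "amount")
```

Without the index, each of these questions would require scanning every record in the store, which is the kind of work that is delegated to the indexing engine.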

Having covered the fundamental layers that a data-driven architecture should have and the limits of existing technologies on the market, let's now introduce the Elastic Stack, which essentially answers these shortcomings.