Mastering Elastic Stack

By: Ravi Kumar Gupta, Yuvraj Gupta

Overview of this book

Even structured data is useless if it can't help you make strategic decisions and improve existing systems. If you love to play with data, or your job requires you to process custom log formats, design a scalable analysis system, explore a variety of data, and manage logs for real-time analysis, this book is your one-stop solution. By combining the massively popular Elasticsearch, Logstash, Beats, and Kibana, elastic.co has advanced an end-to-end stack that delivers actionable insights in real time from almost any type of structured or unstructured data source. You will learn how to create real-time dashboards and how to manage the life cycle of logs in detail through real-life scenarios. This book brushes up your basic knowledge of implementing the Elastic Stack and then dives deeper into complex and advanced implementations. We'll help you solve data analytics challenges using the Elastic Stack and provide practical steps for centralized logging and real-time analytics in production. You will get to grips with advanced techniques for log analysis and visualization. Newly announced features such as Beats and X-Pack are also covered in detail with examples. Toward the end, you will see how to use the Elastic Stack in real-world case studies, and we'll show you some best practices and troubleshooting techniques.

Introduction to ELK Stack


It all began with Shay Banon, who started an open source project called Elasticsearch as the successor to Compass. Elasticsearch gained popularity as one of the top open source database engines. Later, based on the same distributed model of working, Kibana was introduced to visualize the data stored in Elasticsearch. In those early days, data was put into Elasticsearch through Rivers, each of which provided a specific input for inserting data.

However, as the setup grew in popularity, it needed a tool that could insert data into Elasticsearch and also offer the flexibility to transform that data (to make unstructured data structured, with full control over how the data is processed). Based on this premise, Logstash was born and then incorporated into the stack. Together these three tools, Elasticsearch, Logstash, and Kibana, were named the ELK Stack.

The following diagram shows a simple data pipeline using the ELK Stack:

As we can see from the preceding figure, data is read using Logstash and indexed into Elasticsearch. Later, we can use Kibana to read the indices from Elasticsearch and visualize the data using charts and lists. Let's look at these components separately, and at the role each plays in the making of the stack.

Logstash

As mentioned earlier, Rivers were used to put data into Elasticsearch before the ELK Stack. In the ELK Stack, Logstash is the entry point for all types of data. Logstash has many input plugins to read data from a number of sources, and many output plugins to submit data to a variety of destinations; one of those is the Elasticsearch plugin, which sends data to Elasticsearch.

After Logstash became popular, Rivers were eventually deprecated, because they made the cluster unstable and caused performance issues.

Logstash does not just ship data from one end to another; it collects raw data and modifies or filters it to convert it into something meaningful, formatted, and organized. The updated data is then sent to Elasticsearch. If no plugin is available to read data from a specific source, write data to a specific location, or modify it in the way you need, Logstash is flexible enough to let you write your own plugins.

Simply put, Logstash is open source, highly flexible, rich in plugins, and can read data from the location of your choice. It normalizes data according to the configuration you define, and sends it to the destination you specify.
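
To make the input, filter, and output stages described above more concrete, here is a minimal Python sketch of the same idea. It is not Logstash code and does not use any Logstash API; the log format, the field names, and every function name are assumptions made up for this illustration.

```python
import json
import re
from typing import Iterable, Iterator, Optional

# Hypothetical log format, assumed for this sketch only:
# "<timestamp> <LEVEL> <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def read_input(lines: Iterable[str]) -> Iterator[str]:
    """Input stage: yield raw, non-empty lines from any iterable source."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def filter_event(raw: str) -> Optional[dict]:
    """Filter stage: turn a raw line into a structured event, or drop it."""
    match = LINE_PATTERN.match(raw)
    if match is None:
        return None  # unparseable lines are simply dropped in this sketch
    event = match.groupdict()
    event["level"] = event["level"].lower()  # normalize one field
    return event


def write_output(events: Iterable[dict]) -> list:
    """Output stage: serialize events as JSON, standing in for a real destination."""
    return [json.dumps(event, sort_keys=True) for event in events]


def run_pipeline(lines: Iterable[str]) -> list:
    """Chain the three stages: input -> filter -> output."""
    parsed = (filter_event(raw) for raw in read_input(lines))
    return write_output(event for event in parsed if event is not None)


if __name__ == "__main__":
    sample = [
        "2017-01-15T10:00:00Z ERROR Disk quota exceeded",
        "this line does not match the pattern",
        "2017-01-15T10:00:05Z INFO Service restarted",
    ]
    for document in run_pipeline(sample):
        print(document)
```

In a real deployment each stage would be a configured plugin rather than a hand-written function, but the shape is the same: unstructured text goes in, and structured documents ready for indexing come out.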

We will be learning more about Logstash in Chapter 3, Exploring Logstash and Its Plugins and Chapter 7, Customizing Elastic Stack.

Elasticsearch

All of the data read by Logstash is sent to Elasticsearch for indexing. Elasticsearch does more than index data: it is also a full-text search engine that is highly scalable and distributed, and it offers much more besides. Elasticsearch manages and maintains your data in the form of indices and lets you query, access, and aggregate the data using its APIs. Elasticsearch is based on Lucene, and so provides all of the features that Lucene does.
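
As a rough illustration of querying and aggregating through the APIs, the following Python sketch builds the JSON body of a search request that combines a full-text match query with a terms aggregation. The field names message and level are hypothetical, and the sketch assumes that level is mapped so that it can be aggregated on. The code only constructs and prints the body; it does not contact a cluster.

```python
import json


def build_search_body(text: str, text_field: str, bucket_field: str, size: int = 10) -> dict:
    """Build a search body: a full-text match query plus a terms aggregation."""
    return {
        "size": size,  # number of matching documents to return
        "query": {"match": {text_field: text}},
        "aggs": {
            # One bucket per distinct value of bucket_field, with a document count.
            "by_" + bucket_field: {"terms": {"field": bucket_field}}
        },
    }


if __name__ == "__main__":
    # "message" and "level" are assumed field names, used only for illustration.
    body = build_search_body("quota exceeded", text_field="message", bucket_field="level")
    # This JSON would be sent as the body of a request to /<index>/_search.
    print(json.dumps(body, indent=2))
```

The query part selects documents whose text matches the given words, while the aggregation part summarizes those documents into counted groups.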

We will be learning more about Elasticsearch in Chapter 2, Stepping into Elasticsearch, Chapter 7, Customizing Elastic Stack, and Chapter 8, Elasticsearch APIs.

Kibana

Kibana uses the Elasticsearch APIs to read and query data from Elasticsearch indices, and to visualize and analyze it in the form of charts, graphs, and tables. Kibana is a web application that provides a highly configurable user interface, letting you query the data, create a number of charts to visualize it, and make actual sense of the data stored.
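
To give a feel for what turning query results into a chart involves, here is a small Python sketch that reads the buckets of a terms aggregation response and renders them as a text bar chart. This is not how Kibana is implemented; the response below is a hand-written sample in the shape Elasticsearch uses for terms aggregations, and the aggregation name by_level and both function names are assumptions for this illustration.

```python
def buckets_to_rows(response: dict, agg_name: str) -> list:
    """Extract (label, count) pairs from a terms aggregation response."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [(str(bucket["key"]), int(bucket["doc_count"])) for bucket in buckets]


def render_bar_chart(rows: list, width: int = 20) -> str:
    """Render rows as a text bar chart, scaling the largest count to `width`."""
    if not rows:
        return "(no data)"
    largest = max(count for _, count in rows)
    label_width = max(len(label) for label, _ in rows)
    lines = []
    for label, count in rows:
        bar = "#" * (round(width * count / largest) if largest else 0)
        lines.append(f"{label.ljust(label_width)} | {bar} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-written sample response (not real data), assumed for illustration.
    sample_response = {
        "aggregations": {
            "by_level": {
                "buckets": [
                    {"key": "error", "doc_count": 4},
                    {"key": "info", "doc_count": 2},
                ]
            }
        }
    }
    print(render_bar_chart(buckets_to_rows(sample_response, "by_level")))
```

Kibana does the equivalent work for you through its user interface: you choose the index, the aggregation, and the chart type, and it issues the queries and draws the result.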

We will be learning more about Kibana in Chapter 4, Kibana Interface and Chapter 7, Customizing Elastic Stack.

Once the ELK Stack had become robust, a few important and complex demands emerged over time, such as authentication, security, and notifications. As these requirements arose, they led to the development of other tools such as Watcher (alerts and notifications based on changes in data), Shield (authentication and authorization for securing clusters), Marvel (monitoring statistics of the cluster), ES-Hadoop, Curator, and Graph.