Introduction
In the ElasticSearch ecosystem, it can be immensely useful to monitor nodes and clusters to manage and improve their performance and state. There are several issues that might arise at the cluster level, such as the following:
Node overload: Some nodes might have too many shards allocated and become bottlenecks for the entire cluster
Node shutdown: This can happen due to many reasons—for example, full disks, hardware failures, and power problems
Shard relocation: Problems or corruption during shard relocation can leave some shards unable to come online
Very large shards: If a shard is too big, index performance decreases because Lucene has to merge massive segments
Empty indices and shards: They waste memory and resources; because every shard has a lot of active threads, a huge number of unused indices and shards degrades general cluster performance
Malfunctions and poor performance can be detected via the API or via some front-end plugins...
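As a rough illustration of detection via the API, the following Python sketch flags a few of the issues listed above from the output of the cat shards API. It assumes a cluster reachable at a base URL such as http://localhost:9200 and a version that accepts format=json on cat endpoints; the function names and the max_shards_per_node threshold are hypothetical and chosen only for this example.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_json(base_url, path):
    """GET a JSON document from the cluster's REST API."""
    with urlopen(f"{base_url}{path}") as response:
        return json.load(response)


def find_shard_issues(shards, max_shards_per_node=100):
    """Flag crowded nodes, shards that are not online, and empty shards.

    `shards` is a list of rows shaped like the output of
    GET /_cat/shards?format=json (string values, None when unassigned).
    The threshold is an arbitrary value for illustration, not a recommendation.
    """
    issues = []

    # Count only started shards, so each row names exactly one node.
    per_node = Counter(
        row["node"] for row in shards
        if row["state"] == "STARTED" and row.get("node")
    )
    for node, count in sorted(per_node.items()):
        if count > max_shards_per_node:
            issues.append(f"node {node} holds {count} shards")

    for row in shards:
        label = f"{row['index']}[{row['shard']}]"
        if row["state"] != "STARTED":
            # UNASSIGNED, INITIALIZING, or RELOCATING shards
            issues.append(f"shard {label} is {row['state']}")
        elif row.get("docs") in ("0", 0):
            issues.append(f"shard {label} is empty")
    return issues


def report(base_url):
    """Print the cluster status followed by any shard-level issues."""
    health = fetch_json(base_url, "/_cluster/health")
    print("cluster status:", health["status"])
    for issue in find_shard_issues(fetch_json(base_url, "/_cat/shards?format=json")):
        print(issue)
```

Calling report("http://localhost:9200") against a running cluster would print the overall status and one line per suspicious shard or node. Keeping find_shard_issues free of network calls makes it easy to try on saved API output.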