Hands-On Infrastructure Monitoring with Prometheus

By: Joel Bastos, Pedro Araújo

Overview of this book

Prometheus is an open-source monitoring system. It provides a modern time series database, a robust query language, several metric visualization possibilities, and a reliable alerting solution for traditional and cloud-native infrastructure. This book covers the fundamental concepts of monitoring and explores the Prometheus architecture, its data model, and how metric aggregation works. Multiple test environments are included to help you explore different configuration scenarios, such as the use of various exporters and integrations. You’ll delve into PromQL, supported by several examples, and then apply that knowledge to alerting and recording rules, and learn how to test them. After that, alert routing with Alertmanager and creating visualizations with Grafana are thoroughly covered. In addition, this book covers several service discovery mechanisms and even provides an example of how to create your own. Finally, you’ll learn about Prometheus federation, cross-sharding aggregation, and long-term storage with the help of Thanos. By the end of this book, you’ll be able to implement and scale Prometheus as a full monitoring system on-premises, in cloud environments, in standalone instances, or using container orchestration with Kubernetes.

Table of Contents (21 chapters)

Section 1: Introduction
Section 2: Getting Started with Prometheus
Section 3: Dashboards and Alerts
Section 4: Scalability, Resilience, and Maintainability

Chapter 5, Running a Prometheus Server

  1. Then, scrape_timeout will be set to its default of 10 seconds.
  2. Besides restarting Prometheus, the configuration file can be reloaded either by sending a SIGHUP signal to the Prometheus process or, if --web.enable-lifecycle was set at startup, by sending an HTTP POST request to the /-/reload endpoint.
  3. Prometheus will look back up to five minutes by default, unless it finds a stale marker, in which case it will immediately consider the series stale.
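
The look-back behavior described in the last answer can be illustrated with a small model. The following Python sketch is not Prometheus code. It assumes samples are (timestamp, value) pairs sorted by timestamp, uses a hypothetical STALE sentinel in place of the real stale marker, and hard-codes the five-minute default as LOOKBACK_SECONDS.

```python
from bisect import bisect_right

# Hypothetical stand-in for the stale marker written when a series disappears.
STALE = object()

# Default look-back window from the text: five minutes, in seconds.
LOOKBACK_SECONDS = 300


def instant_value(samples, eval_time, lookback=LOOKBACK_SECONDS):
    """Return the value an instant query would see at eval_time, or None.

    samples: list of (timestamp_seconds, value) tuples sorted by timestamp.
    """
    timestamps = [ts for ts, _ in samples]
    # Index of the newest sample at or before eval_time.
    idx = bisect_right(timestamps, eval_time) - 1
    if idx < 0:
        return None  # no sample exists yet
    ts, value = samples[idx]
    if value is STALE:
        return None  # stale marker: the series is considered stale immediately
    if eval_time - ts > lookback:
        return None  # newest sample is older than the look-back window
    return value


if __name__ == "__main__":
    series = [(0, 1.0), (15, 2.0)]
    print(instant_value(series, 100))  # sample at t=15 is inside the window
    print(instant_value(series, 400))  # sample at t=15 is more than 300 s old
    print(instant_value(series + [(30, STALE)], 100))  # stale marker wins
```

In this model, a series that simply stops receiving samples keeps returning its last value for up to five minutes, whereas a series ending in a stale marker vanishes from results at once.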
  1. While relabel_configs is used to rewrite the target list before the scrape is performed, metric_relabel_configs is used to rewrite labels or drop samples after the scrape has occurred.
  2. As we're scraping through a Kubernetes service (which is similar in function to a load balancer), the scrapes will hit only a single instance of the Hey application at a time.
  3. Due to the ephemeral nature of Kubernetes...
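
The difference between relabel_configs and metric_relabel_configs mentioned in the answers above is mostly one of timing, which the following Python sketch tries to make concrete. It is a simplified model, not Prometheus' implementation: the keep and drop helpers are hypothetical, the target addresses and sample data are made up for illustration, and Python's re.fullmatch only approximates the fully anchored regular expressions Prometheus uses.

```python
import re


def keep(label_sets, source_label, regex):
    """Model a `keep` action: retain label sets whose source label fully matches regex."""
    pattern = re.compile(regex)
    return [ls for ls in label_sets if pattern.fullmatch(ls.get(source_label, ""))]


def drop(label_sets, source_label, regex):
    """Model a `drop` action: discard label sets whose source label fully matches regex."""
    pattern = re.compile(regex)
    return [ls for ls in label_sets if not pattern.fullmatch(ls.get(source_label, ""))]


# Phase 1 (relabel_configs): applied to discovered targets, before any scrape.
discovered_targets = [
    {"__address__": "app-1:8080", "env": "prod"},
    {"__address__": "app-2:8080", "env": "dev"},
]
targets_to_scrape = keep(discovered_targets, "env", "prod")

# Phase 2 (metric_relabel_configs): applied to scraped samples, before ingestion.
scraped_samples = [
    {"__name__": "http_requests_total", "code": "200"},
    {"__name__": "go_gc_duration_seconds", "quantile": "0.5"},
]
ingested_samples = drop(scraped_samples, "__name__", "go_.*")

print(targets_to_scrape)
print(ingested_samples)
```

Here the dev target is never scraped at all, while the go_gc_duration_seconds sample is still fetched from the target and only discarded afterwards. This is why dropping data with metric_relabel_configs does not reduce the scrape cost itself.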