Understanding metrics in Datadog
The health of a software system and the infrastructure it runs on is defined by a set of metrics and their threshold values. For example, on the infrastructure side, a machine whose CPU usage stays under 70% might be considered healthy for a specific use case. When every metric used to monitor an environment reports a value in its normal range, the entire environment can be considered healthy. When relevant thresholds are set for these metrics on monitors, values that cross a threshold are reported as alerts. Datadog provides features to define metrics-based monitors and alerts.
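The idea of health as "every metric within its threshold" can be sketched in a few lines of Python. This is only an illustration of the concept, not how Datadog evaluates monitors: the function names, the metric names, and the threshold values are all assumptions made for the example, and treating a missing reading as a breach is one possible design choice among several.

```python
from typing import Dict, List

# Hypothetical upper limits; a metric is healthy while its value stays below the limit.
# The 70% CPU limit mirrors the example in the text; the disk limit is made up.
THRESHOLDS: Dict[str, float] = {
    "system.cpu.user": 70.0,     # percent
    "system.disk.in_use": 0.9,   # fraction of disk space used
}


def breached_metrics(readings: Dict[str, float],
                     thresholds: Dict[str, float]) -> List[str]:
    """Return the names of metrics that are at or above their threshold.

    Assumption: a metric with no reading is also listed, because missing
    data means health cannot be confirmed.
    """
    breached = []
    for name, limit in thresholds.items():
        value = readings.get(name)
        if value is None or value >= limit:
            breached.append(name)
    return breached


def is_healthy(readings: Dict[str, float],
               thresholds: Dict[str, float]) -> bool:
    """The environment is healthy only when no metric breaches its threshold."""
    return not breached_metrics(readings, thresholds)


if __name__ == "__main__":
    sample = {"system.cpu.user": 85.0, "system.disk.in_use": 0.5}
    for metric in breached_metrics(sample, THRESHOLDS):
        print(f"ALERT: {metric} is outside its normal range")
```

In a real setup, the comparison and the alerting are handled by Datadog monitors; the sketch only shows why a single out-of-range metric is enough to mark the environment as unhealthy.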
We saw in Chapter 2, Deploying Datadog Agent, and Chapter 3, Datadog Dashboard, that published metrics can be viewed and filtered using tags in Metrics Explorer in the Datadog UI, as in the following example:
- Navigate to Metrics | Metrics Explorer to bring up the Metrics Explorer window.
- In the Graph field, enter the name...