Book Image

Datadog Cloud Monitoring Quick Start Guide

By : Thomas Kurian Theakanath
Book Image

Datadog Cloud Monitoring Quick Start Guide

By: Thomas Kurian Theakanath

Overview of this book

Datadog is an essential cloud monitoring and operational analytics tool which enables the monitoring of servers, virtual machines, containers, databases, third-party tools, and application services. IT and DevOps teams can easily leverage Datadog to monitor infrastructure and cloud services, and this book will show you how. The book starts by describing basic monitoring concepts and types of monitoring that are rolled out in a large-scale IT production engineering environment. Moving on, the book covers how standard monitoring features are implemented on the Datadog platform and how they can be rolled out in a real-world production environment. As you advance, you'll discover how Datadog is integrated with popular software components that are used to build cloud platforms. The book also provides details on how to use monitoring standards such as Java Management Extensions (JMX) and StatsD to extend the Datadog platform. Finally, you'll get to grips with monitoring fundamentals, learn how monitoring can be rolled out using Datadog proactively, and find out how to extend and customize the Datadog platform. By the end of this Datadog book, you will have gained the skills needed to monitor your cloud infrastructure and the software applications running on it using Datadog.
Table of Contents (19 chapters)
Section 1: Getting Started with Datadog
Section 2: Extending Datadog
Section 3: Advanced Monitoring

Proactive monitoring

Technically, monitoring is not part of a software system running in production. The applications in a software system can run without any monitoring tools rolled out. As a best practice, software applications must be decoupled from monitoring tools anyway.

This scenario sometimes results in taking software systems to production with minimal or no monitoring, which would eventually result in issues going unnoticed, or, in the worst-case scenario, users of those systems discovering those issues while using the software service. Such situations are not good for the business due to these reasons:

  • An issue in production impacts the business continuity of the customers and, usually, there would be a financial cost associated with it.
  • Unscheduled downtime of a software service would leave a negative impression on the users about the software service and its provider.
  • Unplanned downtime usually creates chaos at the business level and triaging and resolving such issues can be stressful for everyone involved and expensive to the businesses impacted by it.

One of the mitigating steps taken in response to a production issue is adding some monitoring so the same issue will be caught and reported to the operations team. Usually, such a reactive approach increases the coverage of monitoring organically, but not following a monitoring strategy. While such an approach will help to catch issues sometimes, there is no guarantee that an organically grown monitoring infrastructure would be capable of checking the health of the software system and warning about it, so remediation steps can be taken proactively to minimize outages in the future.

Proactive monitoring refers to rolling out monitoring solutions for a software system to report on issues with the components of the software system, and the infrastructure the system runs on. Such reporting can help with averting an impending issue by taking mitigating steps manually or automatically. The latter method is usually called self-healing, a highly desirable end state of monitoring, but hard to implement.

The key aspects of a proactive monitoring strategy are as follows.

Implementing a comprehensive monitoring solution

Traditionally, the focus of monitoring has been the infrastructure components – compute, storage, and network. As you will see later in this chapter, there are more aspects of monitoring that would make the list complete. All relevant types of monitoring have to be implemented for a software system so issues with any component, software, or infrastructure would be caught and reported.

Setting up alerts to warn of impending issues

The monitoring solution must be designed to warn of impending issues with the software system. This is easy with infrastructure components as it is easy to track metrics such as memory usage, CPU utilization, and disk space, and alert on any usage over the limits.

However, such a requirement would be tricky at the application level. Sometimes applications can fail on perfectly configured infrastructure. To mitigate that, software applications should provide insights into what is going under the hood. In monitoring jargon, it is called observability these days and we will see later in the book how that can be implemented in Datadog.

Having a feedback loop

A mature monitoring system warning of impending issues that would help to take mitigation steps is not good enough. Such warnings must also be used to resolve issues automatically (for example, spinning off a new virtual machine with enough disk space when an existing virtual host runs out of disk space), or be fed into the redesigning of the application or infrastructure to avoid the issue from happening in the future.