Before we start instrumenting our services, we should discuss the services we're deploying. They can be divided into three categories: online services, offline services, and batch processes. While those types overlap and it is often not easy to place a service into only one of them, such a division will give us a good understanding of the types of metrics we should implement.
We can define online services as those that accept requests from another service, a human, or a client (for example, a browser). Those who send requests to online services often expect an immediate response. Front-ends, APIs, and databases are only a few examples of such services. Given what we expect of them, the key metrics we are interested in are the number of requests they served, the number of errors...
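To make the first two of those metrics concrete, here is a minimal sketch of counting served requests and errors for an online service. It is an illustration only: the `OnlineServiceMetrics` class, its field names, and the rule that only 5xx status codes count as errors are assumptions made for this example, not part of any particular metrics library.

```python
from dataclasses import dataclass


@dataclass
class OnlineServiceMetrics:
    """Hypothetical in-memory counters; not tied to any metrics library."""

    requests_total: int = 0
    errors_total: int = 0

    def observe(self, status_code: int) -> None:
        # Every served request counts, whether or not it succeeded.
        self.requests_total += 1
        # Assumption: only 5xx responses are treated as service errors.
        if status_code >= 500:
            self.errors_total += 1

    def error_rate(self) -> float:
        # Share of served requests that ended in an error (0.0 when idle).
        if self.requests_total == 0:
            return 0.0
        return self.errors_total / self.requests_total


# Simulate five responses from a hypothetical API.
metrics = OnlineServiceMetrics()
for status in (200, 200, 404, 500, 503):
    metrics.observe(status)
```

Keeping the two values as ever-increasing counters, rather than storing a precomputed ratio, leaves it to whoever reads the metrics to derive rates over any time window they choose.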