Introduction to observability and monitoring
Programs can fail at any time. These failures may stem from the machines that make up the infrastructure, or from human error. Machines are reliable only to some extent, and hardware can fail while a program is running. Programmers, for their part, can introduce faulty or unresponsive code. In either case, the system should be able to recover from the failure and continue to operate. To achieve this, the system must be observable so that issues can be found. In this section, we will discuss the techniques that we can use to observe and monitor a distributed system.
Observability versus monitoring
Observability and monitoring are terms that you will often have heard in programming. These concepts apply not only to cloud native applications; they are an important part of developing any software system. A simple statement to understand...