Subsystems, and the systems within a distributed system, are highly independent of one another. Messages between them are sent asynchronously: a message is sent at one moment and received on the other end some time later, not at or near the moment it was sent. If the intended receiver was unable to receive a message, it could take an astoundingly long time (on the order of seconds) before the loss of that message is detected. That may not sound long, but at a throughput of even 500 transactions per second, a few seconds is a lot of data to wade through to figure out when the problem occurred and what data it involved. A three-second delay alone spans roughly 1,500 transactions.
Logging is vital for tracking down problems in a distributed system.
There are logging frameworks and tools that handle all sorts of different types of logging and abstract away the details of each type of log. I recommend using some sort of logging framework to separate where you log to from the fact that you are logging. For example, you may want to sometimes log...
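The following is a minimal sketch of that separation. It uses Python's standard `logging` module only as a stand-in for whatever framework you choose, since no particular framework is named here. The `handle_message` function, the `orders` logger name, and the message IDs are hypothetical. The point is that `handle_message` only records *that* something happened, while `configure_logging` alone decides *where* the record goes.

```python
import io
import logging

# Application code asks for a named logger; it never says where records end up.
logger = logging.getLogger("orders")


def handle_message(message_id: str, payload: dict) -> None:
    """Hypothetical message handler: records receipt, then would process it."""
    logger.info("received message %s with payload %r", message_id, payload)
    # ... real processing of the payload would happen here ...


def configure_logging(handler: logging.Handler) -> None:
    """Decide, in one place, where log records are written.

    Swapping the handler changes the destination without touching
    handle_message or any other code that logs.
    """
    # Timestamps matter when reconstructing when a problem occurred.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    root.addHandler(handler)


if __name__ == "__main__":
    # Demo: route records to an in-memory buffer. In production this could
    # instead be logging.FileHandler(...) or logging.handlers.SysLogHandler().
    buffer = io.StringIO()
    configure_logging(logging.StreamHandler(buffer))
    handle_message("m-42", {"sku": "A1", "qty": 3})
    print(buffer.getvalue(), end="")
```

Moving the logs from the in-memory buffer to a file or to syslog is a one-line change in `configure_logging`'s caller. The code that handles messages stays exactly as it is.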