Stability and safety
Applications and components in a Kubernetes cluster are occasionally prone to unexpected failures, such as network timeouts or code panics caused by unforeseen bugs. It is part of an Operator's job to monitor for these spontaneous failures and attempt to recover from them. Human error while adjusting the system is another source of failure, so any interaction with or modification of the core system components in Kubernetes carries inherent risk. That risk is compounded because a manual adjustment to one component can contain errors (even minor ones) that cause a domino effect, as the components that depend on it begin reacting to the original error.
Perhaps the prime objective of an Operator is to provide stability and safety in production environments. Here, stability refers to the ongoing performant operation of the Operand programs, and safety is the ability of an Operator to sanitize and validate any inputs or modifications...