Maintaining quorums
In our previous examples, we mostly worked with a single manager node, but if you want resilience, you must keep to a minimum the single points of failure that could take your whole infrastructure down. A single orchestration management node is not enough for production services, regardless of whether you use Swarm, Kubernetes, Marathon, or something else as your orchestration tooling. As a best practice, you should run at least three management nodes in your cluster, spread across three or more of your cloud's Availability Zones (AZs) or equivalent groupings, to ensure stability at scale. Data center outages do happen, and they have caused serious problems for companies that did not plan for them.
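The reason for "three or more" comes from quorum arithmetic. The following is a minimal sketch that assumes a majority-based consensus protocol (such as Raft, which Swarm managers and etcd use). Under that protocol, a cluster of N managers needs a strict majority to keep working, so it can lose at most (N - 1) / 2 of them (rounded down). The function names and the manager/AZ labels are hypothetical and were chosen only for illustration. The AZ check asks whether losing any one zone still leaves a majority of managers alive.

```python
from __future__ import annotations

from collections import Counter


def quorum_size(managers: int) -> int:
    """Smallest number of managers that forms a strict majority."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority is still reachable."""
    return (managers - 1) // 2


def survives_any_single_az_outage(placement: dict[str, str]) -> bool:
    """Return True if losing any one AZ still leaves a quorum of managers.

    `placement` maps a hypothetical manager name to the AZ it runs in.
    """
    total = len(placement)
    if total == 0:
        return False  # no managers at all means no quorum
    needed = quorum_size(total)
    managers_per_az = Counter(placement.values())
    # Every AZ must be "losable": the managers left elsewhere must still form a majority.
    return all(total - lost >= needed for lost in managers_per_az.values())


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} managers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")

    spread = {"manager-1": "az-a", "manager-2": "az-b", "manager-3": "az-c"}
    packed = {"manager-1": "az-a", "manager-2": "az-a", "manager-3": "az-b"}
    print(survives_any_single_az_outage(spread))  # True
    print(survives_any_single_az_outage(packed))  # False
```

The loop shows that a second manager, or a fourth, raises the quorum size without tolerating any additional failure. This is why odd manager counts are the usual recommendation. The placement check shows that three managers packed into two AZs can still lose quorum when the AZ holding two of them goes down.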
While in most orchestration platforms you can have any number of backing management nodes (or backing key-value stores in some cases), you will always have to balance resiliency vs...