Stopping cascading failures with Hystrix
Failures in a complex system can be hard to diagnose. The symptom often appears far from its cause. Users might start seeing higher-than-normal error rates during login because of a downstream service that manages profile pictures, or some other service only tangentially related to user profiles. An error in one service can needlessly propagate all the way to a user request, degrading the user experience and, with it, users' trust in your application. Additionally, a failing service can have cascading effects, turning a small system outage into a high-severity, customer-impacting incident. When designing microservices, it's important to consider failure isolation and decide how you want to handle different failure scenarios.
A number of patterns can be used to improve the resiliency of distributed systems. Circuit breakers are a common pattern used to back off from making requests to a temporarily overwhelmed service. Circuit breakers were first described...
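To make the back-off idea concrete, here is a minimal, single-process sketch of a circuit breaker. It is not Hystrix's API: the class `SimpleCircuitBreaker`, its `call` method, the consecutive-failure threshold, and the injectable clock are all hypothetical names chosen for illustration. The breaker stays closed while calls succeed, opens after a run of failures so that further requests go straight to a fallback instead of hitting the struggling service, and after a cool-down lets one trial request through to decide whether to close again.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical circuit breaker sketch (not the Hystrix API).
 * CLOSED:    calls pass through; consecutive failures are counted.
 * OPEN:      calls short-circuit to the fallback until the cool-down elapses.
 * HALF_OPEN: a single trial call decides whether to close or re-open.
 */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening
    private final long openTimeoutMillis; // how long to stay open before a trial call
    private final LongSupplier clock;     // injectable time source, in milliseconds

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openTimeoutMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    // Runs the action through the breaker; returns the fallback's value on failure
    // or while the circuit is open. The lock is held for the whole call, which
    // serializes requests: acceptable for a sketch, not for production traffic.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openTimeoutMillis) {
                return fallback.get(); // fail fast without touching the downstream service
            }
            state = State.HALF_OPEN; // cool-down elapsed: allow one trial request
        }
        try {
            T result = action.get();
            consecutiveFailures = 0; // success resets the failure count and closes the circuit
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            // A failed trial call, or too many failures in a row, (re)opens the circuit.
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }
}
```

With a threshold of 2 and a 1,000 ms cool-down, two failing calls open the circuit; a request made while open returns the fallback without invoking the action at all, and the first request after the cool-down is let through as a trial. Real libraries such as Hystrix are considerably more sophisticated, for example tripping on an error percentage over a rolling window rather than a simple run of consecutive failures.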