-
Book Overview & Buying
-
Table Of Contents
Software Architecture Patterns for Serverless Systems - Second Edition
By :
What is the most critical subsystem in your system? This is an interesting question. Certainly, they are all important, but some are more important. For many businesses, the customer-facing portion of the system is the most important. After all, we must be able to engage with the customers to provide a service for them. For example, in an e-commerce system, the customer must be able to access the catalog and place orders, whereas the authoring of new offers is less crucial. So, we want to protect the critical subsystems from the rest of the system.
Our aim is to fortify all the architectural boundaries in a system so that autonomous teams can forge ahead with experiments, confident in the knowledge that the blast radius will be contained when teams make mistakes. At the subsystem level, we are essentially enabling autonomous organizations to manage their autonomous subsystems independently. Let’s look at how we can fortify our autonomous subsystems with bulkheads.
Cloud accounts form natural bulkheads that we should leverage as much as possible to help protect us from ourselves. Far too often we overload our cloud accounts with too many unrelated workloads, which puts all the workloads at risk. At a bare minimum, development and production environments must be in separate accounts. But we can do better by having separate accounts, per subsystem, per environment. This will help control the blast radius when there is a failure in one account. Here are some of the natural benefits of using multiple accounts:
Having multiple accounts means that there are cross-cutting capabilities that we must duplicate across accounts, but we will address this shortly when we discuss automation in the Governing without impeding section.
We have already discussed the benefits of using events as contracts, the importance of backward compatibility, and how asynchronous communication via events creates a bulkhead along our architectural boundaries. Now we need to look at the distinction between internal and external domain events.
Within a subsystem, its services will communicate via internal domain events. The definitions of these events are relatively easy to change because the autonomous teams that own the services work together in the same autonomous organization. The event definitions will start out messy but will quickly evolve and stabilize as the subsystem matures. We will leverage the Robustness principle to facilitate this change. The events will also contain raw information that we want to retain for auditing purposes but that is of no importance outside of the subsystem. All of this is OK because it is all in the family, so to speak.
Conversely, across subsystem boundaries, we need more regulated team communication and coordination to facilitate changes to these contracts. As we have seen, this communication increases lead time, which is the opposite of what we want. We want to limit the impact that this has on internal lead time, so we are free to innovate within our autonomous subsystems. We essentially want to hide internal information and not air our dirty laundry in public.
Instead, we will perform all inter-subsystem communication via external domain events (sometimes called integration events). These external events will have much more stable contracts with stronger backward compatibility requirements. We will intend for these contracts to change slowly to help create a bulkhead between subsystems. Domain-Driven Design (DDD) refers to this technique as context mapping, such as when we use domain aggregates with the same terms in multiple bounded contexts, but with different meanings.
External events represent the subsystem’s ports in hexagonal terminology. In Chapter 7, Bridging Intersystem Gaps, we will cover the External Service Gateway (ESG) pattern. Each subsystem will treat related subsystems as external systems. We will bridge the internal event hubs of related subsystems to create the event-first topology depicted in Figure 2.3. Each subsystem will define egress gateways that define what events it is willing to share and hide everything else. Subsystems will define ingress gateways that act as an anti-corruption layer to consume upstream external domain events and transform (that is, adapt) them to its internal formats.
Now that we have an all-important subsystem architecture with proper bulkheads, let’s look at the architecture within a subsystem. Let’s see how we can decompose an autonomous subsystem into autonomous services.