

Thinking about events first

In the first chapter, we covered a brief history of software integration styles and the forces that impact lead times. We designed autonomous services to enable teams to maximize their pace of innovation because they give teams the confidence they need to minimize lead time and batch size. However, to deliver on this promise, we need to change the way we act, which means we need a different way of thinking.

We need to do the following:

  1. Start with event storming
  2. Focus on verbs instead of nouns
  3. Treat events as facts instead of ephemeral messages
  4. Turn APIs inside out by treating events as contracts
  5. Invert responsibility for invocation
  6. Connect services through an event hub

In other words, we need to think event-first. We can start to change our perspective by using a technique called event storming.

Start with event storming

Event storming is a workshop-oriented technique that helps teams discover the behavior of their business domain. It begins with brainstorming. The team starts by coalescing an initial set of domain events on a board using orange sticky notes. Next, we sequence the notes to depict the flow of events.

The following is a simplified example of the flow of events for a typical food delivery service. I will use and elaborate on this example throughout the book:

Figure 2.2: Event storming – the flow of events
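
To make the flow concrete in code, here is a minimal sketch of a comparable sequence of domain events. The specific names are illustrative placeholders, not the exact set from the figure:

// Illustrative only: a typical food delivery flow captured as an ordered
// sequence of domain events, roughly as they would appear on the board.
type DomainEventType =
  | 'MenuViewed'
  | 'OrderPlaced'
  | 'OrderAccepted'
  | 'DriverAssigned'
  | 'OrderPickedUp'
  | 'OrderDelivered';

const flowOfEvents: DomainEventType[] = [
  'MenuViewed',
  'OrderPlaced',
  'OrderAccepted',
  'DriverAssigned',
  'OrderPickedUp',
  'OrderDelivered',
];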

Along the way, the team will iteratively add more details using sticky notes of different colors, such as the following:

  • The command (blue) that performed the action that generated the event
  • The users (yellow) or external systems (pink) that invoked the command
  • The aggregate business domain (tan) whose state changed
  • The read-only data (green) that we need to support decision-making
  • Any policies (gray) that control behavior
  • The overall business process (purple) that is in play
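
Taken together, the notes around a single event map naturally onto a simple record. The following sketch is only an illustration of that mapping; the field and value names are my own, not prescribed by the technique:

// One event on the storming board, together with the colored notes that
// accumulate around it. All names here are hypothetical.
interface StormedEvent {
  event: string;        // orange: the domain event itself
  command: string;      // blue: the command that generated the event
  actor: string;        // yellow: the user (or pink external system) that invoked it
  aggregate: string;    // tan: the domain aggregate whose state changed
  readModels: string[]; // green: read-only data that supports the decision
  policies: string[];   // gray: policies that control the behavior
  process: string;      // purple: the overall business process in play
}

const orderPlaced: StormedEvent = {
  event: 'OrderPlaced',
  command: 'PlaceOrder',
  actor: 'Customer',
  aggregate: 'Order',
  readModels: ['Menu', 'DeliveryAddress'],
  policies: ['MinimumOrderAmount'],
  process: 'OrderFulfillment',
};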

Note that event storming is not a substitute for user stories and story mapping. User stories and story mapping are project management techniques for dividing work into manageable units and creating roadmaps. Event storming facilitates the discovery of user stories and the boundaries within our software architecture.

Focus on verbs instead of nouns

The flow of events discovered in the event-storming exercise clearly captures the behavior of the system. This event-first way of thinking is different because it zeroes in on the verbs instead of the nouns of the business domain. By contrast, more traditional approaches, such as object-oriented design, tend to focus on the nouns and create a class for each noun. However, when we focus on the nouns, we tend to create services that are resistant to change because they violate the SRP and the ISP.

As an example, it is not uncommon to find a service whose single responsibility is everything to do with a single domain aggregate. These services will end up containing all the commands that operate on the data of the domain. However, as we discussed in the SOLID principles section, the SRP is intended to focus on the actors of the system. Different actors initiate different commands, which means that these noun-focused services ultimately serve many masters with competing demands. This will impede our ability to change these services when necessary.

Instead, we need to segregate the various commands across the different actors. By focusing on the verbs of the domain model, we are naturally drawn to creating services for the different actors that perform the actions. This eliminates the competing demands that add unnecessary complexity to the code and avoids coupling an actor to unneeded commands.
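
To make the contrast concrete, here is a hypothetical sketch; the service and method names are placeholders. A noun-focused service accumulates every command on the aggregate, while verb-focused services align each command with the actor that performs it:

// Noun-focused: a single service owns every command on the Order aggregate,
// so it answers to the customer, the restaurant, and the driver all at once.
interface OrderService {
  placeOrder(orderId: string, items: string[]): void;    // customer
  acceptOrder(orderId: string): void;                     // restaurant
  assignDriver(orderId: string, driverId: string): void;  // dispatcher
  markDelivered(orderId: string): void;                   // driver
}

// Verb-focused: each actor gets its own narrowly scoped service, so a change
// requested by one actor never forces a change on the others.
interface CustomerOrderingService {
  placeOrder(orderId: string, items: string[]): void;
}
interface RestaurantFulfillmentService {
  acceptOrder(orderId: string): void;
}
interface DriverDeliveryService {
  markDelivered(orderId: string): void;
}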

Of course, now that the actors are the focal point of our services, we will need a way to share the nouns (that is, domain aggregates) between services without increasing coupling. We need a record of truth. To address this, we first need to start thinking of events as facts, instead of just ephemeral messages.

Treat events as facts instead of ephemeral messages

Let’s recognize that when we think about events, we are focusing on the outputs of the system instead of the inputs. We are thinking in the past tense and thus we are focusing on the facts the system will produce over time. This is powerful in multiple ways.

It turns out we are implicitly building business analytics and observability characteristics into the system. For example, we can count the MenuViewed events to track the popularity of the different restaurants, and we can monitor the rate of OrderPlaced events to verify the health of the system.
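
The following is a rough sketch of that idea; the fact shape and helper functions are assumptions for illustration, not the book's implementation:

// Illustrative: derive simple analytics and health signals from stored facts.
interface Fact {
  type: string;
  timestamp: number; // epoch milliseconds
  data: { restaurantId?: string };
}

// Popularity: count MenuViewed facts per restaurant.
const menuViewsByRestaurant = (facts: Fact[]): Map<string, number> => {
  const counts = new Map<string, number>();
  for (const fact of facts) {
    if (fact.type === 'MenuViewed' && fact.data.restaurantId) {
      counts.set(fact.data.restaurantId, (counts.get(fact.data.restaurantId) ?? 0) + 1);
    }
  }
  return counts;
};

// Health: the rate of OrderPlaced facts over a recent time window.
const orderPlacedPerMinute = (facts: Fact[], windowMs: number): number =>
  facts.filter((f) => f.type === 'OrderPlaced' && f.timestamp >= Date.now() - windowMs)
    .length / (windowMs / 60_000);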

We can also use this information to validate the hypothesis of each lean experiment we perform to help ensure we are building the right system and delivering on our business goals and objectives. In other words, event-first thinking facilitates observability mechanisms that help build team confidence and thus momentum.

However, to turn events into facts, we must treat them as first-class citizens instead of ephemeral messages. This is different from traditional messaging-based architectures, where we throw away the messages once we have processed them. We don’t want to treat events as ephemeral messages because we lose valuable information that we cannot easily recreate, if at all.

We will instead treat events as immutable facts and store them in an event lake in perpetuity. The event lake will act as the record of truth for the facts of the system. However, to make the record of truth complete we must think of events as contracts instead of mere notifications.
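
As a sketch of what storing facts in perpetuity might look like, the following assumes an AWS S3 bucket acting as the event lake and a hypothetical envelope shape; in practice a dedicated listener or delivery stream would typically do this work rather than inline application code:

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

// A minimal immutable fact envelope; the field names are illustrative.
interface Fact {
  id: string;                    // unique event identifier
  type: string;                  // e.g. 'OrderPlaced'
  timestamp: number;             // when the fact occurred
  data: Record<string, unknown>; // snapshot of the aggregate state
}

const s3 = new S3Client({});

// Append-only: facts are written once and never updated or deleted.
export const appendToEventLake = async (fact: Fact): Promise<void> => {
  await s3.send(
    new PutObjectCommand({
      Bucket: process.env.EVENT_LAKE_BUCKET, // assumed environment variable
      Key: `${fact.type}/${fact.timestamp}-${fact.id}.json`,
      Body: JSON.stringify(fact),
    }),
  );
};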

Turn APIs inside out by treating events as contracts

Many event-driven systems use events for notifications only. These anemic events only contain the identifier of the business domain entity that produced the event. Downstream services must retrieve the full data when they need it. This introduces coupling because it requires a synchronous call between the services. It may also create unwanted race conditions since the data can change before we retrieve it.

The usefulness of notification events as the record of truth is very limited because we will often refer to these facts far off in the future, well after the domain data has changed. To fully capture the facts, we need events to represent a snapshot in time of the state of the domain aggregate when the event occurred. This allows us to treat the facts as an audit log that is analogous to the transaction log of a database. This is a very powerful concept because a database uses the transaction log as the record of truth to manage the state and integrity of the database.
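
The difference is easiest to see side by side. The shapes below are hypothetical, but they capture the distinction between a bare notification and a fact that snapshots the aggregate:

// Anemic notification: the consumer must call back to fetch the data, which
// couples it to the producer and invites race conditions.
interface OrderPlacedNotification {
  type: 'OrderPlaced';
  orderId: string;
}

// Fact as a contract: the event carries a snapshot of the Order aggregate as
// it existed at the moment the event occurred.
interface OrderPlacedFact {
  id: string;
  type: 'OrderPlaced';
  timestamp: number;
  order: {
    orderId: string;
    customerId: string;
    restaurantId: string;
    items: { sku: string; quantity: number; price: number }[];
    total: number;
    status: 'PLACED';
  };
}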

We are essentially turning the database inside out and creating a systemwide record of truth that we can leverage to manage the state and integrity of the entire system. For example, we can leverage the facts to transfer (that is, replicate or rebuild) the state of domain aggregates (that is, nouns) between services. This eliminates the need for aligning services around domain aggregates and results in an immensely scalable and resilient system.
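
For example, a downstream service might maintain its own view of orders purely by consuming the facts. This is only a sketch; the fact shape shown is just the minimal slice this listener cares about:

// Minimal shape of the fact this listener consumes (illustrative).
interface OrderPlacedFact {
  type: 'OrderPlaced';
  order: { orderId: string; restaurantId: string; total: number };
}

// The downstream service rebuilds only the slice of Order state it needs,
// without ever calling back to the upstream service.
const ordersView = new Map<string, { restaurantId: string; total: number }>();

export const onOrderPlaced = (fact: OrderPlacedFact): void => {
  ordersView.set(fact.order.orderId, {
    restaurantId: fact.order.restaurantId,
    total: fact.order.total,
  });
};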

This means that we are turning our APIs inside out by using events as the contracts between services. This also implies a guarantee of backward compatibility, and we will therefore create strong contracts between services within a subsystem and even stronger contracts between subsystems. At first glance, it may appear that this way of thinking will make the system more rigid. In reality, we are making the system more flexible and evolutionary by inverting responsibility to downstream services so they can react to events as they see fit.

Invert responsibility for invocation

The DIP, as we covered earlier in the chapter, was a major advancement in software design, because it decoupled high-level policy decisions from low-level dependency decisions. This gave teams the flexibility to substitute different implementations of the low-level components without breaking the logic in the high-level components. In other words, the DIP facilitated the use of the LSP and the OCP to make systems much more stable and flexible.

We elevated the DIP to the architectural level by using events as the abstraction (that is, contract) between autonomous services. This promotes the stability of the system when we modify a service because we are holding the contracts constant to control the scope and impact of any given deployment. But we gain more than just stability; we also gain flexibility. The use of events for inter-service communication gives rise to an inversion of responsibility that makes systems reactive. The best way to understand this improvement is to compare the old imperative approach to the new reactive approach.

The traditional, imperative approach to implementing systems is command focused. One component determines when to invoke another. For example, in our food delivery system, we would traditionally have the checkout functionality make a synchronous call to an order management service to invoke a command that submits the customer’s order. This means that we are coupling the checkout component to the presence of an order management service because it is responsible for the decision to invoke the command. This may not seem like a problem until we apply the same approach to retrieving driver status. Any number of components will need to invoke a service to retrieve the driver status over and over again, and that service will likely become a bottleneck.

Alternatively, we end up with a much more resilient and flexible system when we employ the reactive approach. The checkout component simply produces an OrderPlaced event. The order management service is now responsible for the decision to consume this event and react as it sees fit. The driver service simply produces DriverStatusChanged events when there is something useful to report. Any other service can take responsibility for reacting to driver events without impacting the driver service.
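
A rough sketch of the two styles follows. None of these names come from the book; a tiny in-memory publish/subscribe stands in for the real event hub described later in this chapter:

// All names in this sketch are hypothetical placeholders.
interface Order { orderId: string; items: string[]; total: number; }
interface DomainEvent { type: string; timestamp: number; order: Order; }

type Handler = (event: DomainEvent) => Promise<void>;
const handlers = new Map<string, Handler[]>();

const subscribe = (type: string, handler: Handler): void => {
  handlers.set(type, [...(handlers.get(type) ?? []), handler]);
};

const publish = async (event: DomainEvent): Promise<void> => {
  await Promise.all((handlers.get(event.type) ?? []).map((h) => h(event)));
};

// Imperative style: checkout decides when order management runs and must know
// where it lives; if order management is unavailable, checkout fails with it.
const checkoutImperative = async (order: Order): Promise<void> => {
  await fetch('https://order-management.example/orders', { // hypothetical endpoint
    method: 'POST',
    body: JSON.stringify(order),
  });
};

// Reactive style: checkout simply states the fact; it neither knows nor cares
// who reacts, or whether anyone reacts at all.
const checkoutReactive = async (order: Order): Promise<void> => {
  await publish({ type: 'OrderPlaced', timestamp: Date.now(), order });
};

// Order management (or any other service) independently chooses to consume it.
subscribe('OrderPlaced', async (event) => {
  console.log('submitting order', event.order.orderId);
});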

This inversion of responsibility is a key characteristic of autonomous services. It greatly reduces the complexity of the individual services because it reduces their responsibilities. A service is already aware of its own state, and it can simply produce events to reflect the changes without taking responsibility for what happens next. Downstream services take responsibility for how they react to upstream events. This completely decouples services from one another. They are all autonomous. This simplicity makes it much easier for teams to gauge the correctness and impact of any given change. Teams can be confident that the system will remain stable if they uphold the contracts.

The reactive nature of event-first thinking is a paradigm shift, but the benefits are well worth the effort. A system becomes free to evolve in unforeseen ways by simply adding consumers. We can implement services in virtually any order because we can simulate upstream events and there is no coupling to downstream consumers. We gain the ability to create end-to-end test suites that don’t require other services to be running at the same time. The bottom line is that the reactive nature of autonomous services enables autonomous teams to react much more quickly to feedback as they learn from their experiments.
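
For instance, a test can feed a service a simulated upstream fact and assert on what it produces, without any other service running. The handler and event names in this sketch are hypothetical:

import { strictEqual } from 'node:assert';
import { test } from 'node:test';

// Hypothetical handler under test: it reacts to an OrderPlaced fact and
// produces an OrderReceived fact, with no dependency on the checkout service.
interface Fact {
  type: string;
  order: { orderId: string };
}

const handleOrderPlaced = (event: Fact): Fact => ({
  type: 'OrderReceived',
  order: { orderId: event.order.orderId },
});

test('reacts to a simulated upstream fact', () => {
  // Simulate the upstream event instead of deploying the checkout service.
  const simulated: Fact = { type: 'OrderPlaced', order: { orderId: 'order-1' } };

  const produced = handleOrderPlaced(simulated);

  strictEqual(produced.type, 'OrderReceived');
  strictEqual(produced.order.orderId, 'order-1');
});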

Connect services through an event hub

There is a myth that event-driven systems are much more complex than their request-driven counterparts, but this couldn’t be further from the truth. Event-first thinking allows us to create arbitrarily complex systems by connecting subsystems in a simple fractal topology, as Figure 2.3 depicts:

Figure 2.3: Event-first topology

At the heart of our event-first architecture is the event hub. It connects everything together and pumps events through the system. Each subsystem has an event hub at the center, and each autonomous service connects to the event hub through well-defined ports (that is, events).
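
A minimal sketch of these ports follows; the hub implementation itself is covered in Chapter 4, and the types here are placeholders:

// The port is the event contract itself: a service touches the hub only by
// publishing and consuming well-defined event types.
interface EventEnvelope<T = unknown> {
  id: string;
  type: string;
  timestamp: number;
  data: T;
}

interface EventHub {
  publish(event: EventEnvelope): Promise<void>;
  subscribe(type: string, handler: (event: EventEnvelope) => Promise<void>): void;
}

// The pattern is fractal: a subsystem's hub can itself connect to the hub of
// the wider system by forwarding selected event types across the boundary.
const bridge = (from: EventHub, to: EventHub, types: string[]): void => {
  types.forEach((type) => from.subscribe(type, (event) => to.publish(event)));
};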

From a single service to many services in a subsystem, and from a single subsystem to many subsystems in a system, this simple pattern of connecting autonomous services and subsystems by producing and consuming events repeats ad infinitum. This flexibility frees us to build ever-evolving systems. We will dig into the details of the event hub in Chapter 4, Trusting Facts and Eventual Consistency, and we will see how to connect subsystems in Chapter 7, Bridging Intersystem Gaps.

Event-first is a very powerful approach, but adopting this way of thinking can be a journey. Let’s continue that journey by learning how to divide a system into autonomous subsystems, then move on to the autonomous service patterns within each subsystem, and finally dig into the anatomy of these services. Then we will be ready to bring all the details together throughout the remaining chapters.