Software Architecture Patterns for Serverless Systems - Second Edition

By: John Gilbert

Overview of this book

Organizations undergoing digital transformation rely on IT professionals to design systems that keep up with the rate of change while maintaining stability. This edition, enriched with more real-world examples, guides you through the architectural patterns that power enterprise-grade software systems. Along the way, you'll explore key architectural elements (such as event-driven microservices and micro frontends) and learn how to implement anti-fragile systems. First, you'll divide up a system and define boundaries so that your teams can work autonomously and accelerate innovation. You'll cover the low-level event and data patterns that support the entire architecture while getting up and running with the different autonomous service design patterns. This edition adds several new topics on security, observability, and multi-regional deployment. It focuses on best practices for security, reliability, testability, observability, and performance. You'll explore the methodologies of continuous experimentation, deployment, and delivery before delving into some final thoughts on how to start making progress. By the end of this book, you'll be able to architect your own event-driven, serverless systems that are ready to adapt and change.

Creating subsystem bulkheads

What is the most critical subsystem in your system? This is an interesting question. Certainly, they are all important, but some are more important. For many businesses, the customer-facing portion of the system is the most important. After all, we must be able to engage with the customers to provide a service for them. For example, in an e-commerce system, the customer must be able to access the catalog and place orders, whereas the authoring of new offers is less crucial. So, we want to protect the critical subsystems from the rest of the system.

Our aim is to fortify all the architectural boundaries in a system so that autonomous teams can forge ahead with experiments, confident in the knowledge that the blast radius will be contained when teams make mistakes. At the subsystem level, we are essentially enabling autonomous organizations to manage their autonomous subsystems independently. Let’s look at how we can fortify our autonomous subsystems with bulkheads.

Separate cloud accounts

Cloud accounts form natural bulkheads that we should leverage as much as possible to help protect us from ourselves. Far too often we overload our cloud accounts with too many unrelated workloads, which puts all the workloads at risk. At a bare minimum, development and production environments must be in separate accounts. But we can do better by having separate accounts, per subsystem, per environment. This will help control the blast radius when there is a failure in one account. Here are some of the natural benefits of using multiple accounts:

We control the technical debt that naturally accumulates as the number of resources within an account grows. It becomes difficult, if not impossible, to see the forest for the trees, so to speak, when we put too many workloads in one account. The learning curve increases because the account is not clean. Tagging resources helps, but tags are prone to omission. Engineers eventually resist making any changes because the risk of making a mistake is too high. The likelihood of a catastrophic system failure also increases.
We improve our security posture by limiting the attack surface of each account. Restricting access is as simple as assigning team members to the right role in the right account. If a breach does occur, then access is limited to the resources in that one account. In the case of legacy systems, we can minimize the number of accounts that have access to the corporate network, preferably to just one per environment.
We have less competition for limited resources. Many cloud resources have soft limits at the account level that throttle access when transaction volumes exceed a threshold. The likelihood of hitting these limits increases as the number of workloads increases. A denial-of-service attack or a runaway mistake on one workload could starve all other workloads. We can request increases to these limits, but this takes time. Instead, the default limits may provide plenty of headroom once we allocate accounts for individual subsystems.
Cost allocation is simple and error resistant because we allocate everything in an account to a single cost bucket without the need for tagging. This means that no unallocated costs occur when tagging is incomplete. We also minimize the hidden costs of orphaned resources because they are easier to identify.
Observability and governance are more accurate and informative because monitoring tools tag all metrics by account. This allows filtering and alerting by subsystem. When failures do occur, the limited blast radius also facilitates root cause analysis and a shorter mean time to recovery.

Having multiple accounts means that there are cross-cutting capabilities that we must duplicate across accounts, but we will address this shortly when we discuss automation in the Governing without impeding section.
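To make the account-per-subsystem, per-environment layout more concrete, here is a minimal sketch in Python. The subsystem names, environment names, account naming scheme, and billing line items are all hypothetical and exist only for illustration; a real setup would create these accounts through the cloud provider's own tooling. The sketch shows how cost allocation can rely on the account alone, with no tags involved.

```python
from collections import defaultdict
from itertools import product

# Hypothetical subsystem and environment names, for illustration only.
SUBSYSTEMS = ["catalog", "ordering", "authoring"]
ENVIRONMENTS = ["dev", "prod"]


def plan_accounts(subsystems, environments):
    """Return one account name per (subsystem, environment) pair."""
    return {
        (subsystem, env): f"{subsystem}-{env}"
        for subsystem, env in product(subsystems, environments)
    }


def allocate_costs(line_items, accounts):
    """Sum billing line items per subsystem using only the account name.

    No resource tags are consulted: every resource in an account
    belongs to that account's subsystem.
    """
    owner = {name: subsystem for (subsystem, _env), name in accounts.items()}
    totals = defaultdict(float)
    for item in line_items:
        totals[owner[item["account"]]] += item["cost"]
    return dict(totals)


accounts = plan_accounts(SUBSYSTEMS, ENVIRONMENTS)

# Made-up billing data; note that no line item carries a tag.
line_items = [
    {"account": "catalog-prod", "cost": 120.0},
    {"account": "catalog-dev", "cost": 15.5},
    {"account": "ordering-prod", "cost": 80.0},
]
costs = allocate_costs(line_items, accounts)
```

Because the account is the unit of ownership, a resource that nobody remembered to tag still lands in the right cost bucket.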

External domain events

We have already discussed the benefits of using events as contracts, the importance of backward compatibility, and how asynchronous communication via events creates a bulkhead along our architectural boundaries. Now we need to look at the distinction between internal and external domain events.

Within a subsystem, its services will communicate via internal domain events. The definitions of these events are relatively easy to change because the autonomous teams that own the services work together in the same autonomous organization. The event definitions will start out messy but will quickly evolve and stabilize as the subsystem matures. We will leverage the Robustness principle to facilitate this change. The events will also contain raw information that we want to retain for auditing purposes but that is of no importance outside of the subsystem. All of this is OK because it is all in the family, so to speak.
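The Robustness principle (be conservative in what you send and liberal in what you accept) can be sketched as a tolerant reader on the consuming side. The event type, field names, and default values below are assumptions made for this example, not the book's actual event format.

```python
def read_order_submitted(event):
    """Tolerant reader for a hypothetical internal 'order-submitted' event.

    Only 'id' and 'type' are mandatory. Newer optional fields get
    defaults, and unknown fields are ignored rather than rejected.
    """
    for required in ("id", "type"):
        if required not in event:
            raise ValueError(f"missing required field: {required}")

    order = event.get("order", {})
    return {
        "event_id": event["id"],
        "order_id": order.get("id"),
        # 'currency' is an assumed later addition; older events lack it.
        "currency": order.get("currency", "USD"),
        "total": order.get("total", 0),
    }


# An older producer that does not know about 'currency'.
old_event = {"id": "e1", "type": "order-submitted",
             "order": {"id": "o1", "total": 25}}

# A newer producer that adds 'currency' and an extra 'raw' field.
new_event = {"id": "e2", "type": "order-submitted",
             "order": {"id": "o2", "total": 40, "currency": "EUR"},
             "raw": {"debug": True}}

old_view = read_order_submitted(old_event)
new_view = read_order_submitted(new_event)
```

With a reader like this, producers inside the subsystem can add fields without breaking existing consumers, which is what lets the internal event definitions evolve quickly.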

Conversely, across subsystem boundaries, we need more regulated team communication and coordination to facilitate changes to these contracts. As we have seen, this communication increases lead time, which is the opposite of what we want. We want to limit the impact that this has on internal lead time, so we are free to innovate within our autonomous subsystems. We essentially want to hide internal information and not air our dirty laundry in public.

Instead, we will perform all inter-subsystem communication via external domain events (sometimes called integration events). These external events will have much more stable contracts with stronger backward compatibility requirements. We intend for these contracts to change slowly to help create a bulkhead between subsystems. Domain-Driven Design (DDD) refers to this technique as context mapping, which applies, for example, when domain aggregates share the same terms across multiple bounded contexts but carry different meanings in each.

External events represent the subsystem’s ports in hexagonal terminology. In Chapter 7, Bridging Intersystem Gaps, we will cover the External Service Gateway (ESG) pattern. Each subsystem will treat related subsystems as external systems. We will bridge the internal event hubs of related subsystems to create the event-first topology depicted in Figure 2.3. Each subsystem will define egress gateways that determine which events it is willing to share and hide everything else. Each subsystem will also define ingress gateways that act as an anti-corruption layer: they consume upstream external domain events and transform (that is, adapt) them to the subsystem’s internal formats.
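The egress and ingress gateways can be pictured as a pair of pure mapping functions. The sketch below is only an illustration under stated assumptions: the event types, the field names, and the downstream subsystem's "purchase" vocabulary are hypothetical, and the event hubs that would carry these events between subsystems are left out entirely.

```python
# Hypothetical mapping from internal event types to the external
# types that the upstream subsystem is willing to share.
EXTERNAL_TYPES = {"order-submitted": "order-placed"}


def egress(internal_event):
    """Egress gateway: turn an internal event into an external one.

    Returns None for internal-only event types, so they never
    leave the subsystem.
    """
    external_type = EXTERNAL_TYPES.get(internal_event["type"])
    if external_type is None:
        return None
    order = internal_event["order"]
    return {
        "id": internal_event["id"],
        "type": external_type,
        "timestamp": internal_event["timestamp"],
        # Only stable, documented fields cross the boundary; the
        # 'raw' audit payload stays inside the subsystem.
        "order": {"id": order["id"], "total": order["total"]},
    }


def ingress(external_event):
    """Ingress gateway of a downstream subsystem (anti-corruption layer).

    Adapts the upstream external event to the downstream
    subsystem's own internal format and terminology.
    """
    if external_event["type"] != "order-placed":
        return None
    return {
        "id": external_event["id"],
        "type": "purchase-received",
        "timestamp": external_event["timestamp"],
        "purchase": {
            "ref": external_event["order"]["id"],
            "amount": external_event["order"]["total"],
        },
    }


internal = {
    "id": "e1",
    "type": "order-submitted",
    "timestamp": 1700000000,
    "order": {"id": "o1", "total": 25},
    "raw": {"session": "abc123"},  # internal detail, must not leak
}
external = egress(internal)
adapted = ingress(external)
hidden = egress({"id": "e2", "type": "order-draft-saved",
                 "timestamp": 1700000001, "order": {}})
```

Keeping both directions as explicit mappings means an internal format can change freely as long as the egress function keeps producing the same external contract.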

Now that we have an all-important subsystem architecture with proper bulkheads, let’s look at the architecture within a subsystem. Let’s see how we can decompose an autonomous subsystem into autonomous services.
