Book Image

Practical Site Reliability Engineering

By : Pethuru Raj Chelliah, Shreyash Naithani, Shailender Singh
Book Image

Practical Site Reliability Engineering

By: Pethuru Raj Chelliah, Shreyash Naithani, Shailender Singh

Overview of this book

Site reliability engineering (SRE) is being touted as the most competent paradigm in establishing and ensuring next-generation high-quality software solutions. This book starts by introducing you to the SRE paradigm and covers the need for highly reliable IT platforms and infrastructures. As you make your way through the next set of chapters, you will learn to develop microservices using Spring Boot and make use of RESTful frameworks. You will also learn about GitHub for deployment, containerization, and Docker containers. Practical Site Reliability Engineering teaches you to set up and sustain containerized cloud environments, and also covers architectural and design patterns and reliability implementation techniques such as reactive programming, and languages such as Ballerina and Rust. In the concluding chapters, you will get well-versed with service mesh solutions such as Istio and Linkerd, and understand service resilience test practices, API gateways, and edge/fog computing. By the end of this book, you will have gained experience on working with SRE concepts and be able to deliver highly reliable apps and services.
Table of Contents (19 chapters)
Title Page
Dedication
About Packt
Contributors
Preface
10
Containers, Kubernetes, and Istio Monitoring
Index

The need for highly reliable platforms and infrastructures 


We discussed about cloud-enabled and native applications and how they are hosted on underlying cloud infrastructures to accomplish service delivery. Applications are significantly functional. However, the non-functional requirements, such as application scalability, availability, security, reliability, performance/throughput, modifiability, and so on, are being used widely. That is, producing high-quality applications is a real challenge for IT professionals. There are design, development, testing, and deployment techniques, tips, and patterns to incorporate the various NFRs into cloud applications. There are best practices and key guidelines to come out with highly scalable, available, and reliable applications.

The second challenge is to setup and sustain highly competent and cognitive cloud infrastructures to exhibit reliable behavior. The combination of highly resilient, robust, and versatile applications and infrastructures leads to the implementation of highly dependable IT that meets the business productivity, affordability, and adaptivity.

Having understood the tactical and strategic significance and value, businesses are consciously embracing the pioneering cloud paradigm. That is, all kinds of traditional IT environments are becoming cloud-enabled to reap the originally expressed business, technical, and use benefits. However, the cloud formation alone is not going to solve every business and IT problem. Besides establishing purpose-specific and agnostic cloud centers, there are a lot more things to be done to attain the business agility and reliability. The cloud center operation processes need to be refined, integrated, and orchestrated to arrive at optimized and organized processes. Each of the cloud center operations needs to be precisely defined and automated in to fulfil the true meaning of IT agility. With agile and reliable cloud applications and environments, the business competency and value are bound to go up remarkably.

The need for reliable software

We know that the subject of software reliability is a crucial one for the continued success of software engineering in the ensuing digital era. However, it is not easy thing to do. Because of the rising complexity of software suites, ensuring high reliability turns out to be a tough and time-consuming affair. Experts, evangelists, and exponents have come out with a few interesting and inspiring ideas for accomplishing reliable software systems. Primarily, there are two principal approaches; these are as follows:

  • Resilient microservices can lead to the realization of reliable software applications. Popular technologies include microservices, containers, Kubernetes, Terraform, API Gateway and Management Suite, Istio, and Spinnaker.
  • Reactive systems (resilient, responsive, message-driven, and elastic)—this is based on the famous Reactive Manifesto. There are a few specific languages and platforms (http://vertx.io/, http://reactivex.io/, https://www.lightbend.com/products/reactive-platform, RxJava, play framework, and so on) for producing reactive systems. vAkka is a toolkit for building highly concurrent, distributed, and resilient message-driven applications for Java and Scala.

Here are the other aspects being considered for producing reliable software packages:

  • Verification and validation of software reliability through various testing methods
  • Software reliability prediction algorithms and approaches
  • Static and dynamic code analysis methods
  • Patterns, processes, platforms, and practices for building reliable software packages

Let's discuss these in detail.

The emergence of microservices architecture 

Mission critical and versatile applications are to be built using the highly popular MSA pattern. Monolithic applications are being consciously dismantled using the MSA paradigm to be immensely right and relevant for their users and owners. Microservices are the new building block for constructing next-generation applications. Microservices are easily manageable, independently deployable, horizontally scalable, relatively simple services. Microservices are publicly discoverable, network accessible, interoperable, API-driven, composed, replaceable, and highly isolated.

The future software development is primarily finding appropriate microservices. Here are few advantages of the MSA style:

  • Scalability: Any production-grade application typically can use three types of scaling. The x-axis scaling is for horizontally scalability. That is, the application has to be cloned to guarantee high availability. The second type of scale is y-axis scaling. This is for splitting the application into various application functionalities. With microservices architecture, applications (legacy, monolithic, and massive) are partitioned into a collection of easily manageable microservices. Each unit fulfils one responsibility. The third is the z-axis scaling, which is for partitioning or sharding the data. The database plays a vital role in shaping up dynamic applications. With NoSQL databases, the concept of sharing came into prominence. 
  • Availability: Multiple instances of microservices are deployed in different containers (Docker) to guarantee high availability. Through this redundancy, the service and application availability is ensured. With multiple instances of services are being hosted and run through Docker containers, the load-balancing of service instances is utilized to ensure the high-availability of services. The widely used circuit breaker pattern is used to accomplish the much-needed fault-tolerance. That is, the redundancy of services through instances ensures high availability, whereas the circuit-breaker pattern guarantees the resiliency of services. Service registry, discovery, and configuration capabilities are to lead the development and discovery of newer services to bring forth additional business (vertical) and IT (horizontal) services. With services forming a dynamic and ad hoc service meshes, the days of service communication, collaboration, corroborations, and correlations are not too far away.  
  • Continuous deployment: Microservices are independently deployable, horizontally scalable, and self-defined. Microservices are decoupled/lightly coupled and cohesive fulfilling the elusive mandate of modularity. The dependency imposed issues get nullified by embracing this architectural style. This leads to the deployment of any service independent of one another for faster and more continuous deployment.
  • Loose coupling: As indicated previously, microservices are autonomous and independent by innately providing the much-needed loose coupling. Every microservice has its own layered- architecture at the service level and its own database at the backend.
  • Polyglot microservices: Microservices can be implemented through a variety of programming languages. As such, there is no technology lock-in. Any technology can be used to realize microservices. Similarly, there is no compulsion for using certain databases. Microservices work with any file system SQL databases, NoSQL and NewSQL databases, search engines, and so on. 
  • Performance: There are performance engineering and enhancement techniques and tips in the microservices arena. For example, high-blocking calls services are implemented in the single threaded technology stack, whereas high CPU usage services are implemented using multiple threads.

There are other benefits for business and IT teams by employing the fast-maturing and stabilizing microservices architecture. The tool ecosystem is on the climb, and hence implementing and involving microservices gets simplified and streamlined. Automated tools ease and speed up building and operationalizing microservices.  

Docker enabled containerization

The Docker idea has shaken the software world. A bevy of hitherto-unknown advancements are being realized through containerization. The software portability requirement, which has been lingering for a long time, gets solved through the open source Docker platform. The real-time elasticity of Docker containers hosting a variety of microservices enabling the real-time scalability of business-critical software applications is being touted as the key factor and facet for the surging popularity of containerization. The intersection of microservices and Docker containers domains has brought in paradigm shifts for software developers, as well as for system administrators. The lightweight nature of Docker containers along with the standardized packaging format in association with the Docker platform goes a long way in stabilizing and speeding up software deployment.

The container is a way to package software along with configuration files, dependencies, and binaries required to enable the software on any operating environment. There are a number of crucial advantages; they are as follows:

  • Environment consistency: Applications/processes/microservices running on containers behave consistently in different environments (development, testing, staging, replica, and production). This eliminates any kind of environmental inconsistencies and makes testing and debugging less cumbersome and less time-consuming.
  • Faster deployment: A container is lightweight and starts and stops in a few seconds, as it is not required to boot any OS image. This eventually helps to achieve faster creation, deployment, and high availability.
  • Isolation: Containers running on the same machine using the same resources are isolated from one another. When we start a container with the docker run command, the Docker platform does a few interesting things behind the scenes. That is, Docker creates a set of namespaces and control groups for the container. The namespaces and control groups (cgroups) are the kernel-level capabilities. The role of the namespaces feature is to provide the required isolation for the recently created container from other containers running in the host. Also, containers are clearly segregated from the Docker host. This separation does a lot of good for containers in the form of safety and security. Also, this unique separation ensures that any malware, virus, or any phishing attack on one container does not propagate to other running containers. In short, processes running within a container cannot see and affect processes running in another container or in the host system. Also, as we are moving toward a multi-container applications era, each container has to have its own network stack for container networking and communication. With this network separation, containers don't get any sort of privileged access to the sockets or interfaces of other containers in the same Docker host or across it. The network interface is the only way for containers to interact with one another as well as with the host. Furthermore, when we specify public ports for containers, the IP traffic is allowed between containers. They can ping one another, send and receive UDP packets, and establish TCP connections.
  • Portability: Containers can run everywhere. They can run in our laptop, enterprise servers, and cloud servers. That is, the long-standing goal of write once and run everywhere is getting fulfilled through the containerization movement. 

There are other important advantages of containerization. There are products and platforms that facilitate the cool convergence of containerization and virtualization to cater for emerging IT needs.

Containerized microservices

One paradigm shift in the IT space in the recent past is the emergence of containers for deftly hosting and running microservices. Because of the lightweight nature of containers, provisioning containers is done at lightning speed. Also, the horizontal scalability of microservices gets performed easily by their hosting environments (containers). Thus, this combination of microservices and containers brings a number of benefits for software development and IT operations. There can be hundreds of containers in a single physical machine.

The celebrated linkage helps to have multiple instances of microservices in a machine. With containers talking to one another across Docker hosts, multiple microservice instances can find one another to compose bigger and better composite services that are business and process-aware. Thus, all the advancements in the containerization space have a direct and indirect impacts on microservices engineering, management, governance, security, orchestration, and science.

The key technology drivers of containerized cloud environments are as follows:

  • The faster maturity and stability of containers (application and data).
  • New types of containers such as Kata Containers and HyperContainers.
  • MSA emerging as the most optimized architectural style for enterprise-scale applications.
  • There is a cool convergence between containers and microservices. Containers are the most optimized hosting and execution of runtime for microservices.
  • Web/cloud, mobile, wearable and IoT applications, platforms, middleware, UI, operational, analytical, and transactional applications are modernized as cloud-enabled applications, and the greenfield applications are built as cloud-native applications.
  • The surging popularity of Kubernetes as the container clustering, orchestration, and management platform solution leads to the realization of containerized clouds.
  • The emergence of API gateways simplifies and streamlines the access and usage of microservices collectively.
  • The faster maturity and stability of service mesh solutions ensures the resiliency of microservices and the reliability of cloud-hosted applications.

The challenges of containerized cloud environments are as follows:

  • Moving from monoliths to microservices is not an easy transition.
  • There may be thousands of microservices and their instances (redundancy) in a cloud environment.
  • For crafting an application, the data and control flows ought to pass through different and distributed microservices spread across multiple cloud centers.
  • The best practice says that there is a one to one mapping between microservice instances and containers. That is, separate containers are being allocated for separate microservice instances.
  • Due to the resulting dense environments, the operational and management complexities of containerized clouds are bound to escalate.
  • Tracking and tracing service request messages and events among microservices turn out to be a complex affair.
  • Troubleshooting and doing root cause analyses in microservices environments become a tough assignment.
  • Container life cycle management functionalities have to be automated.
  • Client-to-microservice (north-to-south traffic) communication remains a challenge.
  • Service-to-service (east-to-west traffic) communication has to be made resilient and robust.

Kubernetes for container orchestration

A MSA requires the creating and clubbing together of several fine-grained and easily manageable services that are lightweight, independently deployable, horizontally scalable, extremely portable, and so on. Containers provides an ideal hosting and run time environment for the accelerated building, packaging, shipping, deployment, and delivery of microservices. Other benefits include workload isolation and automated life-cycle management. With a greater number of containers (microservices and their instances) being stuffed into every physical machine, the operational and management complexities of containerized cloud environments are on the higher side. Also, the number of multi-container applications is increasing quickly. Thus, we need a standardized orchestration platform along with container cluster management capability. Kubernetes is the popular container cluster manager, and it consists of several architectural components, including pods, labels, replication controllers, and services. Let's take a look at them:

  • As mentioned elsewhere, there are several important ingredients in the Kubernetes architecture. Pods are the most visible, viable, and ephemeral units that comprise one or more tightly coupled containers. That means containers within a pod sail and sink together. There is no possibility of monitoring, measuring, and managing individual containers within a pod. In other words, pods are the base unit of operation for Kubernetes. Kubernetes does not operate at the level of containers. There can be multiple pods in a single server node and data sharing easily happens in between pods. Kubernetes automatically provision and allocate pods for various services. Each pod has its own IP address and shares the localhost and volumes. Based on the faults and failures, additional pods can be quickly provisioned and scheduled to ensure the continuity of services. Similarly, under heightened loads, Kubernetes adds additional resources in the form of pods to ensure system and service performance.  Depending on the traffic, resources can be added and removed to fulfil the goal of elasticity.
  • Labels are typically the metadata that is attached to objects, including pods.
  • Replication controllers, as articulated previously, have the capability to create new pods leveraging a pod template. That is, as per the configuration, Kubernetes is able to run the sufficient number of pods at any point in time. Replication controllers accomplish this unique demand by continuously polling the container cluster. If there is any pod going down, this controller software immediately jumps into action to incorporate an additional pod to ensure that the specified number of pods with a given set of labels are running within the container cluster.
  • Services is another capability that embedded into Kubernetes architecture. This functionality and facility offers a low-overhead way to route all kinds of service requests to a set of pods to accomplish the requests. Labels is the way forward for selecting the most appropriate pods. Services provide methods to externalize legacy components, such as databases, with a cluster. They also provide stable endpoints as clusters shrink and grow and become configured and reconfigured across new nodes within the cluster manager. Their job is to remove the pain of keeping track of application components that exist within a cluster instance.

The fast proliferation of application and data containers in producing composite services is facilitated through the leveraging of Kubernetes, and it fastening the era of containerization. Both traditional and modern IT environments are embracing this compartmentalization technology to surmount some of the crucial challenges and concerns of the virtualization technology.

API Gateways and management suite: This is another platform for bringing in reliable client and service interactions. The various features and functionalities of API tools include the following:

  • It acts as a router. It is the only entry point to our collection of microservices. This way, microservices are not needed to be public anymore but are behind an internal network. An API Gateway is responsible for making requests against a service or another one (service discovery).
  • It acts as a data aggregator. API Gateway fetches data from several services and aggregates it to return a single rich response. Depending on the API consumer, data representation may change according to the needs, and here is where backend for frontend (BFF) comes into play.
  • It is a protocol abstraction layer. The API Gateway can be exposed as a REST API or a GraphQL or whatever, no matter what protocol or technology is being used internally to communicate with the microservices.
  • Error management is centralized. When a service is not available, is getting too slow, and so on, the API Gateway can provide data from the cache, default responses or make smart decisions to avoid bottlenecks or fatal errors propagation. This keeps the circuit closed (circuit breaker) and makes the system more resilient and reliable.
  • The granularity of APIs provided by microservices is often different than what a client needs. Microservices typically provide fine-grained APIs, which means that clients need to interact with multiple services. The API Gateway can combine these multiple fine-grained services into a single combined API that clients can use, thereby simplifying the client application and improving performance. 
  • Network performance is different for different types of clients. The API Gateway can define device-specific APIs that reduce the number of calls required to be made over slower WAN or mobile networks. The API Gateway being a server-side application makes it more efficient to make multiple calls to backend services over LAN.
  • The number of service instances and their locations (host and port) changes dynamically. The API Gateway can incorporate these backend changes without requiring frontend client applications by determining backend service locations. 
  • Different clients may need different levels of security. For example, external applications may need a higher level of security to access the same APIs that internal applications may access without the additional security layer.

Service mesh solutions for microservice resiliency: Distributed computing is the way forward for running web-scale applications and big-data analytics. By the horizontal scalability and individual life cycle of management of various application modules (microservices) of customer-facing applications, the aspect of distributed deployment of IT resources (highly programmable and configurable bare metal servers, virtual machines, and containers) is being insisted. That is, the goal of the centralized management of distributed deployment of IT resources and applications has to be fulfilled. Such kinds of monitoring, measurement, and management is required for ensuring proactive, preemptive, and prompt failure anticipation and correction of all sorts of participating and contributing constituents. In other words, accomplishing the resiliency target is given much importance with the era of distributed computing. Policy establishment and enforcement is a proven way for bringing in a few specific automations. There are programming language-specific frameworks to add additional code and configuration into application code for implementing highly available and fault-tolerant applications.

It is therefore paramount to have a programming-agnostic resiliency and fault-tolerance framework in the microservices world. Service mesh is the appropriate way forward for creating and sustaining resilient microservices. Istio, an industry-strength open source framework, provides an easy way to create this service mesh. The following diagram conveys the difference between the traditional ESB tool-based and service-oriented application integration and the lightweight and elastic microservices-based application interactions:

A service mesh is a software solution for establishing a mesh out of all kinds of participating and contributing services. This mesh software enables the setting up and sustaining of inter-service communication. The service mesh is a kind of infrastructure solution. Consider the following:

  • A given microservice does not directly communicate with the other microservices.
  • Instead, all service-to-service communications take place on a service mesh software solution, which is a kind of sidecar proxy. Sidecar is a famous software integration pattern.
  • Service mesh provides the built-in support for some of the critical network functions such as microservice resiliency and discovery.

That is, the core and common network services are being identified, abstracted, and delivered through the service mesh solution. This enables service developers to focus on business capabilities alone. That is, business-specific features are with services, whereas all the horizontal (technical, network communication, security, enrichment, intermediation, routing, and filtering) services are being implemented in the service mesh software. For instance, today, the circuit-breaking pattern is being implemented and inscribed in the service code. Now, this pattern is being accomplished through a service mesh solution.

The service mesh software works across multiple languages. That is, services can be coded using any programming and script languages. Also, there are several text and binary data transmission protocols. Microservices, to talk to other microservices, have to interact with the service mesh for initiating service communication. This service-to-service mesh communication can happen over all the standard protocols, such as HTTP1.x/2.x, gRPC, and so on. We can write microservices using any technology, and they still work with the service mesh. The following diagram illustrates the contributions of the service mesh in making microservices resilient:

 

Finally, when resilient services get composed, we can produce reliable applications. Thus, the resiliency of all participating microservices leads to applications that are highly dependable.

Resilient microservices and reliable applications

Progressively, the world is connected and software-enabled. We often hear, read, and experience software-defined computing, storing, and networking capabilities. Physical, mechanical, electrical, and electronics systems in our everyday environments are being meticulously stuffed with software to be adroit, aware, adaptive, and articulate in their actions and reactions. Software is destined to play a strategic and significant role for producing and sustaining digitally impacted and transformed societies, one stand-out trait of new-generation software-enabled systems are responsive all the time through one or other ways. That is, they have to come out with a correct response. If the system is not responding, then another system has to respond correctly and quickly. That is, if a system is failing, an alternative system has to respond.

This is typically called system resiliency. If the system is extremely stressful due to heavy user and data loads, then additional systems have to be provisioned to respond to user's requests without any slowdown and breakdown. That is, auto-scaling is an important property for today's software systems to be right and relevant for businesses and users. This is generally called system elasticity. To make systems resilient and elastic, producing message-driven systems is the key decision. Message-driven systems are called reactive systems. Let's digress a bit here and explain the concepts behind system resiliency and elasticity.

A scalable application can scale automatically and accordingly to continuously function. Suddenly, there can be a greater number of users accessing the application. Still, the application has to continuously transact and can gracefully handle traffic peaks and dips. By adding and removing virtual machines and containers only when needed, scalable applications do their assigned tasks without any slowdown or breakdown. By dynamically provisioning additional resources, the utilization rate of scalable applications is optimal. Scalable applications support on-demand computing. There can be many users demanding the services of the application, or there can be more data getting pushed into the application. Containers and virtual machines are the primary resource and runtime environment for application components.