The Kubernetes Operator Framework Book

By: Michael Dame

Overview of this book

From incomplete collections of knowledge and varying design approaches to technical knowledge barriers, Kubernetes users face various challenges when developing their own operators. Knowing how to write, deploy, and package operators makes cluster management automation much easier – and that's what this book is here to teach you. Beginning with operators and Operator Framework fundamentals, the book delves into how the different components of Operator Framework (such as the Operator SDK, Operator Lifecycle Manager, and OperatorHub.io) are used to build operators. You’ll learn how to write a basic operator, interact with a Kubernetes cluster in code, and distribute that operator to users. As you advance, you’ll be able to develop a sample operator in the Go programming language using Operator SDK tools before running it locally with Operator Lifecycle Manager, and also learn how to package an operator bundle for distribution. The book covers best practices as well as sample applications and case studies based on real-world operators to help you implement the concepts you’ve learned. By the end of this Kubernetes book, you’ll be able to build and add application-specific operational logic to a Kubernetes cluster, making it easier to automate complex applications and augment the platform.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Code in Action

Download the color images

Conventions used

Get in touch

Share Your Thoughts

Part 1: Essentials of Operators and the Operator Framework

Chapter 1: Introducing the Operator Framework

Technical requirements

Managing clusters without Operators

Introducing the Operator Framework

Developing with the Operator SDK

Managing Operators with OLM

Distributing Operators on OperatorHub.io

Defining Operator functions with the Capability Model

Using Operators to manage applications

Summary

Chapter 2: Understanding How Operators Interact with Kubernetes

Interacting with Kubernetes cluster resources

Identifying users and maintainers

Designing beneficial features for your operator

Planning for changes in your Operator

Summary

Part 2: Designing and Developing an Operator

Chapter 3: Designing an Operator – CRD, API, and Target Reconciliation

Describing the problem

Designing an API and a CRD

Working with other required resources

Designing a target reconciliation loop

Handling upgrades and downgrades

Using failure reporting

Summary

Chapter 4: Developing an Operator with the Operator SDK

Technical requirements

Setting up your project

Defining an API

Adding resource manifests

Writing a control loop

Troubleshooting

Summary

Chapter 5: Developing an Operator – Advanced Functionality

Technical requirements

Understanding the need for advanced functionality

Reporting status conditions

Implementing metrics reporting

Implementing leader election

Adding health checks

Summary

Chapter 6: Building and Deploying Your Operator

Technical requirements

Building a container image

Deploying in a test cluster

Pushing and testing changes

Troubleshooting

Summary

Part 3: Deploying and Distributing Operators for Public Use

Chapter 7: Installing and Running Operators with the Operator Lifecycle Manager

Technical requirements

Understanding the OLM

Running your Operator

Working with OperatorHub

Troubleshooting

Summary

Chapter 8: Preparing for Ongoing Maintenance of Your Operator

Technical requirements

Releasing new versions of your Operator

Planning for deprecation and backward compatibility

Complying with Kubernetes standards for changes

Aligning with the Kubernetes release timeline

Working with the Kubernetes community

Summary

Chapter 9: Diving into FAQs and Future Trends

FAQs about the Operator Framework

FAQs about Operator design, CRDs, and APIs

FAQs about the Operator SDK and coding controller logic

FAQs about OperatorHub and the OLM

Future trends in the Operator Framework

Summary

Chapter 10: Case Study for Optional Operators – the Prometheus Operator

A real-world use case

Operator design

Operator distribution and development

Updates and maintenance

Summary

Chapter 11: Case Study for Core Operator – Etcd Operator

Core Operators – extending the Kubernetes platform

Summary

Defining Operator functions with the Capability Model

The Operator Framework defines a Capability Model (https://operatorframework.io/operator-capabilities/) that categorizes Operators based on their functionality and design. This model helps to break down Operators based on their maturity, and also describes the extent of an Operator's interoperability with OLM and the capabilities users can expect when using the Operator.

The Capability Model is divided into five hierarchical levels. Operators can be published at any one of these levels and, as they grow, may evolve and graduate from one level to the next as features and functionality are added. In addition, the levels are cumulative, with each level generally encompassing all features of the levels below it.

The current level of an Operator is recorded in its ClusterServiceVersion (CSV), and this level is displayed on its OperatorHub listing. The level is based on somewhat subjective yet guided criteria and is purely an informational metric.
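The ordered, cumulative nature of the levels can be sketched in Go as a simple enumeration, using the level names detailed in the rest of this section. This is only an illustration under stated assumptions: the `CapabilityLevel` type and its `Includes` method are hypothetical names invented for this example and are not part of the Operator SDK or OLM.

```go
package main

import "fmt"

// CapabilityLevel is a hypothetical type that models the five levels
// of the Capability Model as ordered integers (Level I = 1 ... Level V = 5).
type CapabilityLevel int

const (
	BasicInstall     CapabilityLevel = iota + 1 // Level I
	SeamlessUpgrades                            // Level II
	FullLifecycle                               // Level III
	DeepInsights                                // Level IV
	AutoPilot                                   // Level V
)

// levelNames maps each level to the name used by the Capability Model.
var levelNames = map[CapabilityLevel]string{
	BasicInstall:     "Basic Install",
	SeamlessUpgrades: "Seamless Upgrades",
	FullLifecycle:    "Full Lifecycle",
	DeepInsights:     "Deep Insights",
	AutoPilot:        "Auto Pilot",
}

// String returns the human-readable name of the level.
func (l CapabilityLevel) String() string {
	if name, ok := levelNames[l]; ok {
		return name
	}
	return "Unknown"
}

// Includes reports whether an Operator at level l would generally be
// expected to offer the features of level other, because the levels
// are cumulative.
func (l CapabilityLevel) Includes(other CapabilityLevel) bool {
	return other >= BasicInstall && other <= l
}

func main() {
	current := DeepInsights
	fmt.Printf("%s includes %s: %t\n", current, SeamlessUpgrades, current.Includes(SeamlessUpgrades))
	fmt.Printf("%s includes %s: %t\n", current, AutoPilot, current.Includes(AutoPilot))
}
```

Modeling the levels as ordered integers keeps the "each level builds on the ones below it" rule down to a single comparison, although a real Operator may only approximately satisfy the lower levels.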

Each level has specific functionalities that define it. These functionalities are broken down into Basic Install, Seamless Upgrades, Full Lifecycle, Deep Insights, and Auto Pilot. The specific levels of the Capability Model are outlined here:

Level I—Basic Install: This level represents the most basic of Operator capabilities. At Level I, an Operator is only capable of installing its Operand in the cluster and conveying the status of the workload to cluster administrators. This means that it can set up the basic resources required for an application and report when those resources are ready to be used by the cluster.

At Level I, an Operator also allows for simple configuration of the Operand. This configuration is specified through the Operator's Custom Resource. The Operator is responsible for reconciling the configuration specifications with the running Operand workload. However, it may not be able to react if the Operand reaches a failed state, whether due to malformed configuration or outside influence.

Going back to our example web application from the start of the chapter, a Level I Operator for this application would handle the basic setup of the workloads and nothing else. This is good for a simple application that needs to be quickly set up on many different clusters, or one that should be easily shared with users for them to install themselves.
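The reconciliation described for Level I can be sketched without any Kubernetes libraries. In the following minimal example, `WebAppSpec`, `WebAppState`, `Reconcile`, and `Ready` are all hypothetical names chosen for illustration; a real Operator would read the spec from its Custom Resource and act on the cluster through a Kubernetes client rather than returning strings.

```go
package main

import "fmt"

// WebAppSpec is a hypothetical desired configuration, as it might be
// declared in the Operator's Custom Resource.
type WebAppSpec struct {
	Replicas int
	Image    string
}

// WebAppState is a hypothetical snapshot of the running Operand.
type WebAppState struct {
	Replicas      int
	ReadyReplicas int
	Image         string
}

// Reconcile compares the desired spec with the observed state and
// returns a description of each change needed to make them match.
func Reconcile(spec WebAppSpec, observed WebAppState) []string {
	var actions []string
	if observed.Image != spec.Image {
		actions = append(actions, fmt.Sprintf("set image to %s", spec.Image))
	}
	if observed.Replicas != spec.Replicas {
		actions = append(actions, fmt.Sprintf("scale from %d to %d replicas", observed.Replicas, spec.Replicas))
	}
	return actions
}

// Ready reports whether the Operand matches the spec and all of its
// replicas are ready, which is the status a Level I Operator conveys.
func Ready(spec WebAppSpec, observed WebAppState) bool {
	return observed.Image == spec.Image && observed.ReadyReplicas == spec.Replicas
}

func main() {
	spec := WebAppSpec{Replicas: 3, Image: "example/webapp:1.0"}
	observed := WebAppState{Replicas: 1, ReadyReplicas: 1, Image: "example/webapp:1.0"}
	for _, action := range Reconcile(spec, observed) {
		fmt.Println("action:", action)
	}
	fmt.Println("ready:", Ready(spec, observed))
}
```

Note that this sketch only detects drift and reports readiness; it has no handling for a failed Operand, which matches the limitation of Level I described above.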

Level II—Seamless Upgrades: Operators at Level II offer the features of basic installation, with added functionality around upgrades. This includes upgrades for the Operand but also upgrades for the Operator itself.

Upgrades are a critical part of any application. As bug fixes are implemented and more features are added, being able to smoothly transition between versions helps ensure application uptime. An Operator that handles its own upgrades can either upgrade its Operand automatically when it upgrades itself or leave the Operand upgrade as a manual step, triggered by modifying the Operator's Custom Resource.

For seamless upgrades, an Operator must also be able to upgrade older versions of its Operand (which may exist because they were managed by an older version of the Operator). This kind of backward compatibility is essential for both upgrading to newer versions and handling rollbacks (for example, if a new version introduces a high-visibility bug that can't wait for an eventual fix to be published in a patch version).

Our example web application Operator could offer the same set of features. This means that if a new version of the application were released, the Operator could handle upgrading the deployed instances of the application to the newer version. Or, if changes were made to the Operator itself, then it could manage its own upgrades (and later upgrade the application, regardless of version skew between Operator and Operand).
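One way to picture the upgrade and rollback decision is as a comparison between the Operand's running version and the version the Operator wants. The sketch below assumes simple `major.minor.patch` version strings; the function names `parseVersion`, `compare`, and `PlanChange` are hypothetical, and a production Operator would likely rely on a dedicated semantic-versioning library instead.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a "major.minor.patch" string into three integers.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, fmt.Errorf("invalid version %q", s)
		}
		v[i] = n
	}
	return v, nil
}

// compare returns -1, 0, or 1 when a is older than, equal to, or newer than b.
func compare(a, b [3]int) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// PlanChange decides what the Operator should do to move the Operand
// from its current version to the target version.
func PlanChange(current, target string) (string, error) {
	c, err := parseVersion(current)
	if err != nil {
		return "", err
	}
	t, err := parseVersion(target)
	if err != nil {
		return "", err
	}
	switch compare(c, t) {
	case -1:
		return "upgrade", nil
	case 1:
		return "rollback", nil
	default:
		return "none", nil
	}
}

func main() {
	for _, pair := range [][2]string{{"1.9.0", "1.10.0"}, {"2.0.0", "1.4.2"}, {"1.4.2", "1.4.2"}} {
		plan, err := PlanChange(pair[0], pair[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s: %s\n", pair[0], pair[1], plan)
	}
}
```

Comparing the version components numerically rather than as strings matters here: a plain string comparison would wrongly treat `1.10.0` as older than `1.9.0`.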

Level III—Full Lifecycle: Level III Operators offer at least one out of a list of Operand lifecycle management features. Being able to offer management during the Operand's lifecycle implies that the Operator is more than just passively operating on a workload in a set-and-forget fashion. At Level III, Operators are actively contributing to the ongoing function of the Operand.

The features relevant to the lifecycle management of an Operand include the following:

The ability to create and/or restore backups of the Operand.
Support for more complex configuration options and multistep workflows.
Failover and failback mechanisms for disaster recovery (DR). When the Operator encounters an error (either in itself or the Operand), it needs to be able to either re-route to a backup process (fail over) or roll the system back to its last known functioning state (fail back).
The ability to manage clustered Operands, and—specifically—support for adding and removing members to and from Operands. The Operator should be capable of considering quorum for Operands that run multiple replicas.
Similarly, support for scaling an Operand with worker instances that operate with read-only functionality.

Any Operator that implements one or more of these features can be considered to be at least a Level III Operator. The simple web application Operator could take advantage of a few of these, such as DR and scaling. As the user base grows and resource demands increase, an administrator could instruct the Operator to scale the application with additional replica Pods to handle the increased load.

Should any of the Pods fail during this process, the Operator would be smart enough to know to fail over to a different Pod or cluster zone entirely. Alternatively, if a new version of the web app was released that introduced an unexpected bug, the Operator could be aware of the previous successful version and provide ways to downgrade its Operand workloads if an administrator noticed the error.
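The quorum consideration mentioned for clustered Operands can be made concrete with a small check. This sketch assumes a majority quorum (more than half of the members), as used by consensus-based systems such as etcd; other Operands may define quorum differently. The function names `Quorum` and `CanRemoveHealthyMember` are hypothetical.

```go
package main

import "fmt"

// Quorum returns the majority quorum for a cluster of the given size.
// Assumption: the Operand needs more than half of its members healthy.
func Quorum(members int) int {
	return members/2 + 1
}

// CanRemoveHealthyMember reports whether one healthy member can be
// removed while the remaining healthy members still form a quorum of
// the smaller cluster.
func CanRemoveHealthyMember(members, healthy int) bool {
	remainingMembers := members - 1
	remainingHealthy := healthy - 1
	if remainingMembers < 1 || remainingHealthy < 0 {
		return false
	}
	return remainingHealthy >= Quorum(remainingMembers)
}

func main() {
	fmt.Println("quorum of 5 members:", Quorum(5))
	fmt.Println("3 members, 3 healthy:", CanRemoveHealthyMember(3, 3))
	fmt.Println("3 members, 2 healthy:", CanRemoveHealthyMember(3, 2))
}
```

An Operator that performs a check of this kind before scaling down avoids removing a member at a moment when doing so would leave the Operand unable to reach agreement.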

Level IV—Deep Insights: While the previous levels focus primarily on Operator features as they relate to functional interaction with the application workload, Level IV emphasizes monitoring and metrics. This means an Operator is capable of providing measurable insights into the status of both itself and its Operand.

Insights may be seen as less important from a development perspective relative to features and bug fixes, but they are just as critical to an application's success. Quantifiable reports about an application's performance can drive ongoing development and highlight areas that need improvement. Having measurements to guide these efforts makes it possible to show empirically which changes have an effect and which do not.

Operators most commonly provide their insights in the form of metrics. These metrics are usually compatible with metrics aggregation servers such as Prometheus. (Interestingly enough, Red Hat publishes an Operator for Prometheus that is a Level IV Operator. That Operator is available on OperatorHub at https://operatorhub.io/operator/prometheus.)

However, Operators can provide insights through other means as well. These include alerts and Kubernetes Events. Events are built-in cluster resource objects that are used by core Kubernetes objects and controllers.

Another key insight that Level IV Operators report is the performance of the Operator and Operand. Together, these insights help inform administrators about the health of their clusters.

Our simple web application Operator could provide insights about the performance of the Operand. Requests to the app would provide information about the current and historic load on the cluster. Additionally, since the Operator can identify failed states at this point, it could trigger an alert when the application is unhealthy. A high volume of alerts would indicate a reliability issue that warrants an administrator's attention.
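To give a feel for what Prometheus-compatible metrics look like, the following sketch renders two values in the Prometheus text exposition format using only the Go standard library. The metric names `webapp_requests_total` and `webapp_failed_pods`, along with the functions `RenderMetrics` and `ShouldAlert`, are hypothetical; in practice, an Operator would more likely register its metrics through a Prometheus client library than format them by hand.

```go
package main

import "fmt"

// RenderMetrics formats a request counter and a failed-Pod gauge in the
// Prometheus text exposition format. The metric names are illustrative.
func RenderMetrics(requestsTotal, failedPods int) string {
	return fmt.Sprintf(
		"# HELP webapp_requests_total Total requests served by the Operand.\n"+
			"# TYPE webapp_requests_total counter\n"+
			"webapp_requests_total %d\n"+
			"# HELP webapp_failed_pods Operand Pods currently in a failed state.\n"+
			"# TYPE webapp_failed_pods gauge\n"+
			"webapp_failed_pods %d\n",
		requestsTotal, failedPods)
}

// ShouldAlert reports whether the number of failed Pods exceeds the
// threshold an administrator is assumed to have configured.
func ShouldAlert(failedPods, threshold int) bool {
	return failedPods > threshold
}

func main() {
	fmt.Print(RenderMetrics(1200, 2))
	fmt.Println("alert:", ShouldAlert(2, 1))
}
```

The counter reflects load on the application over time, while the gauge feeds the kind of unhealthy-state alert described above.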

Level V—Auto Pilot: Level V is the most sophisticated level for Operators. It includes Operators that offer the highest capabilities, in addition to the features in all four previous levels. This level is called Auto Pilot because the features that define it focus on being able to run almost entirely autonomously. These capabilities include Auto-Scaling, Auto-Healing, Auto-Tuning, and Abnormality Detection.

Auto-Scaling is the ability of an Operator to detect the need to scale an application up or down based on demand. By measuring the current load and performance, an Operator can determine whether more or fewer resources are necessary to satisfy the current usage. Advanced Operators can even try to predict the need to scale based on current and past data.

Auto-Healing Operators can react to applications that are reporting unhealthy conditions and work to correct them (or, at least, prevent them from getting any worse). When an Operand is reporting an error, the Operator should take reactive steps to rectify the failure. In addition, Operators can use current metrics to proactively prevent an Operand from transitioning to a failure state.

Auto-Tuning means that an Operator can dynamically modify an Operand for peak performance. This involves tuning the settings of an Operand automatically. It can even include complex operations such as shifting workloads to entirely different nodes that are better suited than their current nodes.

Finally, Abnormality Detection is the capability of an Operator to identify suboptimal or off-pattern behavior in an Operand. By measuring performance, an Operator has a picture of the application's current and historical levels of functioning. That data can be compared to a manually defined minimum expectation or used to dynamically inform the Operator of that expectation.
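Both ways of setting the expectation, manual and data-driven, can be sketched briefly. In this example the measured value is assumed to be a "higher is better" figure such as successful requests per second, and the dynamic baseline is simply the historical mean reduced by a tolerance; both choices, and the names `BaselineFromHistory` and `IsAbnormal`, are assumptions made for illustration rather than a prescribed method.

```go
package main

import "fmt"

// BaselineFromHistory derives a minimum expectation from past
// measurements: the mean of the history reduced by the given tolerance
// (for example, 0.25 allows a 25% drop). It returns false when there is
// no history to learn from.
func BaselineFromHistory(history []float64, tolerance float64) (float64, bool) {
	if len(history) == 0 {
		return 0, false
	}
	sum := 0.0
	for _, v := range history {
		sum += v
	}
	mean := sum / float64(len(history))
	return mean * (1 - tolerance), true
}

// IsAbnormal reports whether the current measurement falls below the
// minimum expectation, whether that minimum was set manually or derived.
func IsAbnormal(current, minimum float64) bool {
	return current < minimum
}

func main() {
	// Manually defined minimum expectation.
	fmt.Println("manual minimum, current 70:", IsAbnormal(70, 80))

	// Dynamically derived minimum expectation.
	history := []float64{100, 110, 90}
	if baseline, ok := BaselineFromHistory(history, 0.25); ok {
		fmt.Println("derived baseline:", baseline)
		fmt.Println("current 70 abnormal:", IsAbnormal(70, baseline))
		fmt.Println("current 95 abnormal:", IsAbnormal(95, baseline))
	}
}
```

A real Operator would probably use a more robust statistic than a plain mean, but the comparison against an expectation is the core of the idea.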

All of these features are heavily dependent upon the use of metrics to automatically inform the Operator of the need to act upon itself or its Operand. Therefore, a Level V Operator is an inherent progression from Level IV, which is the level at which an Operator exposes advanced metrics.

At Level V, the simple web application Operator would manage most of the aspects of the application for us. It would have insights into the current number of requests, so it could scale up copies of the app on demand. If this scaling started to cause errors (for example, too many concurrent database calls), it could identify the number of failing Pods and prevent further scaling. It would also attempt to modify parameters of the web app (such as request timeouts) to help rectify the situation and allow the auto-scaling to proceed. When the load peak subsided, the Operator would then automatically scale down the application to its baseline service levels.
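The scaling behavior in this scenario can be approximated with a proportional rule: the desired replica count is the current count multiplied by the ratio of observed load to target load, rounded up. This is similar to the calculation documented for the Kubernetes Horizontal Pod Autoscaler, but the `ScaleInput` type, its fields, and the `DesiredReplicas` function below are hypothetical, as is the rule of pausing scale-up once too many Pods are failing.

```go
package main

import (
	"fmt"
	"math"
)

// ScaleInput is a hypothetical bundle of the measurements and limits an
// auto-scaling Operator might consider.
type ScaleInput struct {
	Current          int     // replicas currently running
	LoadPerPod       float64 // observed requests per Pod
	TargetLoadPerPod float64 // requests per Pod the app handles comfortably
	FailingPods      int     // Pods currently reporting errors
	MaxFailingPods   int     // failing Pods tolerated before scale-up pauses
	Min, Max         int     // bounds on the replica count
}

// DesiredReplicas computes ceil(Current * LoadPerPod / TargetLoadPerPod),
// refuses to scale up while too many Pods are failing, and clamps the
// result to the range [Min, Max].
func DesiredReplicas(in ScaleInput) int {
	if in.TargetLoadPerPod <= 0 {
		return in.Current // no meaningful target; leave the Operand alone
	}
	desired := int(math.Ceil(float64(in.Current) * in.LoadPerPod / in.TargetLoadPerPod))
	if desired > in.Current && in.FailingPods > in.MaxFailingPods {
		desired = in.Current // scaling is causing errors; hold steady
	}
	if desired < in.Min {
		desired = in.Min
	}
	if desired > in.Max {
		desired = in.Max
	}
	return desired
}

func main() {
	peak := ScaleInput{Current: 4, LoadPerPod: 150, TargetLoadPerPod: 100, MaxFailingPods: 1, Min: 2, Max: 10}
	fmt.Println("during peak:", DesiredReplicas(peak))

	peak.FailingPods = 3
	fmt.Println("peak with failing Pods:", DesiredReplicas(peak))

	quiet := ScaleInput{Current: 6, LoadPerPod: 40, TargetLoadPerPod: 100, MaxFailingPods: 1, Min: 2, Max: 10}
	fmt.Println("after peak:", DesiredReplicas(quiet))
}
```

The failing-Pod guard only blocks scaling up; scaling down after the peak is still allowed, which mirrors the return to baseline service levels described above.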

Levels I and II (Basic Install and Seamless Upgrades) can be used with the three facets of the Operator SDK: Helm, Ansible, and Go. However, Level III and above (Full Lifecycle, Deep Insights, and Auto Pilot) are only possible with Ansible and Go. This is because the functionality at these higher levels requires more intricate logic than what is available through Helm charts alone.

We have now explained the three main pillars of the Operator Framework: Operator SDK, OLM, and OperatorHub. We learned how each contributes different helpful features to the development and usage of Operators. We also learned about the Capability Model, which serves as a reference for the different levels of functionality that Operators can have. In the next section, we'll apply this knowledge to a sample application.
