Defining the cloud native maturity model
There is no one right answer to what a cloud native architecture is; many types of architectures could fall into the cloud native category. Using the three design principles or axes—cloud native services, application centric design, and automation—most systems can be evaluated for their level of cloud native maturity. In addition, since these principles are ever expanding as new technologies, techniques or design patterns are developed, and so the maturity of cloud native architectures will continue to mature. We, the authors of this book, believe that cloud native architectures are formed by evolution and fall into a maturity model. For the remainder of this book, cloud native architectures will be described using the Cloud Native Maturity Model (CNMM), following the design principles outlined, so that architecture patterns can be mapped to their point of evolution:
Axis 1 – Cloud native services
To understand where a system will fall on the CNMM, it's important to understand what the components of cloud native architecture are. By definition, being cloud native requires the adoption of cloud services. Each cloud vendor will have its own set of services, with the most mature having the richest set of features. The incorporation of these services, from basic building blocks to the most advanced, cutting-edge technologies, will define how sophisticated a cloud native architecture is on the cloud services axis:
A mature cloud vendor's services
Amazon Web Services (AWS) is often cited as the most advanced cloud platform (at the time of writing). The following diagram shows all the services that AWS has to offer, from basic building blocks, to managed service offerings, to advanced platform services:
Cloud native services building blocks
Regardless of the level of maturity of the cloud vendor, they will have the building blocks of infrastructure, which include compute, storage, networking, and monitoring. Depending on the cloud maturity level of an organization and the team designing a system, it might be common to begin the cloud native journey by leveraging these baseline infrastructure building blocks. Virtual server instances, block disk storage, object storage, fiber lines and VPNs, load balancers, cloud API monitoring, and instance monitoring are all types of building blocks that a customer would use to start consuming the cloud. Similar to what would be available in an existing on-premises data center, these services would allow for a familiar look and feel for design teams to start creating applications in the cloud. The adoption of these services would be considered the bare minimum required to develop a cloud native architecture, and would result in a relatively low level on the cloud native services axis.
Often, a company will choose to migrate an existing application to the cloud and perform the migration in a lift-and-shift model. This approach would literally move the application stack and surrounding components to the cloud with no changes to the design, technology, or component architecture. Therefore, these migrations only use the basic building blocks that the cloud offers, since that is what is also in place at the customer-on-premises locations. While this is a low level of maturity, it allows for something critical to happen: gaining experience with how the cloud works. Even when using the cloud services building blocks, the design team will quickly add their own guard rails, policies, and naming conventions to learn more efficient techniques to deal with security, deployments, networking, and other core requirements for an early-stage cloud native system.
One of the key outcomes that a company will gain from this maturity stage is the basic premise of the cloud and how that impacts their design patterns: for example, horizontal scaling versus vertical scaling, and the price implications of these designs and how to implement them efficiently. In addition, learning how the chosen cloud vendor operates and groups their services in specific locations and the interactions between these groupings to design high availability and disaster recovery through architecture. Finally, learning the cloud approach to storage and the ability to offload processing to cloud services that scale efficiently and natively on the platform is a critical approach to designing architectures. Even though the adoption of cloud services building blocks is a relatively low level of maturity, it is critical for companies that are beginning their cloud journey.
Cloud vendor managed service offerings
Undifferentiated heavy lifting is often used to describe when time, effort, resources, or money are deployed to perform tasks that will not add to the bottom line of a company. Undifferentiated simply means that there is nothing to distinguish the activity from the way others do it. Heavy lifting refers to the difficult task of technology innovation and operations which, if done correctly, nobody ever recognizes, and if done wrong, can cause catastrophic consequences to business operations. These phrases combined mean that when a company does difficult tasks—that if done wrong will cause business impact, but that company doesn't have a core competency to distinguish itself in doing these tasks—it not only doesn't add business value, but can easily detract from the business.
Unfortunately, this describes a large majority of the IT operations for enterprise companies, and is a critical selling point to using the cloud. Cloud vendors do have a core competency in technology innovation and operations at a scale that most companies could never dream of. Therefore, it only makes sense that cloud vendors have matured their services to include managed offerings, where they own the management of all aspects of the service and the consumer only needs to develop the business logic or data being deployed to the service. This will allow the undifferentiated heavy lifting to be shifted from the company to the cloud vendor, and allow that company to dedicate significantly more resources to creating business value (their differentiators).
As we have seen, there are lots of combinations of cloud services that can be used to design cloud native architectures using only the basic building blocks and patterns; however, as a design team grows in their understanding of the chosen cloud vendor's services and becomes more mature in their approach, they will undoubtedly want to use more advanced cloud services. Mature cloud vendors will have managed service offerings that are often able to replace components that require undifferentiated heavy lifting. Managed service offerings from cloud vendors would include the following:
- Databases
- Hadoop
- Directory services
- Load balancers
- Caching systems
- Data warehouses
- Code repositories
- Automation tools
- Elastic searching tools
Another area of importance for these services is the agility they bring to a solution. If a system were designed to use these tools but managed by the company operations team, often the process to provision the virtual instance, configure, tune, and secure the package will significantly slow the progress being made by the design team. Using cloud vendor managed service offerings in place of those self-managed components will allow the teams to implement the architecture quickly, and begin the process of testing the applications that will run in that environment.
Using managed service offerings from a cloud vendor doesn't necessarily lead to more advanced architecture patterns; however, it does lead to the ability to think bigger and not be constrained by undifferentiated heavy lifting. This concept of not being constrained by limitations that are normally found on-premises, like finite physical resources, is a critical design attribute when creating cloud native architectures, which will enable systems to reach a scale hard to achieve elsewhere. For example, some areas where using a cloud vendor managed service offering would allow for a more scalable and native cloud architecture are as follows:
- Using managed load balancers to decouple components in an architecture
- Leveraging managed data warehouse systems to provision only the storage required and letting it scale automatically as more data is imported
- Using managed RDBMS database engines to enable quick and efficient transactional processing with durability and high availability built in
Advanced cloud native managed services
Cloud vendors are on their journey and continue to mature service offerings to ever more sophisticated levels. The current wave of innovation among the top cloud vendors suggests a continued move into more and more services that are managed by the vendor at scale and security, and reduces the cost, complexity, and risk to the customer. One way they are achieving this is by re-engineering existing technology platforms to not only deal with a specific technical problem, but by also doing it with the cloud value proposition in mind. For example, AWS has a managed database service, Amazon Aurora, built on a fully distributed and self-healing storage service deigned to keep data safe and distributed across multiple availability zones. This service increases the usefulness of the managed service offerings specific to databases, as described in the previous section, by also allowing for a storage array that grows on demand and is tuned for the cloud to provide performance of up to five times better than a similar database engine.
Not all advanced managed services are re-engineered ideas of existing technology; with the introduction of serverless computing, cloud vendors are removing undifferentiated heavy lifting away from not only operations, but also the development cycle. With the virtually limitless resources the cloud can offer, decoupling large applications into individual functions is the next big wave of distributed systems design, and leads directly into the formation of cloud native architectures.
According to AWS:
"Serverless computing allows you to build and run applications and services without thinking about servers. Serverless applications don't require you to provision, scale, and manage any servers. You can build them for virtually any type of application or backend service, and everything required to run and scale your application with high availability is handled for you."
There are many types of serverless services, including compute, API proxies, storage, databases, message processing, orchestration, analytics, and developer tools. One key attribute to define whether a cloud service is serverless or only a managed offering is the way that licensing and usage are priced. Serverless leaves behind the old-world model of core-based pricing, which would imply it is tied directly to a server instance, and relies more on consumption-based pricing. The length of time a function runs, the amount of transactions per second required, or a combination of these, are common metrics that are used for consumption-based pricing with serverless offerings.
Using these existing advanced cloud native managed services, and continuing to adopt the new ones as they are released, represents a high level of maturity on the CNMM, and will enable companies to develop some of the most advanced cloud native architectures. With these services and an experienced design team, a company will be able to push the boundaries of what is possible when old-world constraints are removed and truly limitless capacity and sophistication are leveraged. For example, instead of a traditional three-tier distributed computing stack consisting of a front end web, an application, or middleware tier and an OLTP database for storage, a new approach to this design pattern would be an API gateway that uses an event-driven computing container as an endpoint, and a managed and scalable NoSQL database for persistence. All these components could fall into a serverless model, therefore allowing the design team to focus on the business logic, and not how to achieve the scale required.
Beyond serverless and other advanced managed services lies the future. Currently, the cutting edge of cloud computing is the offerings being released in the artificial intelligence, machine learning, and deep learning space. Mature cloud vendors have services that fall into these categories with continued innovation happening on a hyperscale. It is still very early in the cycle for artificial intelligence, and design teams should expect more to come.
Cloud native services axis recap
The Cloud native services axis section described the components that could make up a cloud native architecture, and showed ever more mature approaches to creating applications using them. As with all of the design principles on the CNMM, the journey will begin with a baseline understanding of the principle and mature as the design team becomes more and more knowledgeable of how to implement at scale; however, cloud computing components are only one part of the design principles that are required to make up a mature cloud native architecture. These are used in conjunction with the other two principles, automation and application centricity, to create systems that can take advantage of the cloud in a secure and robust way.
Axis 2 – Application centric design
The second cloud native principle is about how the application itself will be designed and architected. This section will focus on the actual application design process, and will identify architecture patterns that are mature and lead to cloud native architectures. Similar to the other design principles of the CNMM, developing and architecting cloud native applications is an evolution with different patterns that are typically followed along the way. Ultimately, used in conjunction with the other CNMM principles, the outcome will be a mature, sophisticated, and robust cloud native architecture that is ready to continue evolving as the world of cloud computing expands:
Twelve-factor app design principles
The twelve-factor app is a methodology for building software-as-a-service applications (https://12factor.net/). This methodology was written in late 2011, and is often cited as the base building blocks for designing scalable and robust cloud native applications. Its principles apply to applications written in any programming language, and which use any combination of backing services (database, queue, memory cache, and so on), and is increasingly useful on any cloud vendor platform. The idea behind the twelve-factor app is that there are twelve important factors to consider when designing applications that minimize the time and cost of adding new developers, cleanly interoperate with the environment, can be deployed to cloud vendors, minimize divergence between environment, and allow for scaling of the application. The twelve factors (https://12factor.net/) are as follows:
Factor No. | Factors | Description |
1 | Code base | One code base tracked in revision control, many deploys. |
2 | Dependencies | Explicitly declare and isolate dependencies. |
3 | Config | Store config in the environment. |
4 | Backing services | Treat backing services as attached resources. |
5 | Build, release, run | Strictly separate build and run stages. |
6 | Processes | Execute the app as one (or more) stateless process(es). |
7 | Port binding | Export services through port binding. |
8 | Concurrency | Scale-out through the process model. |
9 | Disposability | Maximize robustness with fast startup and graceful shutdown. |
10 | Dev/prod parity | Keep development, staging, and production as similar as possible. |
11 | Logs | Treat logs as event streams. |
12 | Admin processes | Run admin/management tasks as one-off processes. |
Previous sections of this chapter have already discussed how the CNMM takes into consideration multiple factors from this methodology. For example, factor 1 is all about keeping the code base in a repository, which is standard best practice. Factors 3, 10, and 12 are all about keeping your environments separate, but making sure they do not drift apart from each other from a code and configuration perspective. Factor 5 ensures that you have a clean and repeatable CICD pipeline with a separation of functions. And factor 11 is about treating logs as event streams so they can be analyzed and acted upon in near real time. The remaining factors align well to cloud native design for the simple reason that they focus on being self-contained (factor 2), treat everything as a service (factors 4 and 7), allow efficient scale-out (factors 6 and 8), and handle faults gracefully (factor 9). Designing an application using the 12 factor methodology is not the only way to develop cloud native architectures; however, it does offer a standardized set of guidelines that, if followed, will allow an application to be more mature on the CNMM.
Monolithic, SOA, and microservices architectures
Architecture design patterns are always evolving to take advantage of the newest technology innovations. For a long time, monolithic architectures were popular, often due to the cost of physical resources and the slow velocity in which applications were developed and deployed. These patterns fit well with the workhorse of computing and mainframes, and even today there are plenty of legacy applications running as a monolithic architecture. As IT operations and business requirements became more complex, and speed to market was gaining importance, additional monolithic applications were deployed to support these requirements. Eventually, these monolithic applications needed to communicate with each other to share data or execute functions that the other systems contained. This intercommunication was the precursor to service-oriented architectures (SOA), which allowed design teams to create smaller application components (as compared to monolithic), implement middleware components to mediate the communication, and isolate the ability to access the components, except through specific endpoints. SOA designs increasingly gained popularity during the virtualization boom, since deploying services became easier and less expensive on virtualized hardware.
Service-oriented architectures consist of two or more components which provide their services to other services through specific communication protocols. The communication protocols are often referred to as web services, and consist of a few different common ones: WSDL, SOAP, and RESTful HTTP, in addition to messaging protocols like JMS. As the complexity of these different protocols and services grew, using an enterprise service bus (ESB) became increasingly common as a mediation layer between services. This allowed for services to abstract their endpoints, and the ESB could take care of the message translations from various sources to get a correctly formatted call to the desired system. While this ESB approach reduced the complexity of communicating between services, it also introduced new complexity in the middleware logic required to translate service calls and handle workflows. This often resulted in very complex SOA applications where application code for each of the components needed to be deployed at the same time, resulting in a big bang and risky major deployment across the composite application. The SOA approach had a positive impact on the blast radius issues that monolithic architectures inherently had by separating core components into their own discrete applications; however, it also introduced a new challenge in the complexity of deployment. This complexity manifested itself in a way that caused so many interdependencies that a single large deployment across all SOA applications was often required. As a result of these risky big bang deployments, they were often only undertaken a few times a year, and drastically reduced velocity slowed the pace of the business requirements. As cloud computing became more common and the constraints of the on-premises environments began to fade a way, a new architecture pattern evolved: microservices. With the cloud, application teams no longer needed to wait months to have compute capacity to test their code, nor were they constrained by a limited number of physical resources, either.
The microservices architecture style takes the distributed nature of SOA and breaks those services up into even more discrete and loosely coupled application functions. Microservices not only reduce blast radius by even further isolating functions, but they also dramatically increase the velocity of application deployments by treating each microservice function as its own component. Using a small DevOps team accountable for a specific microservice will allow for the continuous integration and continuous delivery of the code in small chunks, which increases velocity and also allow for quick rollbacks in the event of unintended issues being introduced to the service.
Microservices and cloud computing fit well together, and often microservices are considered the most mature type of cloud native architecture at this point in time. The reason why they fit so well together is due to the way cloud vendors develop their services, often as individual building blocks that can be used in many ways to achieve a business result. This building block approach gives the application design teams the creativity to mix and match services to solve their problems, rather then being forced into using a specific type of data store or programming language. This has led to increased innovation and design patterns that take advantage of the cloud, like serverless computing services to further obfuscate the management of resources from the development teams, allowing them to focus on business logic.
Cloud native design considerations
Regardless of the methodology used or the final cloud native design pattern implemented, there are specific design considerations that all cloud native architectures should attempt to implement. While not all of these are required to be considered a cloud native architecture, as these considerations are implemented, the maturity of the system increases, and will fall on a higher level of the CNMM. These considerations include instrumentation, security, parallelization, resiliency, event-driven, and future-proofed:
- Instrumentation: Including application instrumentation is about more than just log stream analysis; it requires the ability to monitor and measure the performance of the application in real time. Adding instrumentation will directly lead to the ability of the application to be self-aware of latency conditions, component failures due to system faults, and other characteristics that are important to a specific business application. Instrumentation is critical to many of the other design considerations, so including it as a first-class citizen in the application will enable long-term benefits.
- Security: All applications need security built in; however, designing for a cloud native security architecture is critical to ensure the application takes advantage of cloud vendor security services, third-party security services, and design-level security in layers, all of which will harden the posture of the application and reduce the blast radius in the event of an attack or breach.
- Parallelization: Designing an application that can execute distinct processes in parallel with other parts of the application will directly impact its ability to have the performance required as it scales up. This includes allowing the same set of functions to execute many times in parallel, or having many distinct functions in the application execute in parallel.
- Resiliency: Considering how the application will handle faults and still perform at scale is important. Using cloud vendor innovations, like deployment across multiple physical data centers, using multiple decoupled tiers of the application, and automating the startup, shutdown, and migration of application components between cloud vendor locations are all ways to ensure resiliency for the application.
- Event-driven: Applications that are event-driven are able to employ techniques that analyze the events to perform actions, whether those be business logic, resiliency modification, security assessments, or auto scaling of application components. All events are logged and analyzed by advanced machine learning techniques to enable additional automation to be employed as more events are identified.
- Future-proofed: Thinking about the future is a critical way to ensure that an application will continue to evolve along the CNMM as time and innovation moves on. Implementing these considerations will help with future-proofing; however, all applications must be optimized through automation and code enhancements constantly to always be able to deliver the business results required.
Application centric design axis recap
There are many different methodologies that can be employed to create a cloud native application, including microservices, twelve-factor app design patterns, and cloud native design considerations. There is no one correct path for designing cloud native applications, and as with all parts of the CNMM, maturity will increase as more robust considerations are applied. The application will reach the peak maturity for this axis once most of these designs are implemented.
Axis 3 – Automation
The third and final cloud native principle is about automation. Throughout this chapter, the other CNMM principles have been discussed in detail and explained, particularly why using cloud native services and application centric design enable cloud native architectures to achieve scale. However, these alone do not allow a system to really take advantage of the cloud. If a system were designed using the most advanced services available, but the operational aspects of the application were done manually, it would have a hard time realizing its intended purpose. This type of operational automation is often referred to as Infrastructure as Code, and there is an evolution in maturity to achieve a sophisticated cloud native architecture. Cloud vendors typically develop all of their services to be API endpoints, which allows for programmatic calls to create, modify, or destroy services. This approach is the driver behind Infrastructure as Code, where previously an operations team would be responsible for the physical setup and deployment of components in addition to the infrastructure design and configuration.
With Infrastructure as Code automation, operations teams can now focus on the application-specific design and rely on the cloud vendor to handle the undifferentiated heavy lifting of resource deployment. This Infrastructure as Code is then treated like any other deployment artifact for the application, and is stored in source code repositories, versioned and maintained for long-term consistency of environment buildouts. The degree of automation still evolves on a spectrum with the early phases being focused on environment buildout, resource configuration, and application deployments. As a solution matures, the automation will evolve to include more advanced monitoring, scaling, and performance activities, and ultimately include auditing, compliance, governance, and optimization of the full solution. From there, automation of the most advanced designs use artificial intelligence and machine and deep learning techniques to self-heal and self-direct the system to change its structure based on the current conditions.
Automation is the key to achieving the scale and security required by cloud native architectures:
Environment management, configuration, and deployment
Designing, deploying, and managing an application in the cloud is complicated, but all systems need to be set up and configured. In the cloud, this process can become more streamlined and consistent by developing the environment and configuration process with code. There are potentially lots of cloud services and resources involved that go well beyond traditional servers, subnets, and physical equipment being managed on-premises. This phase of the automation axis focuses on API-driven environment provisioning, system configuration, and application deployments, which allows customers to use code to handle these repeatable tasks.
Whether the solution is a large and complex implementation for an enterprise company or a relatively straightforward system deployment, the use of automation to manage consistency is critical to enabling a cloud native architecture. For large and complex solutions, or where regulatory requirements demand the separation of duties, companies can use Infrastructure as Code to isolate the different operations teams to focus only on their area—for example, core infrastructure, networking, security, and monitoring. In other cases, all components could be handled by a single team, possibly even the development team if a DevOps model is being used. Regardless of how the development of the Infrastructure as Code happens, it is important to ensure that agility and consistency are a constant in the process, allowing systems to be deployed often, and following the design requirements exactly.
There are multiple schools of thought on how to handle the operations of a system using Infrastructure as Code. In some cases, every time an environment change occurs, the full suite of Infrastructure as Code automation is executed to replace the existing environment. This is referred to as immutable infrastructure, since the system components are never updated, but replaced with the new version or configuration every time. This allows a company to reduce environment or configuration drift, and also ensures a strict way to prove governance and reduce security issues that could be manually implemented.
While the immutable infrastructure approach has its advantages, it might not be feasible to replace the full environment every time, and so changes must be made at a more specific component level. Automation is still critical with this approach to ensure everything is implemented with consistency; however, it would make the cloud resources mutable, or give them the ability to change over time. There are numerous vendors that have products to achieve instance-level automation, and most cloud vendors have managed offerings to perform this type of automation. These tools will allow for code or scripts to be run in the environment to make the changes. These scripts would be a part of the Infrastructure as Code deployment artifacts, and would be developed and maintained in the same way as the set of immutable scripts.
Environment management and configuration is not the only way automation is required at this baseline level. Code deployments and elasticity are also very important components to ensuring a fully automated cloud native architecture. There are numerous tools on the market that allow for the full deployment pipeline being automated, often referred to as continuous integration, continuous deployment (CICD). A code deployment pipeline often includes all aspects of the process, from code check-in, automated compiling with code analysis, packaging, and deployment, to specific environments with different hooks or approval stops to ensure a clean deployment. Used in conjunction with Infrastructure as Code for environment and operations management, CICD pipelines allow for extreme agility and consistency for a cloud native architecture.
Monitoring, compliance, and optimization through automation
Cloud native architectures that use complex services and span across multiple geographic locations require the ability to change often based on usage patterns and other factors. Using automation to monitor the full span of the solution, ensure compliance with company or regulatory standards, and continuously optimize resource usage shows an increasing level of maturity. As with any evolution, building on the previous stages of maturity enables the use of advanced techniques that allow for increased scale to be introduced to a solution.
One of the most important data points that can be collected is the monitoring data that cloud vendors will have built into their offerings. Mature cloud vendors have monitoring services natively integrated to their other services that can capture metrics, events, and logs of those services that would otherwise be unobtainable. Using these monitoring services to trigger basic events that are happening is a type of automation that will ensure overall system health. For example, a system using a fleet of compute virtual machines as the logic tier of a services component normally expects a certain amount of requests, but at periodic times a spike in requests causes the CPU and network traffic on these instances to quickly increase. If properly configured, the cloud monitoring service will detect this increase and launch additional instances to even out the load to more acceptable levels and ensure proper system performance. The process of launching additional resources is a design configuration of the system that requires Infrastructure as Code automation, to ensure that the new instances are deployed using the same exact configuration and code as all the others in the cluster. This type of activity is often called auto scaling, and it also works in reverse, removing instances once the spike in requests has subsided.
Automated compliance of environment and system configurations is becoming increasingly critical for large enterprise customers. Incorporating automation to perform constant compliance audit checks across system components shows a high level of maturity on the automation axis. These configuration snapshot services allow a complete picture in time of the full environmental makeup, and are stored as text for long-term analysis. Using automation, these snapshots can be compared against previous views of the environment to ensure that configuration drift has not happened. In addition to previous views of the environment, the current snapshot can be compared against desired and compliant configurations that will support audit requirements in regulated industries.
Optimization of cloud resources is an area that can easily be overlooked. Before the cloud, a system was designed and an estimation was used to determine the capacity required for that system to run at peak requirements. This led to the procurement of expensive and complex hardware and software before the system had even been created. Due to that, it was common for a significant amount of over-provisioned capacity to sit idle and waiting for an increase in requests to happen. With the cloud, those challenges all but disappear; however, system designers still run into situations where they don't know what capacity is needed. To resolve this, automated optimization can be used to constantly check all system components and, using historical trends, understand if those resources are over-or under-utilized. Auto scaling is a way to achieve this; however, there are much more sophisticated ways that will provide additional optimization if implemented correctly; for example, using an automated process to check running instances across all environments for under-used capacity and turning them off, or performing a similar check to shut down all development environments on nights and weekends, could save lots of money for a company.
One of the key ways to achieve maturity for a cloud native architecture with regards to monitoring, compliance, and optimization is to leverage an extensive logging framework. Loading data into this framework and analyzing that data to make decisions is a complex task, and requires the design team to fully understand the various components and make sure that all required data is being captured at all times. These types of frameworks help to remove the thinking that logs are files to be gathered and stored, and instead focus on logs as streams of events that should be analyzed in real time for anomalies of any kind. For example, a relatively quick way to implement a logging framework would be to use ElastiCache, Logstash, and Kibana, often referred to as an ELK stack, to capture all types of the system log events, cloud vendor services log events, and other third-party log events.
Predictive analytics, artificial intelligence, machine learning, and beyond
As a system evolves and moves further up the automation maturity model, it will rely more and more on the data it generates to analyze and act upon. Similar to the monitoring, compliance, and optimization design from the previous part of the axis, a mature cloud native architecture will constantly be analyzing log event streams to detect anomalies and inefficiencies; however, the most advanced maturity is demonstrated by using artificial intelligence (AI) and machine learning (ML) to predict how events could impact the system and make proactive adjustments before they cause performance, security, or other business degradation. The longer the event data collected is stored and the amount of disparate sources the data comes from will allow these techniques to have ever-increasing data points to take action upon.
Using the automation building blocks already discussed from this axis in combination with the AI and ML, the system has many options to deal with a potential business impacting event.
Data is king when it comes to predictive analytics and machine learning. The never-ending process of teaching a system how to categorize events takes time, data, and automation. Being able to correlate seemingly unrelated data events to each other to form a hypothesis is the basis of AI and ML techniques. These hypotheses will have a set of actions that can be taken if they occur, which, in the past, has resulted in anomaly correction. Automated responses to an event that matches an anomaly hypothesis and taking corrective action is an example of using predictive analytics based on ML to resolve an issue before it becomes business-impacting. In addition, there will always be situations where a new event is captured and historical data cannot accurately correlate that to a previously known anomaly. Even still, this lack of correlation is actually an indicator in itself and will enable the cross-connection of data events, anomalies, and responses to gain more intelligence.
There are many examples of how using ML on datasets will show correlation that could not be seen by a human reviewing the same datasets—like how often a failed user login resulted in a lockout versus a retry over millions of different attempts, and if those lockouts were the result of a harmless user forgetting a password, or a brute-force attack to gain system entry. Because the algorithm can search all required datasets and correlate the results, it will be able to identify patterns of when an event is harmless or malicious. Using the output from these patterns, predictive actions can be taken to prevent potential security issues by quickly isolating frontend resources or blocking requests from users deemed to be malicious due to where they come from (IP or country specific), the type of traffic being transmitted (Distributed Denial of Service), or another scenario.
This type of automation, if implemented correctly across a system, will result in some of the most advanced architectures that are possible today. With the current state of the cloud services available, using predictive analytics, artificial intelligence, and machine learning is the cutting edge of how a mature cloud native architecture can be designed; however, as the services become more mature, additional techniques will be available and innovative people will continue to use these in ever-increasing maturity to ensure their systems are resilient to business damage.
Automation axis recap
Automation unlocks significant value when implemented for a cloud native architecture. The maturity level of automation will evolve from simply setting up environments and configuring components to performing advanced monitoring, compliance, and optimization across the solution. Combined with increased innovation of cloud vendor services, the maturity of automation and using artificial intelligence and machine learning will allow for predictive actions to be taken to resolve common, known, and increasingly unknown anomalies in a system. This combination of cloud vendor services adoption and automation form two of the three critical design principles for the CNMM, with the application design and architecture principle being the final requirement.