Mastering vRealize Operations Manager - Second Edition

By: Spas Kaloferov, Chris Slater, Scott Norris

Overview of this book

In the modern IT world, managing the health, efficiency, and compliance of virtualized environments is more critical than ever. With vRealize Operations Manager 6.6, you can make a difference to your business by being proactive rather than reactive. Mastering vRealize Operations Manager helps you streamline your processes and customize the environment to suit your needs. You will gain visibility across all devices in the network and retain full control. With easy-to-follow, step-by-step instructions and supporting images, you will quickly master the ability to manipulate your data and display it in a way that best suits you and your business or technical requirements. This book not only covers designing, installing, and upgrading vRealize Operations 6.6, but also gives you a deep understanding of its building blocks: badges, alerts, super metrics, views, dashboards, management packs, and plugins. With the new vRealize Operations 6.6 troubleshooting capabilities, capacity planning, intelligent workload placement, and additional monitoring capabilities, this book aims to give you the knowledge to manage your virtualized environment as effectively as possible.

Multi-node deployment, HA, and scalability

So far, we have focused on the new architecture and components of vRealize Operations 6.6 and introduced the major architectural changes brought by the GemFire-based Controller, Analytics, and Persistence layers. Now, before we close this chapter, we will dive a little deeper into how data is handled in a multi-node deployment, how HA works in vRealize Operations 6.6, and which design decisions are key to a successful deployment.

We will also cover the scalability considerations you should take into account when sizing your initial deployment of vRealize Operations based on anticipated usage.

GemFire clustering

At the core of the vRealize Operations 6.6 architecture is the powerful GemFire in-memory clustering and distributed cache. GemFire provides the internal transport bus, as well as the ability to balance CPU and memory consumption across all nodes through compute pooling, memory sharing, and data partitioning. With this change, it is better to think of the Controller, Analytics, and Persistence layers as components that span nodes, rather than as individual components on individual nodes.

During deployment, ensure that all your vRealize Operations 6.6 nodes are configured with the same number of vCPUs and the same amount of memory. From a load-balancing point of view, vRealize Operations expects all nodes to have the same amount of resources as part of the controller's round-robin load balancing.
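As a rough illustration only (plain Python rather than any vRealize Operations API, with invented node names and sizes), a pre-deployment check for uniform node sizing could look like the following sketch:

```python
# Minimal sketch: verify that all planned cluster nodes share identical vCPU
# and memory sizing before deployment, since the controller's round-robin
# load balancing assumes uniform nodes. Node names and sizes are hypothetical.

planned_nodes = {
    "master":  {"vcpus": 8, "memory_gb": 32},
    "data-01": {"vcpus": 8, "memory_gb": 32},
    "data-02": {"vcpus": 8, "memory_gb": 32},
}

sizes = {(spec["vcpus"], spec["memory_gb"]) for spec in planned_nodes.values()}
if len(sizes) > 1:
    raise ValueError(f"Node sizing is not uniform: {planned_nodes}")
print("All nodes share the same vCPU and memory sizing.")
```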

The migration to GemFire is probably the single largest underlying architectural change from vCenter Operations Manager 5.x, and the move to a distributed in-memory database has made many of the new vRealize Operations 6.x features possible, including the following:

  • Elasticity and scale: Nodes can be added on demand, allowing vRealize Operations to scale as required. This allows a single Operations Manager instance to scale to 6 extra-large nodes in a cluster, which can support up to 180,000 objects and 45,000,000 metrics.
  • Reliability: When GemFire HA is enabled, a backup copy of all data is stored in both the Analytics GemFire cache and the Persistence layer.
  • Availability: Even with GemFire HA mode disabled, in the event of a failure, the other nodes take over the services and load of the failed node (assuming the failed node was not the master node).
  • Data partitioning: vRealize Operations leverages GemFire data partitioning to distribute data across nodes in units called buckets. A partition region contains multiple buckets that are assigned during startup or migrated during a rebalance operation. Data partitioning allows the use of the GemFire MapReduce function, a data-aware query that runs in parallel on only the subset of nodes that hold the relevant data; the results are then returned to the coordinator node for final processing. A simplified sketch of this idea follows the list.
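The following deliberately simplified Python sketch illustrates the bucket idea: resources hash into buckets, each bucket is owned by a node, and a data-aware query touches only the nodes that own the relevant buckets before the results are merged at a coordinator. The bucket count, hashing scheme, and node names are invented for illustration and do not reflect GemFire's actual internals:

```python
# Illustrative sketch of bucket-based data partitioning and a data-aware
# query, in the spirit of GemFire's MapReduce function. All values invented.
from collections import defaultdict

NUM_BUCKETS = 8
NODES = ["node-1", "node-2", "node-3"]

def bucket_for(resource_id: str) -> int:
    # Hash each resource into a fixed number of buckets.
    return hash(resource_id) % NUM_BUCKETS

# Each bucket is assigned to a node; a real partition region would also hold
# redundant bucket copies when HA is enabled.
bucket_owner = {b: NODES[b % len(NODES)] for b in range(NUM_BUCKETS)}

# Incoming metrics land in the bucket that owns their resource.
store = defaultdict(list)
for resource_id, cpu in [("vm-01", 35.0), ("vm-02", 80.0),
                         ("vm-03", 55.0), ("vm-01", 40.0)]:
    store[bucket_for(resource_id)].append((resource_id, cpu))

def data_aware_query(resource_ids):
    """'Map' only on nodes owning relevant buckets, then 'reduce' at a coordinator."""
    wanted_buckets = {bucket_for(r) for r in resource_ids}
    partial_results = []
    for b in wanted_buckets:                      # runs on the owning node
        node = bucket_owner[b]
        rows = [row for row in store[b] if row[0] in resource_ids]
        partial_results.append((node, rows))
    # The coordinator node merges the partial results for final processing.
    return [row for _, rows in partial_results for row in rows]

print(data_aware_query({"vm-01", "vm-03"}))
```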

GemFire sharding

When describing the Persistence layer earlier, we listed the new components related to Persistence in vRealize Operations 6.6. Now it's time to discuss what sharding actually is.

GemFire sharding is the process of splitting data across multiple GemFire nodes for placement in various partitioned buckets. It is this concept, in conjunction with the controller and locator services, that balances incoming resources and metrics across multiple nodes in the vRealize Operations cluster. It is important to note that data is sharded per resource, and not per adapter instance. This allows the load balancing of incoming and outgoing data even if only one adapter instance is configured. From a design perspective, a single vRealize Operations cluster could therefore manage a maximum-configuration vCenter Server by distributing the incoming metrics across multiple data nodes.

In vRealize Operations 6.6, the maximum number of VMware vCenter adapter instances certified is 60, and the maximum number of VMware vCenter adapter instances that were tested on a single collector is 40.

vRealize Operations data is sharded in both the Analytics and Persistence layers, which is referred to as GemFire cache sharding and GemFire Persistence sharding, respectively.

Just because data is held in the GemFire cache on one node does not necessarily mean that the data shard is persisted on the same node. In fact, because both layers are balanced independently, the chance of both the cache shard and the Persistence shard existing on the same node is 1/N, where N is the number of nodes.
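A short simulation (illustrative only, with invented numbers rather than anything from vRealize Operations itself) shows why independent placement of the cache shard and the Persistence shard gives roughly a 1/N coincidence rate:

```python
# Because the Analytics (cache) and Persistence layers are balanced
# independently, a resource's cache shard and persistence shard land on the
# same node only about 1/N of the time in an N-node cluster.
import random

def coincidence_rate(num_nodes: int, num_resources: int = 100_000) -> float:
    hits = 0
    for _ in range(num_resources):
        cache_node = random.randrange(num_nodes)        # cache shard placement
        persistence_node = random.randrange(num_nodes)  # persistence shard placement
        if cache_node == persistence_node:
            hits += 1
    return hits / num_resources

for n in (2, 4, 8):
    print(f"{n} nodes: observed {coincidence_rate(n):.3f}, expected {1 / n:.3f}")
```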

In an HA environment, the databases that use GemFire sharding are Central, Alert/HIS, and FSDB. The Cassandra DB uses its own clustering mechanism.

Adding, removing, and balancing nodes

One of the biggest advantages of a GemFire-based cluster is the elasticity of adding nodes to the cluster as the number of resources and metrics grows in your environment. This allows administrators to add or remove nodes if the size of their environment changes unexpectedly; for example, a merger with another IT department, or catering for seasonal workloads that only exist for a small period of the year.

From a deployment perspective, we want to hide the complexities of scaling out from the user, so we deploy an entire slice of the stack at a time. When one instance/slice of the stack runs out of capacity (CPU, disk, or memory), we can spin up another and add more capacity. We can keep doing this as necessary to handle the scale.

Although adding nodes to an existing cluster can be done at any time, there is a slight cost in doing so. As just mentioned, it is important that new nodes are sized the same as the existing cluster nodes; this ensures that, during a rebalance operation, the load is distributed equally between the cluster nodes:

When adding new nodes to the cluster sometime after the initial deployment, it is recommended that the Rebalance Disk option be selected under Cluster Management. As seen in the preceding figure, the warning advises that this is a very disruptive operation that may take hours and, as such, it should be treated as a planned maintenance activity. The time this operation takes will vary depending on the size of the existing cluster and the amount of data in the FSDB. As you can probably imagine, if you are adding the eighth node to an existing seven-node cluster with tens of thousands of resources, there could potentially be several TB of data that needs to be re-sharded across the entire cluster. It is also strongly recommended that, when adding new nodes, their disk capacity and performance match those of the existing nodes, as the Rebalance Disk operation assumes this is the case.

This activity is not required to start receiving the compute and network load balancing benefits of the new node. This can be achieved by selecting the Rebalance GemFire option, which is a far less disruptive process. As per the description, this process re-partitions the JVM buckets, balancing the memory across all active nodes in the GemFire federation. With the GemFire cache balanced across all nodes, the compute and network demand should be roughly equal across all the nodes in the cluster.

Although this allows early benefit from adding a new node into an existing cluster, unless a large number of new resources is discovered by the system shortly afterward, the majority of disk I/O for persisted, sharded data will occur on other nodes.
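To make the distinction concrete, here is a simplified Python sketch of what a GemFire-style rebalance does in principle: in-memory buckets are re-partitioned across the enlarged set of nodes, while data already persisted on disk stays where it is until a disk rebalance (or newly discovered resources) changes that. The bucket count and node names are invented for illustration:

```python
# Simplified sketch: re-partition in-memory buckets evenly across active
# nodes after a new node joins. Persisted FSDB data is NOT moved by this step.
from collections import Counter

NUM_BUCKETS = 12

def assign_buckets(nodes):
    """Spread buckets across the given nodes round-robin."""
    return {b: nodes[b % len(nodes)] for b in range(NUM_BUCKETS)}

# Before: three nodes own all in-memory buckets.
before = assign_buckets(["node-1", "node-2", "node-3"])

# After adding node-4, buckets are re-partitioned across four nodes, so
# compute and network demand evens out almost immediately.
after = assign_buckets(["node-1", "node-2", "node-3", "node-4"])

print("Buckets per node before:", Counter(before.values()))
print("Buckets per node after: ", Counter(after.values()))
```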

Apart from adding nodes, vRealize Operations also allows the removal of a node at any time, as long as it has been taken offline first. This can be valuable if a cluster was originally oversized for a requirement and is considered a waste of physical compute resources; however, this task should not be taken lightly, as removing a data node without HA enabled will result in the loss of all metrics on that node. As such, it is generally recommended to avoid removing nodes from the cluster.

If the permanent removal of a data node is necessary, ensure HA is first enabled to prevent data loss.