Mastering vRealize Operations Manager - Second Edition

By : Spas Kaloferov, Chris Slater, Scott Norris

Overview of this book

In the modern IT world, managing the health, efficiency, and compliance of virtualized environments is more critical than ever. With vRealize Operations Manager 6.6, you can make a difference to your business by being proactive rather than reactive. Mastering vRealize Operations Manager helps you streamline your processes and customize the environment to suit your needs. You will gain visibility across all devices in the network and retain full control. With easy-to-follow, step-by-step instructions and supporting images, you will quickly master the ability to manipulate your data and display it in the way that best suits you and your business or technical requirements. This book not only covers designing, installing, and upgrading vRealize Operations 6.6, but also gives you a deep understanding of its building blocks: badges, alerts, super metrics, views, dashboards, management packs, and plugins. Covering the new vRealize Operations 6.6 troubleshooting capabilities, capacity planning, intelligent workload placement, and additional monitoring capabilities, this book aims to give you the knowledge to manage your virtualized environment as effectively as possible.

vRealize Operations key component architecture

In vRealize Operations 6.0, a new platform design was introduced to meet some of the required goals that VMware envisaged for the product. These included the following:

Treat all solutions equally, and offer management of performance, capacity, configuration, and compliance for both VMware and third-party solutions
Provide a single platform that can scale to tens of thousands of objects and millions of metrics by scaling out with little reconfiguration or redesign required
Support a monitoring solution that can be highly available, and support the loss of a node without impacting the ability to store or query information

With that new common platform design came a completely new architecture.

The following diagram shows the major components of the vRealize Operations 6.6 architecture:

The components of the vRealize Operations 6.6 architecture are as follows:

Watchdog
The user interface
Collector
GemFire
GemFire Locator
Controller
Analytics
Persistence

The Watchdog service

Watchdog is a vRealize Operations service that maintains the necessary daemons/services and attempts to restart them should there be a failure. The vcops-watchdog is a Python script that runs every five minutes by means of the vcops-watchdog-daemon, with the purpose of monitoring the various vRealize Operations services, including the Cluster and Slice Administrator (CaSA).

The Watchdog service performs the following checks:

PID file of the service
Service status
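
The two checks listed above can be illustrated with a minimal sketch. This is not the vcops-watchdog script itself; the function names and the restart callback are hypothetical, and the sketch simply assumes that a service is healthy when its PID file exists and names a live process.

```python
import os
from pathlib import Path
from typing import Callable, Optional


def read_pid(pid_file: Path) -> Optional[int]:
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        pid = int(pid_file.read_text().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def process_is_alive(pid: int) -> bool:
    """Probe a process with signal 0, which checks existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def check_service(pid_file: Path, restart: Callable[[], None]) -> bool:
    """Return True if the service looks healthy; otherwise call restart() and return False.

    restart is a hypothetical callback standing in for whatever restarts the service.
    """
    pid = read_pid(pid_file)
    if pid is not None and process_is_alive(pid):
        return True
    restart()
    return False
```

A scheduler would call check_service once per monitored service on each run, which in the case described here is every five minutes.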

The user interface

In vRealize Operations 6.6, the UI is broken into two components: the Product UI and the Admin UI. The Product UI is present on all nodes, with the exception of nodes that are deployed as remote collectors.

The Admin UI is a web application hosted by Pivotal tc Server (a Java application server based on Apache Tomcat), and is responsible for making HTTP REST calls to the admin API for node administration tasks. The CaSA is responsible for cluster administrative actions, such as the following:

Enabling/disabling the vRealize Operations cluster
Enabling/disabling cluster nodes
Performing software updates
Browsing log files

The Admin UI is purposely designed to be separate from the Product UI and to always be available for administration and troubleshooting-type tasks. A small database caches data from the Product UI that provides the last known state information to the Admin UI in the event that the Product UI and analytics are unavailable.

The Admin UI is available on each node at https://<NodeIP>/admin.

The Product UI is the main vRealize Operations graphical user interface. Like the Admin UI, the Product UI is based on Pivotal tc Server, and can make HTTP REST calls to the CaSA for administrative tasks; however, the primary purpose of the Product UI is to make GemFire calls to the Controller API to access data and create views, such as dashboards and reports.

Apache2 HTTPD also provides the backend platform for another Tomcat instance, which hosts the Suite API. The Suite API is a public-facing API that can be used for automating and scripting common tasks. It is also used internally by vRealize Operations for carrying out numerous administrative tasks. The End Point Operations Management Adapter, HTTP Post Adapter, and Telemetry are also run by this Tomcat instance.
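
To give a feel for scripting against the Suite API, the following sketch builds (but does not send) an authentication request using only the Python standard library. The endpoint path and the JSON field names are assumptions that should be verified against the Suite API documentation for your version; the function name is hypothetical.

```python
import json
import urllib.request


def build_token_request(node: str, username: str, password: str) -> urllib.request.Request:
    """Build a POST request asking the Suite API for an authentication token.

    The path below is an assumption; confirm it in the Suite API documentation.
    """
    url = f"https://{node}/suite-api/api/auth/token/acquire"
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


# Example: construct the request for a placeholder node name without sending it.
request = build_token_request("vrops.example.local", "admin", "secret")
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen additionally requires the node's certificate to be trusted by the client.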

As shown in the following diagram, the Product UI is simply accessed via HTTPS on TCP 443. Apache then provides a reverse proxy back to the Product UI running in tc Server using the Apache AJP protocol:

The Collector

The Collector process is responsible for pulling in inventory and metric data from the configured sources. As shown in the following diagram, the Collector uses adapters to collect data from various sources, and then contacts the GemFire locator for connection information to one or more Controller cache servers. The Collector service then connects to one or more Controller API GemFire cache servers, and sends the collected data.

It is important to note that although an instance of an adapter can only be running on one node at a time, it does not imply that the collected data is being sent to the Controller on that node.

The Collector sends a heartbeat to the Controller every 30 seconds via the HeartbeatThread thread process running on the Collector. The Collector has a maximum of 25 data collection threads. Conversely, the Controller node runs a HeartbeatServer thread process, which processes heartbeats from Collectors. The CollectorStatusChecker thread process is a DistributedTask that uses data from the HeartbeatServers to decide whether the Collector is up or down.
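
The heartbeat mechanism can be pictured with a small sketch. It is not the product's implementation: the class name is hypothetical, and the number of missed heartbeats tolerated before a Collector is considered down is an assumed value, since the text only states the 30-second interval.

```python
import time
from typing import Dict, Optional

HEARTBEAT_INTERVAL_S = 30      # interval stated in the text
MISSED_BEATS_BEFORE_DOWN = 3   # assumption for illustration only


class HeartbeatRegistry:
    """Tracks the last heartbeat per Collector and derives an up/down status."""

    def __init__(self, interval_s: float = HEARTBEAT_INTERVAL_S,
                 missed_allowed: int = MISSED_BEATS_BEFORE_DOWN) -> None:
        self.timeout_s = interval_s * missed_allowed
        self._last_seen: Dict[str, float] = {}

    def record(self, collector_id: str, now: Optional[float] = None) -> None:
        """Store the arrival time of a heartbeat (the HeartbeatServer role)."""
        self._last_seen[collector_id] = time.monotonic() if now is None else now

    def is_up(self, collector_id: str, now: Optional[float] = None) -> bool:
        """Decide up/down from the last heartbeat (the status checker role)."""
        current = time.monotonic() if now is None else now
        last = self._last_seen.get(collector_id)
        return last is not None and (current - last) <= self.timeout_s
```

Passing an explicit now value makes the logic easy to exercise without waiting for real time to elapse.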

By default, the Collector will wait for 30 minutes for adapters to synchronize.

Collector properties, including whether self-protection is enabled, can be configured in the collector.properties file located in /usr/lib/vmware-vcops/user/conf/collector.
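
When reviewing such a properties file from a script, a simplified parser like the following may be enough. It is a sketch that assumes plain key=value lines with # or ! comments and does not handle line continuations or escapes; the keys in the sample text are hypothetical and are not real Collector settings.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and comments."""
    props: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a key=value line; ignored in this simplified sketch
        props[key.strip()] = value.strip()
    return props


# Hypothetical content; real key names must be taken from the actual file.
sample = """
# example comment
exampleSetting = true
exampleLimit=25
"""
print(parse_properties(sample))
```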

The vSphere adapter provides the Collector process with the configuration information needed to pull in vCenter inventory and metric data. It consists of configuration files and a JAR file. A separate adapter instance is configured for each vCenter Server.

The Python adapter provides the Collector process with the configuration information needed to send remediation commands back to a vCenter Server (power on/off VM, vMotion VM, reconfigure VM, and so on).

The End Point Operations Management adapter is installed and listening by default on each vRealize Operations node. To receive data from operating systems, the agent must be installed and configured on each guest OS to be monitored.

The Horizon adapter provides the Collector process with the configuration information needed to pull in the Horizon View inventory and metric data. A separate adapter instance is configured for each Horizon View pod, and only one adapter instance is supported per vRealize Operations node, with a limit of 10,000 Horizon objects.

The GemFire

VMware vFabric® GemFire® is an in-memory, low-latency data grid, running in the same JVM as the Controller and Analytics, that scales as needed when nodes are added to the cluster. It allows the caching, processing, and retrieval of metrics, and is functionally dependent on the GemFire Locator.

Remote collector nodes only communicate over GemFire using ports 10000-10010.
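
When a remote collector cannot reach the cluster, a quick connectivity test of this port range is a useful first step. The sketch below assumes the ports are TCP and uses only the standard library; the function name and the host you pass in are your own.

```python
import socket
from typing import Dict, Iterable

GEMFIRE_PORTS = range(10000, 10011)  # 10000-10010 inclusive, as stated in the text


def check_tcp_ports(host: str, ports: Iterable[int] = GEMFIRE_PORTS,
                    timeout_s: float = 2.0) -> Dict[int, bool]:
    """Return a mapping of port -> True if a TCP connection could be opened."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                results[port] = True
        except OSError:
            results[port] = False  # refused, filtered, timed out, or unresolvable
    return results
```

A False result only shows that the connection attempt failed from where the script ran; it does not distinguish a firewall rule from a service that is not listening.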

The GemFire locator

The vFabric GemFire locator runs on the master and master replica nodes. The data nodes and remote Collectors run GemFire as a client process.

The Controller

The Controller is a sub-process of the Analytics process, and is responsible for coordinating activity between the cluster members. It manages the storage and retrieval of the inventory of objects within the system. The queries are performed leveraging the GemFire MapReduce function that allows for selective querying. This allows for efficient data querying, as data queries are only performed on select nodes rather than all nodes.

The Controller will monitor the Collector status every minute. It also monitors how long a deleted resource is available in the inventory, and how long a non-existing resource is stored in the database.

The Controller properties can be configured in the controller.properties file located in /usr/lib/vmware-vcops/user/conf/controller.

Analytics

Analytics is the heart of vRealize Operations, as it is essentially the runtime layer for data analysis. The role of the Analytics process is to track the individual states of every metric, and then use various forms of correlation to determine if there are problems.

At a high level, the Analytics layer is responsible for the following tasks:

Metric calculations
Dynamic thresholds
Alerts and alarms
Metric storage and retrieval from the Persistence layer
Root cause analysis
Historic Inventory Server (HIS) version metadata calculations and relationship data

Analytics components work with the new GemFire-based cache, Controller, and Persistence layers. The Analytics process is also responsible for generating SMTP and SNMP alerts on the master and master-replica nodes.

Persistence

The Persistence (the database) layer, as its name implies, is the layer where the data is persisted to disk. The layer primarily consists of a series of databases performing different functions, and having different roles.

vRealize Operations uses two data storage solutions:

Postgres: This is a relational database that stores the configuration and state of the data
FSDB: This is a proprietary high-performance filesystem-based repository that stores all the time series data

Understanding the Persistence layer is an important aspect of vRealize Operations, as this layer has a strong relationship with the data and service availability of the solution.

vRealize Operations has five primary database services, as follows:

Common name	Role	DB type	Sharded	Location
Cassandra DB	User preferences and configuration, alert definitions, customizations, dashboards, policies, views, reports, licensing, shard maps, activities	Apache Cassandra	No	`/storage/db/vcops/cassandra`
Central (Repl) DB	Resource inventory	PostgreSQL	Yes	`/storage/db/vcops/vpostgres/repl`
Alerts/HIS (Data) DB	Alerts and alarm history, history of resource property data, history of resource relationships	PostgreSQL	Yes	`/storage/db/vcops/vpostgres/data`
FSDB	Filesystem database containing raw metrics and super metrics data	FSDB	Yes	`/storage/db/vcops/data`, `/storage/db/vcops/rollup`
CaSA DB	Cluster and Slice Administrator data	HSQL (HyperSQL Database)	No	`/storage/db/casa/webapp/hsqldb`
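
Because each database lives under its own directory, it is straightforward to see which one is consuming disk space on a node. The following sketch sums file sizes under a list of directories; the default paths are taken from this section, while the function names are hypothetical and the script is not part of the product.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

# Directories named in this section; adjust to match your deployment.
PERSISTENCE_PATHS = (
    "/storage/db/vcops/cassandra",
    "/storage/db/vcops/vpostgres/repl",
    "/storage/db/vcops/vpostgres/data",
    "/storage/db/vcops/data",
    "/storage/db/vcops/rollup",
    "/storage/db/casa/webapp",
)


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files below root, skipping unreadable entries."""
    total = 0
    for entry in root.rglob("*"):
        try:
            if entry.is_file() and not entry.is_symlink():
                total += entry.stat().st_size
        except OSError:
            continue
    return total


def persistence_footprint(paths: Iterable[str] = PERSISTENCE_PATHS) -> Dict[str, Optional[int]]:
    """Map each path to its size in bytes, or None if it does not exist on this node."""
    report: Dict[str, Optional[int]] = {}
    for raw in paths:
        root = Path(raw)
        report[raw] = directory_size_bytes(root) if root.is_dir() else None
    return report
```

A None entry is expected on nodes that do not host a given database, for example the Central DB directory on a node that is neither the master nor the master replica.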

Prior to 6.3, a common administrator task was to re-index the database. The 6.3 release, and later releases, contain a scheduled task to re-index the database. The script itself is available at the following location: /usr/lib/vmware-vcops/user/conf/persistence/vpostgres/vpostgres_sharded_db_index_rebuild.sh.

Sharding is the term that GemFire uses to describe the process of distributing data across multiple systems to ensure that compute, storage, and network load is evenly distributed across the cluster.
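
The idea of a shard map, and of querying only the nodes that own the requested data, can be sketched as follows. The placement rule used here (the least-loaded node receives each new object) is an assumption for illustration; the text does not describe the actual placement logic, and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class ShardMap:
    """Records which node owns each resource and places new resources evenly."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._load: Dict[str, int] = {node: 0 for node in nodes}
        if not self._load:
            raise ValueError("at least one node is required")
        self._owner: Dict[str, str] = {}

    def place(self, resource_id: str) -> str:
        """Return the owning node, assigning new resources to the least-loaded node."""
        if resource_id in self._owner:
            return self._owner[resource_id]
        # Ties are broken by node name so that placement is deterministic.
        node = min(sorted(self._load), key=lambda name: self._load[name])
        self._owner[resource_id] = node
        self._load[node] += 1
        return node

    def nodes_for(self, resource_ids: Iterable[str]) -> Dict[str, List[str]]:
        """Group resource IDs by owning node so that only those nodes are queried."""
        plan: Dict[str, List[str]] = defaultdict(list)
        for resource_id in resource_ids:
            plan[self._owner[resource_id]].append(resource_id)
        return dict(plan)
```

With two nodes, three placements alternate between them, and a query for a single resource touches only the node that owns it.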

Cassandra DB

The Cassandra database was introduced in 6.1 to replace the Global xDB database. Apache Cassandra is a highly scalable, high-performance, distributed database. It is designed to handle large amounts of structured data across many nodes. It provides high availability with no single point of failure, and it allows us to add more vRealize Operations nodes to the existing cluster in the future.

Currently, the database stores the following:

User preferences and configuration
Alerts definition
Customizations
Dashboards, policies, views
Reports, licensing
Shard maps
Activities

Cassandra stores all the info that we see in the CONTENT folder; basically any settings that are applied globally.

Central (repl) DB

The Postgres database was introduced in 6.1. It has two instances in version 6.6. The Central Postgres DB, also called repl, and the Alerts/HIS Postgres DB, also called data, are two separate database instances under the database called vcopsdb.

The Central DB exists only on the master and the master-replica node when HA is enabled. It is accessible via port 5433 and it is located in /storage/db/vcops/vpostgres/repl.

Currently, the database stores only resource inventory information.

Alerts /HIS (Data) DB

The Alerts DB is called data on all the data nodes, including the master and master-replica nodes. It was also introduced in 6.1. Starting from 6.2, the Historical Inventory Service xDB was merged with the Alerts DB. It is accessible via port 5432, and it is located in /storage/db/vcops/vpostgres/data.

Currently, the database stores the following:

Alerts and alarm history
History of resource property data
History of resource relationship
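
The two Postgres instances described above differ in port, location, and the nodes they run on. The lookup below captures those facts in one place; the ports and directories come from the text, while the role labels ("master", "master-replica", "data") and all names in the code are hypothetical conventions for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PostgresInstance:
    name: str
    port: int
    data_dir: str
    node_roles: Tuple[str, ...]  # roles on which the instance is present


INSTANCES = (
    # Central (repl) DB: master and master-replica only
    PostgresInstance("repl", 5433, "/storage/db/vcops/vpostgres/repl",
                     ("master", "master-replica")),
    # Alerts/HIS (data) DB: all data nodes, including master and master-replica
    PostgresInstance("data", 5432, "/storage/db/vcops/vpostgres/data",
                     ("master", "master-replica", "data")),
)


def instances_on(role: str) -> List[PostgresInstance]:
    """Return the Postgres instances expected on a node with the given role."""
    return [instance for instance in INSTANCES if role in instance.node_roles]


for instance in instances_on("master"):
    print(instance.name, instance.port, instance.data_dir)
```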

HSQL DB

The HSQL (or CaSA) database is a small, flat, JSON-based, in-memory DB that is used by CaSA for cluster administration.

FSDB

The FSDB contains all raw time series metrics and super metrics data for the discovered resources. It stores the data collected by adapters, as well as data that is calculated or generated from the analysis of that data (such as system and badge metrics).

FSDB is a GemFire server, and runs inside the analytics JVM. It uses Sharding Manager to distribute data between nodes (new objects). We will discuss what vRealize Operations cluster nodes are later in this chapter. The FSDB is available in all the nodes of a vRealize Operations cluster deployment.
