Book Image

vSphere High Performance Cookbook - Second Edition - Second Edition

By : Kevin Elder, Christopher Kusek, Prasenjit Sarkar
Book Image

vSphere High Performance Cookbook - Second Edition - Second Edition

By: Kevin Elder, Christopher Kusek, Prasenjit Sarkar

Overview of this book

vSphere is a mission-critical piece of software for many businesses. It is a complex tool, and incorrect design and deployment can create performance related issues that can negatively affect the business. This book is focused on solving these problems as well as providing best practices and performance-enhancing techniques. This edition is fully updated to include all the new features in version 6.5 as well as the latest tools and techniques to keep vSphere performing at its best. This book starts with interesting recipes, such as the interaction of vSphere 6.5 components with physical layers such as CPU, memory, and networking. Then we focus on DRS, resource control design, and vSphere cluster design. Next, you’ll learn about storage performance design and how it works with VMware vSphere 6.5. Moving on, you will learn about the two types of vCenter installation and the benefits of each. Lastly, the book covers performance tools that help you get the most out of your vSphere installation. By the end of this book, you will be able to identify, diagnose, and troubleshoot operational faults and critical performance issues in vSphere 6.5.
Table of Contents (17 chapters)
Title Page
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface

Spotting CPU overcommitment


When we provision CPU resources, which is the number of vCPUs allocated to run the VMs, and if its number is greater than the number of physical cores on a host, it is called CPU overcommitment. CPU overcommitment is a normal practice in many situations as it increases the consolidation ratio. Nevertheless, you need to monitor it closely.

CPU overcommitment is not recommended in order to satisfy or guarantee the workload of a tier-1 application with a tight SLA. It may be successfully leveraged to highly consolidate and reduce the power consumption of light workloads on modern, multicore systems.

Getting ready

To step through this recipe, you need a running ESXi Server with SSH enabled, a couple of running CPU-hungry VMs, an SSH client (Putty), a vCenter Server, and vSphere Web Client. No other prerequisites are required.

The following table elaborates on Esxtop CPU Performance Metrics:

Esxtop Metric

Description

Implication

%RDY

The percentage of time a vCPU in a run queue is waiting for the CPU scheduler to let it run on a physical CPU.

A high %RDY time (use 20 percent as the starting point) may indicate the VM is under resource contention. Monitor this; if the application speed is OK, a higher threshold may be tolerated.

%USED

The percentage of possible CPU processing cycles that were actually used for work during this time interval.

The %USED value alone does not necessarily indicate that the CPUs are overcommitted. However, high %RDY values plus high %USED values are a sure indicator that your CPU resources are overcommitted.

How to do it...

To spot CPU overcommitment, there are a few CPU resource parameters that you should monitor closely. They are:

  1. Log in to the ESXi Server using an SSH client (Putty).
  2. Type esxtop and hit Enter.

  1. Monitor the preceding values to understand CPU overcommitment.

This example uses esxtop to detect CPU overcommitment. Looking at the pCPU line near the top of the screen, you can determine that this host's two CPUs are 100 percent utilized. Four active VMs are shown, Res-Hungry-1 to Res-Hungry-4. These VMs are active because they have relatively high values in the %USED column. The values in the %USED column alone do not necessarily indicate that the CPUs are overcommitted. In the %RDY column, you see that the three active VMs have relatively high values. High %RDY values plus high %USED values are a sure indicator that your CPU resources are overcommitted.

From the CPU view, navigate to a VM and press the E key to expand the view. It will give a detailed vCPU view for the VM. This is important because, at a quick level, CPU that is ready as a metric is best referenced when looking at performance concerns more broadly than a specific VM. If there is high ready percentage noted, contention could be an issue, particularly if other VMs show high utilization when more vCPUs than physical cores are present. In that case, other VMs could lead to high ready time on a low idle VM. So, long story short, if the CPU ready time is high on VMs on a host, it's time to verify that no other VMs are seeing performance issues.

You can also use the vCenter performance chart to spot CPU overcommitment, as follows:

  1. Log in to the vCenter Server using vSphere Web Client.
  2. On the home screen, navigate to Hosts and Clusters.
  3. Expand the left-hand navigation list.
  4. Navigate to one of the ESXi hosts.
  5. Navigate to the Monitor tab.
  6. Navigate to the Performance tab.
  7. Navigate to the Advanced view.
  8. Click on Chart Options.
  9. Navigate to CPU from Chart metrics.
  10. Select only Used and Ready in the Counters section and click on OK:

Now you will see the ready time and the used time in the graph and you can spot the overcommitment. The following screenshot is an example output:

The following example shows that the host has high ready time:

How it works...

Although high ready time typically signifies CPU contention, the condition does not always warrant corrective action. If the value of ready time is also accompanied by high used time, then it might signify that the host is overcommitted.

So the used time and ready time of a host might signal contention. However, the host might not be overcommitted due to workload availability.

There might be periods of activity and periods that are idle. So the CPU is not over-committed all the time. Another very common source of high ready time for VMs, even when pCPU utilization is low, is due to storage being slow. A vCPU, which occupies a pCPU, can issue a storage I/O and then sit in the WAIT state on the pCPU, blocking other vCPUs. Other vCPUs accumulate ready time; this vCPU and pCPU accumulate wait time (which is not part of the used or utilized time).