Before you decide what to monitor for CPU performance, make sure you understand what affects it. Factors that can affect CPU performance include:
- CPU affinity: Pinning a virtual CPU to a physical CPU can leave your resources imbalanced, so avoid it unless you have a strong reason to do it.
- CPU prioritization: When CPU contention occurs, the CPU scheduler is forced to prioritize VMs according to their entitlement and to queue their requests.
- SMP VMs: If your application is not multithreaded, adding more vCPUs to the VM brings no benefit. In fact, the extra idle vCPUs add overhead that takes time away from useful work.
- Idle VMs: You may have many idle VMs that you assume do not consume resources. In reality, even idle VMs can affect CPU performance if their shares or reservations have been changed from the default values.
Now that you know what affects CPU performance, you can look at what it takes to monitor it.
You can categorize the factors that should be monitored for CPU performance into three main sections:
- Host CPU usage
- VM CPU usage
- VM CPU ready time
To monitor these, you need to know the following esxtop counters:
- PCPU USED (%)
- Per-group statistics:
  - %USED
  - %SYS
  - %RDY
  - %WAIT
  - %CSTP
  - %MLMTD
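Besides the interactive view, esxtop can record these counters to a CSV file in batch mode, for example esxtop -b -a -d 5 -n 60 > cpu.csv (batch mode, all statistics, a sample every 5 seconds, 60 samples). The following is a minimal sketch for pulling one per-group counter out of such a file. It assumes per-VM columns are titled like \\esx01\Group Cpu(12345:vm-a)\% Ready, with spelled-out counter names instead of the interactive %RDY; that header format is an assumption, so check the header row of your own capture. GROUP_COLUMN and group_counter are hypothetical names.

```python
import csv
import io
import re

# Assumed header format for per-VM CPU columns in esxtop batch (-b) output, e.g.
#   \\esx01\Group Cpu(12345:vm-a)\% Ready
# Verify against the header row of your own capture before relying on it.
GROUP_COLUMN = re.compile(r"\\Group Cpu\((\d+):([^)]*)\)\\(.+)$")


def group_counter(csv_text: str, counter: str) -> dict[str, list[float]]:
    """Map each VM (group) name to its samples of one counter, e.g. '% Ready'."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    # Find the columns that hold the requested counter for each group.
    columns = {}
    for index, title in enumerate(header):
        match = GROUP_COLUMN.search(title)
        if match and match.group(3) == counter:
            columns[index] = match.group(2)
    series = {name: [] for name in columns.values()}
    # Collect one float per sample row for every matching column.
    for row in samples:
        for index, name in columns.items():
            series[name].append(float(row[index]))
    return series
```

For example, group_counter(open("cpu.csv").read(), "% Ready") would return the ready-time samples of every VM in the capture, ready to compare with what you see interactively.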
To step through this recipe, you need a running ESXi host with SSH enabled, a couple of running CPU-hungry VMs, and an SSH client (PuTTY). No other prerequisites are required.
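If you do not already have CPU-hungry VMs, one way to create that load is a small script run inside each guest. The following is a hypothetical helper, not part of any VMware tool: the file name burn_cpu.py and the functions spin and burn_cpu are illustrative, and it assumes Python 3.9 or later is installed in the guest.

```python
import multiprocessing
import sys
import time


def spin(seconds: float) -> int:
    """Busy-loop on one CPU for `seconds` and return the iterations completed."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1  # pure CPU work: no sleeping, no I/O
    return iterations


def burn_cpu(workers: int, seconds: float) -> list[int]:
    """Keep `workers` vCPUs busy for `seconds`, one process per vCPU."""
    # Separate processes (not threads) avoid the GIL, so each can saturate a vCPU.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(spin, [seconds] * workers)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python3 burn_cpu.py 2 600  (two busy vCPUs for ten minutes)
    burn_cpu(int(sys.argv[1]), float(sys.argv[2]))
```

Running it with as many workers as the VM has vCPUs should push that VM's %USED up, and running it in several VMs at once on a small host is a simple way to provoke the ready time this recipe looks for.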
Let's get started:
- Log in to the ESXi host using an SSH client (PuTTY).
- Run esxtop and monitor the statistics. The following screenshot is an example output:
- Now, look at the performance counters mentioned previously. In the following example output, look at the different metrics:
In the preceding example, you can see that PCPU 0 and PCPU 1 are being used heavily (100 percent and 73 percent UTIL, respectively), as the following figure shows:
In the preceding example, you can see that the %USED value of the four CPU-hungry virtual machines is high.
Also, look at the %RDY column: it shows high ready time, which indicates a performance problem.
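Because the group %RDY is not adjusted for the number of vCPUs (see the %RDY description in the list that follows), comparing raw values between VMs can mislead. The following is a minimal sketch, assuming you have noted each VM's %RDY and vCPU count from esxtop, that ranks VMs by average ready time per vCPU. VmReady and rank_by_ready are hypothetical names, and the readings in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class VmReady:
    name: str       # VM (group) name as shown in esxtop
    vcpus: int      # number of vCPUs configured for the VM
    rdy_pct: float  # group %RDY as displayed, summed over all vCPUs


def rank_by_ready(vms: list[VmReady]) -> list[tuple[str, float]]:
    """Return (name, average %RDY per vCPU) pairs, worst first."""
    per_vcpu = [(vm.name, vm.rdy_pct / vm.vcpus) for vm in vms]
    return sorted(per_vcpu, key=lambda pair: pair[1], reverse=True)


# Hypothetical readings: vm-a shows the larger raw %RDY, but vm-b waits more per vCPU.
print(rank_by_ready([VmReady("vm-a", 4, 40.0), VmReady("vm-b", 1, 15.0)]))
# [('vm-b', 15.0), ('vm-a', 10.0)]
```

The example shows why the division matters: the four-vCPU VM has the higher raw %RDY, yet each of its vCPUs waits less than the single vCPU of the smaller VM.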
The following list is a quick explanation of each of these metrics:
- PCPU USED (%): This is the CPU utilization per physical CPU.
- %USED: This is the physical CPU usage per group.
- %SYS: This is the VMkernel system activity time, that is, the percentage of time spent in the VMkernel on behalf of the group to process interrupts and perform other system activities.
- %RDY: This is the ready time, the amount of time the group spent ready to run but waiting for a physical CPU to become available. Note that this value is not adjusted for the number of vCPUs. Expand the group to see %RDY for each vCPU, or at least divide the value by the number of vCPUs to get an average per vCPU.
- %WAIT: This is the percentage of time spent in the blocked or busy wait state. It includes idle time as well as time spent waiting for I/O from the disk or network.
- %CSTP: This is the percentage of time the group spent in a ready, co-descheduled state. For a vCPU, %CSTP indicates how much time the vCPU has spent not running in order to allow the other vCPUs in the same VM to catch up. High values suggest that the VM has more vCPUs than it needs and that its performance might be suffering.
- %MLMTD: This is the amount of time spent ready to run but not scheduled because of a CPU limit.
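Taken together, these descriptions suggest a rough triage: high %RDY points to contention for physical CPUs, high %CSTP to a VM with more vCPUs than it needs, and high %MLMTD to a CPU limit. The following is a minimal sketch of that reasoning. The thresholds are assumptions chosen for illustration, not values from this recipe or from VMware, and the code assumes that, like %RDY, the group %CSTP and %MLMTD values are summed over the VM's vCPUs; expand the group in esxtop to confirm the per-vCPU values. GroupCpu and diagnose are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical per-vCPU thresholds in percent. These are illustrative
# assumptions, not values taken from VMware documentation or this recipe.
RDY_THRESHOLD = 10.0
CSTP_THRESHOLD = 3.0
MLMTD_THRESHOLD = 1.0


@dataclass
class GroupCpu:
    name: str
    vcpus: int
    rdy: float    # group %RDY
    cstp: float   # group %CSTP
    mlmtd: float  # group %MLMTD


def diagnose(group: GroupCpu) -> list[str]:
    """List the likely causes whose per-vCPU value exceeds its threshold."""
    # Assumes the group values are summed over vCPUs, so divide to average them.
    rdy = group.rdy / group.vcpus
    cstp = group.cstp / group.vcpus
    mlmtd = group.mlmtd / group.vcpus
    hints = []
    if rdy > RDY_THRESHOLD:
        hints.append("high ready time: waiting for a physical CPU")
    if cstp > CSTP_THRESHOLD:
        hints.append("high co-stop: the VM may have more vCPUs than it needs")
    if mlmtd > MLMTD_THRESHOLD:
        hints.append("CPU limit: ready to run but held back by its limit")
    return hints
```

For instance, a four-vCPU VM showing 48 %RDY and 20 %CSTP would be flagged for both ready time and co-stop, which points toward reducing its vCPU count before looking elsewhere.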