vSphere High Performance Cookbook

Overview of this book

VMware vSphere is the key virtualization technology in today's market. vSphere is a complex tool, and incorrect design and deployment can create performance-related problems. vSphere High Performance Cookbook is focused on solving those problems as well as providing best practices and performance-enhancing techniques. vSphere High Performance Cookbook offers a comprehensive understanding of the different components of vSphere and the interaction of these components with the physical layer, which includes the CPU, memory, network, and storage. If you want to improve or troubleshoot vSphere performance, then this book is for you! vSphere High Performance Cookbook will teach you how to tune and grow a VMware vSphere 5 infrastructure. This book focuses on tuning, optimizing, and scaling the infrastructure using the vSphere Client graphical user interface. This book will equip the reader with the knowledge, skills, and abilities to build and run a high-performing VMware vSphere virtual infrastructure. You will learn how to configure and manage ESXi CPU, memory, networking, and storage for sophisticated, enterprise-scale environments. You will also learn how to manage changes to the vSphere environment and optimize the performance of all vSphere components. This book also focuses on high-value and often overlooked performance-related topics such as the NUMA-aware CPU scheduler, the VMM scheduler, core sharing, virtual memory reclamation techniques, checksum offloading, VMDirectPath I/O, queuing on the storage array, command queuing, vCenter Server design, and virtual machine and application tuning. By the end of this book, you will be able to identify, diagnose, and troubleshoot operational faults and critical performance issues in vSphere.

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

CPU Performance Design

Introduction

Critical performance consideration – VMM scheduler

CPU scheduler – processor topology/cache aware

Ready time – warning sign

Hyperthreaded core sharing

Spotting CPU overcommitment

Fighting guest CPU saturation in SMP VMs

Controlling CPU resources using resource settings

What is most important to monitor in CPU performance

CPU performance best practices

Memory Performance Design

Introduction

Virtual memory reclamation techniques

Monitoring host-swapping activity

Monitoring host-ballooning activity

Keeping memory free for VMkernel

Key memory performance metrics to monitor

What metrics not to use

Identifying when memory is the problem

Analyzing host and VM memory

Memory performance best practices

Networking Performance Design

Introduction

Designing a network for load balancing and failover for vSphere Standard Switch

Designing a network for load balancing and failover for vSphere Distributed Switch

What to know when offloading checksum

Selecting the correct virtual network adapter

Improving performance through VMDirectPath I/O

Improving performance through NetQueue

Improving network performance using the SplitRx mode for multicast traffic

Designing a multi-NIC vMotion

Improving network performance using network I/O control

Monitoring network capacity and performance matrix

DRS, SDRS, and Resource Control Design

Introduction

Using DRS algorithm guidelines

Using resource pool guidelines

Avoiding using resource pool as folder structure

Choosing the best SIOC latency threshold

Using storage capability and profile driven storage

Anti-affinity rules in the SDRS cluster

Avoiding the use of SDRS I/O Metric and array-based automatic tiering together

Using VMware SIOC and array-based automatic tiering together

vSphere Cluster Design

Introduction

Trade-off factors while designing scale up and scale out clusters

Using VM Monitoring

vSphere Fault Tolerance design and its impact

DPM and its impact

Choosing the reserved cluster failover capacity

Rightly choosing the vSphere HA cluster size

Storage Performance Design

Introduction

Designing the host for a highly available and high-performing storage

Designing a highly available and high-performance iSCSI SAN

Designing a highly available and high-performing FC storage

Performance impact of queuing on the storage array and host

Factors that affect storage performance

Using VAAI to boost storage performance

Selecting the right VM disk type

Monitoring command queuing

Identifying a severely overloaded storage

Designing vCenter and vCenter Database for Best Performance

Introduction

vCenter Single Sign-On and its database preparation

vCenter Single Sign-On and its deployment

Things to bear in mind while designing the vCenter platform

Designing vCenter Server for redundancy

Designing a highly available vCenter database

vCenter database size and location affects performance

Considering vCenter Server Certificates to minimize security threats

Designing vCenter Server for Auto Deploy

Virtual Machine and Application Performance Design

Introduction

Setting the right time in Guest OS

vNUMA (Virtual NUMA) considerations

Choosing the SCSI controller for storage

Impact of VM swap file placement

Using large pages in virtual machines

Guest OS networking considerations

When you should or should not virtualize an application

Measuring the application's performance

Index

Ready time – warning sign

To achieve the best performance in a consolidated environment, you must consider ready time.

Ready time is the time that a vCPU spends waiting in the queue for a pCPU (physical core) to become available to execute its instructions. The scheduler manages this queue; when there is contention and processing resources are stressed, the queue can become long.

Ready time describes how much of the last observation period a specific world (for example, a vCPU) spent waiting in the queue for access to a pCPU. It can be expressed as a percentage per vCPU of the observation period, and statistically it cannot be zero on average.

The value of ready time is therefore an indicator of how long the VM was denied access to the pCPU resources it wanted to use, which makes it a good indicator of performance.

When multiple processes are trying to use the same physical CPU, that CPU might not be immediately available, and a process must wait before the ESXi host can allocate a CPU to it.

The CPU scheduler manages access to the physical CPUs on the host system. A short spike in CPU used or CPU ready indicates that you are making the best use of the host resources. However, if both values are constantly high, the host is probably overloaded and performance is likely poor.

Generally, if the CPU used value for a virtual machine is above 90 percent and the CPU ready value is above 20 percent per vCPU (an important qualifier for VMs with a high number of vCPUs), performance is negatively affected.
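The rule of thumb above can be expressed as a simple check. This is a minimal sketch, assuming you already have the CPU used percentage and the CPU ready percentage per vCPU for a virtual machine; the function name is hypothetical, and the default thresholds (90 and 20 percent) are taken from the guideline in the text rather than from any VMware API.

```python
def cpu_performance_at_risk(used_pct: float, ready_pct_per_vcpu: float,
                            used_threshold: float = 90.0,
                            ready_threshold: float = 20.0) -> bool:
    """Return True when both CPU used and CPU ready exceed the guideline thresholds.

    Hypothetical helper: the defaults mirror the 90 percent used /
    20 percent ready-per-vCPU rule of thumb; they are not hard limits.
    """
    return used_pct > used_threshold and ready_pct_per_vcpu > ready_threshold


# Both values high: performance is likely affected.
print(cpu_performance_at_risk(95.0, 25.0))  # True
# High CPU used alone does not trip the check.
print(cpu_performance_at_risk(95.0, 4.0))   # False
```

Requiring both conditions reflects the point that a high value in only one metric, especially a short spike, is not by itself a sign of an overloaded host.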

This latency may impact the performance of the guest operating system and the running applications within a virtual machine.

Getting ready

To step through this recipe, you need a running ESXi Server, a couple of CPU-hungry virtual machines, VMware vCenter Server, and a working installation of vSphere Client. No other prerequisites are required.

How to do it...

Let's get started:

Open up vSphere Client.
Log in to the VMware vCenter Server.
On the home screen, navigate to Hosts and Clusters.
Expand the left-hand navigation list.
Navigate to one of the CPU-hungry virtual machines.
Navigate to the Performance screen.
Navigate to the Advanced view.
Click on Chart Options.
Navigate to CPU from the Chart metrics.
Navigate to the VM object.
Select only Demand, Ready, and Usage in MHz.
The key metrics when investigating a potential CPU issue are:
- Demand: Amount of CPU that the virtual machine is trying to use.
- Usage: Amount of CPU that the virtual machine is actually being allowed to use.
- Ready: Amount of time during which the virtual machine is ready to run (has work it wants to do) but is unable to because vSphere cannot find physical resources to run the virtual machine on.
Click on Ok.
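The chart selections above can also be approximated from a script. This is a minimal sketch, assuming the pyVmomi SDK is installed and that vCenter keeps real-time (20-second) statistics for the VM; the helper names `counter_ids` and `query_vm_cpu` are hypothetical, and certificate validation is disabled, which is acceptable only in a lab.

```python
import ssl

# Counters matching the chart options chosen above, named "group.name.rollup".
WANTED = ("cpu.demand.average", "cpu.ready.summation", "cpu.usagemhz.average")


def counter_ids(perf_counters):
    """Map each wanted 'group.name.rollup' name to its vCenter counter ID."""
    ids = {}
    for counter in perf_counters:
        name = f"{counter.groupInfo.key}.{counter.nameInfo.key}.{counter.rollupType}"
        if name in WANTED:
            ids[name] = counter.key
    return ids


def query_vm_cpu(host, user, pwd, vm_name, samples=15):
    """Fetch recent real-time Demand (MHz), Ready (ms), and Usage (MHz) for one VM.

    Hypothetical helper; the pyVmomi calls it makes are the SDK's own.
    """
    # Imported here so counter_ids() can be used without pyVmomi installed.
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Skips certificate validation: lab use only, not production.
    context = ssl._create_unverified_context()
    si = SmartConnect(host=host, user=user, pwd=pwd, sslContext=context)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        vm = next((v for v in view.view if v.name == vm_name), None)
        view.Destroy()
        if vm is None:
            raise ValueError(f"VM {vm_name!r} not found")

        perf = content.perfManager
        ids = counter_ids(perf.perfCounter)
        names = {cid: name for name, cid in ids.items()}
        # instance="" requests the aggregate value across all of the VM's vCPUs.
        metrics = [vim.PerformanceManager.MetricId(counterId=cid, instance="")
                   for cid in ids.values()]
        # intervalId=20 selects the 20-second real-time samples.
        spec = vim.PerformanceManager.QuerySpec(
            entity=vm, metricId=metrics, intervalId=20, maxSample=samples)
        result = {}
        for entity_metric in perf.QueryPerf(querySpec=[spec]):
            for series in entity_metric.value:
                result[names[series.id.counterId]] = list(series.value)
        return result
    finally:
        Disconnect(si)
```

For example, `query_vm_cpu("vcenter.example.local", "administrator@vsphere.local", "secret", "cpu-hungry-vm-01")` (all hypothetical values) would return a dictionary mapping each counter name to its most recent samples: Demand and Usage in MHz, and Ready in milliseconds per 20-second sample.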

In the following screenshot you will see the high ready time for the virtual machine:

Notice the amount of CPU this virtual machine is demanding and compare it with the amount of CPU the virtual machine is actually able to get (usage in MHz). The virtual machine is demanding more than it is currently allowed to use.

Notice that the virtual machine is also seeing a large amount of ready time.
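The Demand versus Usage comparison can be put into numbers. This is a minimal sketch, assuming Demand and Usage values in MHz read from the same chart sample; the function name and the sample values are hypothetical.

```python
from typing import Tuple


def cpu_shortfall(demand_mhz: float, usage_mhz: float) -> Tuple[float, float]:
    """Return (unserved MHz, fraction of demand that was served).

    Demand is what the VM tries to use; Usage is what it was allowed to use.
    A positive shortfall means the VM wanted more CPU than it received.
    """
    if demand_mhz <= 0:
        # No demand: nothing is unserved.
        return 0.0, 1.0
    shortfall = max(float(demand_mhz - usage_mhz), 0.0)
    return shortfall, min(usage_mhz / demand_mhz, 1.0)


# Hypothetical sample: the VM asks for 4,000 MHz but gets 2,500 MHz.
print(cpu_shortfall(4000, 2500))  # (1500.0, 0.625)
```

A persistent shortfall together with high ready time points to CPU contention rather than a VM that simply has little work to do.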

Note

Ready time greater than 10 percent could be a performance concern. However, some less CPU-sensitive applications and virtual machines can have much higher values of ready time and still perform satisfactorily.

How it works...

Bad performance is when the users are unhappy. But that's subjective and hard to measure. We can measure other metrics easily, but they don't correlate perfectly with whether users' expectations are met. We want to find metrics that correlate well (though never perfectly) with user satisfaction. The final answer to "Is there a performance problem?" is always subjective, but we can use objective metrics to make reasonable bets and to decide when it's worth asking the users whether they're satisfied with the performance.

A vCPU is in the ready state when it is ready to run (that is, it has a task it wants to execute) but is unable to run because the vSphere scheduler cannot find physical host CPU resources to run the virtual machine on. One potential reason for elevated ready time is that the virtual machine is constrained by a user-set CPU limit or a resource pool limit. The amount of CPU denied because of a limit is reported as the max limited (MLMTD) metric.
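In esxtop, the %RDY value includes the time counted in %MLMTD, so subtracting one from the other separates ready time caused by a limit from ready time caused by contention for physical CPUs. This is a minimal sketch, assuming both values are percentages from the same esxtop sample; the function name and the sample values are hypothetical.

```python
def split_ready_time(rdy_pct: float, mlmtd_pct: float) -> dict:
    """Split esxtop %RDY into limit-caused and contention-caused parts.

    Assumes %RDY already includes %MLMTD (time held back by a CPU limit).
    """
    limited = min(mlmtd_pct, rdy_pct)
    return {"limit": limited, "contention": rdy_pct - limited}


# Hypothetical sample: 12% ready, of which 9% is due to a CPU limit.
print(split_ready_time(12.0, 9.0))  # {'limit': 9.0, 'contention': 3.0}
```

If most of the ready time falls under `limit`, reviewing the VM or resource pool limit is a more direct fix than adding host capacity.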

resxtop/esxtop and vCenter Server report ready time in different formats. In resxtop/esxtop, it is reported as an easily understood percentage: a figure of 5 percent means that the virtual machine spent 5 percent of its last sample period waiting for available CPU resources (this holds only for 1-vCPU VMs). In vCenter Server, ready time is reported as a time measurement. For example, in vCenter Server's real-time data, which produces a sample value every 20,000 milliseconds, a figure of 1,000 milliseconds corresponds to 5 percent ready time, and a figure of 2,000 milliseconds corresponds to 10 percent ready time.

Tip

Because vCenter reports ready time in milliseconds (ms), use the following formula to convert the ms value to a percentage:

$$
\text{Metric value (\%)} = \frac{\text{Metric value (ms)}}{\text{Total time of sample period (ms)}} \times 100
$$

The total time of the sample period is 20,000 ms by default for vCenter real-time graphs.
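The same conversion as a small function. This is a minimal sketch: the default 20,000 ms period matches vCenter real-time charts as described above, while the optional division by the number of vCPUs is an added assumption based on vCenter summing the VM-level ready value across all of the VM's vCPUs. The function name is hypothetical.

```python
def ready_ms_to_percent(ready_ms: float, period_ms: float = 20_000,
                        vcpus: int = 1) -> float:
    """Convert a vCenter CPU ready summation (ms) into percent per vCPU.

    period_ms defaults to the 20-second real-time chart interval.
    vcpus > 1 averages the VM-level value, which is summed across vCPUs.
    """
    if period_ms <= 0 or vcpus < 1:
        raise ValueError("period_ms must be positive and vcpus at least 1")
    return ready_ms * 100 / (period_ms * vcpus)


print(ready_ms_to_percent(1_000))           # 5.0
print(ready_ms_to_percent(2_000))           # 10.0
print(ready_ms_to_percent(4_000, vcpus=4))  # 5.0
```

The first two calls reproduce the 5 percent and 10 percent examples from the text; the third shows why a large VM-level value on a multi-vCPU VM may still be modest per vCPU.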

Although high ready time typically signifies CPU contention, the condition does not always warrant corrective action. If the value for ready time is close in value to the amount of time used on the CPU, and if the increased ready time occurs with occasional spikes in CPU activity but does not persist for extended periods of time, this might not indicate a performance problem. The brief performance hit is often within the accepted performance variance and does not require any action on the part of the administrator.
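One way to tell an occasional spike from a sustained problem is to look for runs of consecutive samples above a threshold. This is a minimal sketch, assuming a list of ready-time percentages per vCPU from consecutive 20-second real-time samples; the 10 percent default follows the Note earlier in this recipe, while the function name and the minimum run length of 3 samples are hypothetical choices you should tune.

```python
from typing import Sequence


def sustained_ready(samples: Sequence[float], threshold: float = 10.0,
                    min_run: int = 3) -> bool:
    """Return True if ready time stays above threshold for min_run samples in a row.

    Isolated spikes above the threshold are ignored, because brief
    ready-time increases are usually within acceptable performance variance.
    """
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_run:
            return True
    return False


print(sustained_ready([2, 14, 3, 1, 16, 2]))    # False: isolated spikes
print(sustained_ready([4, 12, 15, 18, 11, 3]))  # True: four samples in a row
```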
