

Modes of GPU usage


Applications running in virtual machines hosted on vSphere can make use of GPU processing power in two ways.

vSphere DirectPath I/O is a built-in vSphere feature that leverages virtualization technology (VT)-enabled processors configured on the hosts to enhance the performance of virtual machines. The I/O memory management unit (IOMMU) is a processor feature of Intel/AMD CPUs that remaps direct memory access (DMA) transfers and device interrupts. In this way, virtual machines are able to bypass the VMkernel and get direct access to the underlying physical hardware. vMotion is supported with DPIO-enabled server hardware.

Hardware-assisted I/O MMU virtualization is called Intel Virtualization Technology for Directed I/O (VT-d) on Intel processors and AMD I/O Virtualization (AMD-Vi or IOMMU) on AMD processors. It is a function of the chipset that allows virtual machines to directly access hardware I/O devices such as network cards, storage controllers, and GPUs.

NVIDIA GRID GPUs support vGPU, the capability for multiple users to share a single physical GPU in a virtualized environment. Three types of hardware-based graphics acceleration configurations are possible for Horizon View virtual desktops; of these, vGPU offers the best performance and compatibility options.

Comparing ML workloads to GPU configurations

We can compare the same ML workload by testing it with three different GPU configurations; these are as follows:

  • GPU using DirectPath I/O on vSphere
  • GRID vGPU on vSphere
  • Native GPU on a bare-metal host

Testing has shown that the virtualization layer (DirectPath I/O and GRID vGPU) introduces only about a 4% overhead for the tested ML application. Learning times for a specific model can be compared by using two virtual machines with different configurations.

The VM resources and guest OS of the two VMs, with and without a GPU, are as follows:

  • NVIDIA GRID configuration: 1 vGPU, 12 vCPUs, 60 GB memory, 96 GB SSD storage, CentOS 7.2
  • No-GPU configuration: no GPU, 12 vCPUs, 60 GB memory, 96 GB SSD storage, CentOS 7.2
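
The following is a minimal sketch of how a learning run can be timed in each of the two VMs above. It assumes TensorFlow is installed in both guests; in the no-GPU VM, TensorFlow simply finds no visible GPU and falls back to the CPU. The model shape and epoch count are illustrative, not the book's exact benchmark.

```python
import time
import tensorflow as tf

# Load and normalize the MNIST digits used by the workload in the table below.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

# A small illustrative classifier; the actual benchmark model may differ.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Time the same training job in both VMs and compare.
start = time.time()
model.fit(x_train, y_train, epochs=5, verbose=0)
print(f"Training time: {time.time() - start:.1f} sec")
```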

Let's look at the following table:

| MNIST workload | 1 vGPU | No GPU |
| --- | --- | --- |
| Normalized learning time | 1.1 | 10.01 |
| CPU utilization | 9% | 45% |

As the preceding table shows, the vGPU reduces training time by about 10 times, and CPU utilization also drops by about 5 times. The ML workload consists of two components (see the sketch after this list):

  • The convolutional neural network model, derived from the TensorFlow library
  • The Canadian Institute For Advanced Research (CIFAR)-10 dataset, a collection of labeled images that is widely used in ML and computer vision algorithms
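
Here is a minimal sketch of these two components together: a small convolutional network built with TensorFlow/Keras and trained on CIFAR-10. The layer sizes are illustrative assumptions, not the book's exact benchmark model.

```python
import tensorflow as tf

# Confirm whether TensorFlow can see the (v)GPU presented to the VM.
print("GPUs visible:", tf.config.list_physical_devices("GPU"))

# CIFAR-10: 60,000 32x32 colour images in 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small convolutional neural network for the 10 CIFAR classes.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])

# On a vGPU or DirectPath I/O VM this trains in a fraction of the CPU-only time.
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```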

DirectPath I/O 

First, we focus on DirectPath I/O (DPIO) passthrough mode as we scale from one GPU to four GPUs:

| CIFAR-10 | 1 GPU | 2 GPUs | 4 GPUs |
| --- | --- | --- | --- |
| Normalized images/sec in thousands (w.r.t. 1 GPU) | 1.1 | 2.01 | 3.77 |
| CPU utilization | 23% | 41% | 73% |
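
To exercise more than one passthrough GPU from inside the guest, the training job can be wrapped in a distribution strategy. The following is a minimal sketch using TensorFlow's MirroredStrategy with synthetic data; it illustrates the scaling idea and is not the benchmark harness used for the table above.

```python
import tensorflow as tf

# MirroredStrategy replicates the model across every GPU visible in the VM
# (one to four here, depending on how many are passed through via DPIO).
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Synthetic stand-in data; each batch is split across the replicas.
x = tf.random.uniform((256, 32, 32, 3))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
model.fit(x, y, epochs=1, batch_size=64)
```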

The number of images processed per second improves as GPUs are added to the server. With one GPU, throughput is normalized at roughly 1,000 images/second, and it grows further as GPUs are added. DPIO and GRID vGPU mode performance can be compared by configuring one vGPU per VM in both modes:

| MNIST workload (lower is better) | DPIO | GRID vGPU |
| --- | --- | --- |
| Normalized training time | 1.1 | 1.03 |

| CIFAR-10 workload (higher is better) | DPIO | GRID vGPU |
| --- | --- | --- |
| Normalized images/second | 1.1 | 0.83 |

DPIO and GRID vGPU mode have more or less the same performance with one vGPU per VM. In DPIO mode, a VM can be configured with all the available GPUs on the host, whereas in GRID vGPU mode a VM can be configured with a maximum of one GPU. We can compare four VMs each running the same job against a single VM using all four GPUs on the host in DPIO mode:

| CIFAR-10 workload | DPIO (one VM) | DPIO (four VMs) | GRID vGPU (four VMs) |
| --- | --- | --- | --- |
| Normalized images/second (higher is better) | 1.1 | 0.96 | 0.94 |
| CPU utilization | 73% | 69% | 67% |

Virtual machines that need low latency or a shorter training time should be configured in multi-GPU DPIO mode. Because the GPUs are then dedicated to specific virtual machines, the remaining virtual machines on the host cannot access those GPUs during this time. Virtual machines that can tolerate longer latencies or learning times can be configured with one GPU in GRID vGPU mode and enjoy the benefits of virtualization.

Scalability of GPU in a virtual environment

Horizon and vSphere support vGPU, which brings the benefit of broad API support and native NVIDIA drivers with maximum scalability. NVIDIA GRID GPUs are based on the NVIDIA Kepler GPU architecture and support vGPU capability, allowing multiple users to share a single physical GPU in a virtualized environment. Horizon automatically load-balances vGPU-enabled virtual desktops across compute and storage resource pools with the required GPUs, even when different pools use various user profiles. If we create two linked-clone pools, one with a K120Q profile and another with a K220Q profile, Horizon will place the first on hosts with K1 cards and the second on hosts with K2 cards without any manual effort. Each vGPU profile entitles the user to dedicated graphics memory, and the GPU manager allocates the memory size to meet the specific needs of each user.

An ESXi host supports a maximum of 16 physical GPUs to be shared among different virtual machines/users.

Horizon has three kinds of graphics acceleration:

  • Virtual shared graphics
  • Virtual shared passthrough graphics
  • Virtual dedicated graphics

Note that total memory (including volatile and non-volatile memory) can't exceed the maximum memory limit (6,128 GB) per virtual machine.

Containerized ML applications inside a VM

The vSphere Integrated Containers architecture provides two container deployment models:

  1. Virtual container hosts: vSphere Integrated Containers leverages the native constructs of vSphere to provision containers. It extends the availability and performance capabilities of vSphere (DRS, HA, vMotion) to containerized workloads. A container image runs as a virtual machine, and developers can consume it through the standard Docker API (see the sketch after this list).
  2. Docker container hosts: Developers can self-provision Docker container hosts on demand and use them as a development sandbox to repackage apps. This architecture complements agile development practices and DevOps methodologies such as continuous integration (CI) and continuous deployment (CD).
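
Because a virtual container host exposes a Docker-compatible endpoint, ordinary Docker tooling can target it. The following is a minimal sketch using the Docker SDK for Python; the endpoint address, TLS settings, and image name are illustrative assumptions, not values from the book.

```python
import docker

# Point the Docker client at the virtual container host endpoint
# (hypothetical address) instead of a local Docker daemon.
client = docker.DockerClient(base_url="tcp://vch.example.com:2376", tls=True)

# Each container provisioned through the VCH runs as its own lightweight VM,
# yet is managed with the familiar Docker workflow.
container = client.containers.run(
    "tensorflow/tensorflow:latest-gpu",          # example ML image
    command="python -c 'import tensorflow'",
    detach=True,
)
print(container.id, container.status)
```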

Re-architecting an in-house application that is tightly coupled to its data and other application components/logic is costly and time consuming, so repackaging the application in a container without changing its design cuts costs. The learning curve for repackaging an application is small.

vSphere Integrated Containers gives us the option of instantiating a Docker image with the Docker command-line interface and then deploying the container image as a VM, instead of as a container on top of a Docker host, so we get the benefits of packaging the application as a container without re-architecting it. In this way, we keep the isolation of VMs. vSphere Integrated Containers is an ideal solution for application repackaging without any new infrastructure, dedicated hardware, or new tools.

The repackaged containerized application can run alongside other virtual machines running traditional or containerized applications. vSphere Integrated Containers provides high availability at the infrastructure level, without developer intervention, to support the repackaged container. We can also utilize core vSphere features such as vSphere High Availability and vSphere vMotion.

vGPU scheduling and vGPU profile selection

By default, GPUs use an equal-share schedule; a fixed share can be configured as per a customer's requirements.

We can configure the GPU with the following two options (contrasted in the sketch after this list):

  • Equal share scheduler: The physical GPU is shared among all the vGPUs (virtual desktops) running on the same host. The share of processing cycles changes as vGPUs are added to or removed from a GPU, so the performance of a vGPU depends on whether other vGPUs are running or stopped.
  • Fixed share scheduler: Each vGPU is allocated a fixed share of the physical GPU's processing cycles regardless of whether vGPUs are added to or removed from the GPU. The share remains constant whether other vGPUs are running or stopped.
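
The following toy illustration contrasts the two policies; it is our simplification of the behavior described above, not NVIDIA's actual scheduler code.

```python
# Equal share: a vGPU's slice shrinks and grows with the number of running vGPUs.
def equal_share(running_vgpus: int) -> float:
    return 100.0 / running_vgpus

# Fixed share: the slice is set by the maximum vGPUs the GPU can host
# and stays constant no matter how many are actually running.
def fixed_share(max_vgpus_per_gpu: int) -> float:
    return 100.0 / max_vgpus_per_gpu

for running in (1, 2, 4):
    print(f"{running} vGPU(s) running: "
          f"equal share = {equal_share(running):.0f}%, "
          f"fixed share = {fixed_share(4):.0f}% (GPU sized for 4 vGPUs)")
```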

NVIDIA GRID vGPU on vSphere can be configured with various vGPU profiles; the profile defines the GPU memory each VM can use and the maximum number of VMs that can share a single GPU.

vGPU profiles provide a lineup of virtual GPUs with different frame buffer (memory) sizes and numbers of heads. The number of users per GPU is determined by dividing the GPU's frame buffer by the frame buffer of the chosen profile (see the sketch below), and the number of heads denotes the supported number of displays; the maximum resolution is consistent across all the profiles. vGPU profiles ending in Q go through the same application certification process as the NVIDIA Quadro cards for professional graphics applications, so we get 100% compatibility and performance with these applications. You can refer to this link for a list of certified applications: https://www.nvidia.com/en-us/design-visualization/solutions/virtualization/.
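
Here is a quick sketch of that capacity arithmetic. The frame buffer sizes are illustrative assumptions; the pattern matches the figures quoted later in this chapter (up to eight users per physical GPU, and up to 32 users on a two-GPU card with a small profile).

```python
# Users per GPU = GPU frame buffer divided by the profile's per-user frame buffer.
def users_per_gpu(gpu_framebuffer_mb: int, profile_framebuffer_mb: int) -> int:
    return gpu_framebuffer_mb // profile_framebuffer_mb

# Example: an 8 GB GPU shared through a 1 GB-per-user profile.
print(users_per_gpu(8192, 1024))      # -> 8 users per physical GPU

# Example: a card with two 8 GB GPUs and a 512 MB-per-user profile.
print(2 * users_per_gpu(8192, 512))   # -> 32 users per card
```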

Power user and designer profiles 

We can move our most demanding end users into the data center with NVIDIA GRID and Horizon. We can offer these users mobility, easy management, centralized data and security, disaster recovery protection, and the other benefits of virtualization, so they are no longer bound to a workstation that effectively chains them to a desk. Although Virtual Dedicated Graphics Acceleration (vDGA) passthrough allows remote workstation access, its 1:1 ratio comes at a higher cost and without any optimization of resources; now we can mix workstation users with task/knowledge users for better resource optimization. This gives us many options for designing a solution with the desired compatibility and performance, and we can get a high-quality experience with a design application on certified software and hardware by utilizing the NVIDIA platform. Profile selection depends on the primary application's requirements, and based on these we can choose the suitable Quadro-certified vGPU profile to meet the end user's needs.

Knowledge and task user profiles

Task workers mostly need Soft 3D, a software-based 3D renderer good for less graphics-intensive applications. They do not require, or get a noticeable advantage from, hardware-based 3D acceleration. Soft 3D is a standard component of Horizon.

Office workers and executives fall into this profile, mostly using applications such as Microsoft Office, Adobe Photoshop, and other non-specialized end-user applications. A Virtual Shared Graphics Acceleration (vSGA) solution can optimize performance for this use case by providing high levels of consolidation of users across GPUs. However, vSGA does not provide a broad range of graphics API support, so it is generally better to consider a vGPU-based solution for knowledge workers.

Adding vGPU hosts to a cluster with vGPU Manager

We have to install the NVIDIA vGPU Manager vSphere Installation Bundle (VIB), as the NVIDIA VIB contains the drivers the host needs in order to identify the GPU. This gives you vGPU Manager. The ESXi host's BIOS power and performance settings should be set to the high-performance policy before installing the supported versions of vCenter and ESXi. ESXi hosts are managed through vCenter and configured with NTP and DNS.

The vGPU Manager VIB is loaded in the hypervisor in the same way as a driver. vGPU Manager can provision up to eight users to share each physical GPU, and an M60 card can support up to 32 users per card. The cluster must contain hosts that have NVIDIA Tesla M60 GPUs. This optimizes the distribution of GPU resources.