Kubernetes for Generative AI Solutions
This chapter covers strategies to maximize graphics processing unit (GPU) (https://aws.amazon.com/what-is/gpu/) efficiency in Kubernetes when deploying GenAI applications, since GPU instances are expensive and often underutilized. We will also cover GPU resource management, scheduling best practices, and partitioning options such as Multi-Instance GPU (MIG) (https://www.nvidia.com/en-us/technologies/multi-instance-gpu/), Multi-Process Service (MPS) (https://docs.nvidia.com/deploy/mps/index.html), and GPU time-slicing (https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html). Finally, we’ll discuss monitoring GPU performance, balancing workloads across nodes, and auto-scaling GPU resources to handle dynamic GenAI workloads effectively.
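To make the partitioning idea concrete, here is a minimal sketch of a pod that requests a single MIG slice rather than a whole GPU. It assumes a cluster where the NVIDIA device plugin exposes MIG profiles as extended resources (the resource name `nvidia.com/mig-1g.5gb` corresponds to one MIG profile on an A100; actual names depend on the GPU model and how MIG is configured). The image name is hypothetical.

```yaml
# Illustrative sketch only: requesting a MIG slice instead of a full GPU.
# Assumes the NVIDIA device plugin advertises MIG profiles as extended
# resources; the profile name varies by GPU model and cluster setup.
apiVersion: v1
kind: Pod
metadata:
  name: genai-inference
spec:
  containers:
    - name: inference
      image: my-inference-image:latest     # hypothetical image
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1         # one 1g.5gb MIG slice
```

A pod requesting a full GPU would instead set `nvidia.com/gpu: 1`; the scheduler treats both as opaque extended resources and places the pod only on a node that advertises them.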
In this chapter, we’re going to cover the following main topics: