Kubernetes for Generative AI Solutions
This chapter covers strategies to maximize graphics processing unit (GPU) (https://aws.amazon.com/what-is/gpu/) efficiency in Kubernetes when deploying GenAI applications, since GPU instances are expensive and often underutilized. We will also cover GPU resource management, scheduling best practices, and partitioning options such as Multi-Instance GPU (MIG) (https://www.nvidia.com/en-us/technologies/multi-instance-gpu/), Multi-Process Service (MPS) (https://docs.nvidia.com/deploy/mps/index.html), and GPU time-slicing (https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html). Finally, we’ll discuss monitoring GPU performance, balancing workloads across nodes, and auto-scaling GPU resources to handle dynamic GenAI workloads effectively.
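To make the partitioning idea concrete, here is a minimal sketch of a pod that requests a single MIG slice rather than a whole GPU. It assumes a cluster where the NVIDIA device plugin exposes MIG profiles as extended resources (the resource name `nvidia.com/mig-1g.5gb` corresponds to one MIG profile on an A100; actual names depend on the GPU model and how MIG is configured). The image name is hypothetical.

```yaml
# Illustrative sketch only: requesting a MIG slice instead of a full GPU.
# Assumes the NVIDIA device plugin advertises MIG profiles as extended
# resources; the profile name varies by GPU model and cluster setup.
apiVersion: v1
kind: Pod
metadata:
  name: genai-inference
spec:
  containers:
    - name: inference
      image: my-inference-image:latest     # hypothetical image
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1         # one 1g.5gb MIG slice
```

A pod requesting a full GPU would instead set `nvidia.com/gpu: 1`; the scheduler treats both as opaque extended resources and places the pod only on a node that advertises them.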
In this chapter, we’re going to cover the following main topics: