Inference Pipeline Deployment | LLM Engineer's Handbook

LLM Engineer's Handbook

By: Paul Iusztin, Maxime Labonne

Overview of this book

Artificial intelligence has advanced rapidly, and Large Language Models (LLMs) are at the forefront of this shift. This book offers insights into designing, training, and deploying LLMs in real-world scenarios by leveraging MLOps best practices. The guide walks you through building an LLM-powered twin that’s cost-effective, scalable, and modular. It moves beyond isolated Jupyter notebooks, focusing on how to build production-grade end-to-end LLM systems. Throughout this book, you will learn data engineering, supervised fine-tuning, and deployment. The hands-on approach to building the LLM Twin use case will help you implement MLOps components in your own projects. You will also explore recent advancements in the field, including inference optimization, preference alignment, and real-time data processing. By the end of this book, you will be proficient in deploying LLMs that solve practical problems while maintaining low-latency and high-availability inference. Whether you are new to artificial intelligence or an experienced practitioner, this book delivers guidance and practical techniques that will deepen your understanding of LLMs and sharpen your ability to implement them effectively.

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Making the Most Out of This Book – Get to Know Your Free Benefits

Understanding the LLM Twin Concept and Architecture

Understanding the LLM Twin concept

Planning the MVP of the LLM Twin product

Building ML systems with feature/training/inference pipelines

Designing the system architecture of the LLM Twin

Summary

References

Tooling and Installation

Python ecosystem and project installation

MLOps and LLMOps tooling

Databases for storing unstructured and vector data

Preparing for AWS

Summary

References

Join our book’s Discord space

Data Engineering

Designing the LLM Twin’s data collection pipeline

Gathering raw data into the data warehouse

Summary

References

RAG Feature Pipeline

Understanding RAG

An overview of advanced RAG

Exploring the LLM Twin’s RAG feature pipeline architecture

Implementing the LLM Twin’s RAG feature pipeline

Summary

References

Join our book’s Discord space

Supervised Fine-Tuning

Creating an instruction dataset

Creating our own instruction dataset

Exploring SFT and its techniques

Fine-tuning in practice

Summary

References

Fine-Tuning with Preference Alignment

Understanding preference datasets

Creating our own preference dataset

Preference alignment

Implementing DPO

Summary

References

Join our book’s Discord space

Evaluating LLMs

Model evaluation

RAG evaluation

Evaluating TwinLlama-3.1-8B

Summary

References

Inference Optimization

Model optimization strategies

Model parallelism

Model quantization

Summary

References

Join our book’s Discord space

RAG Inference Pipeline

Understanding the LLM Twin’s RAG inference pipeline

Exploring the LLM Twin’s advanced RAG techniques

Implementing the LLM Twin’s RAG inference pipeline

Summary

References

Inference Pipeline Deployment

Criteria for choosing deployment types

Understanding inference deployment types

Monolithic versus microservices architecture in model serving

Exploring the LLM Twin’s inference pipeline deployment strategy

Deploying the LLM Twin service

Autoscaling capabilities to handle spikes in usage

Summary

References

Join our book’s Discord space

MLOps and LLMOps

The path to LLMOps: Understanding its roots in DevOps and MLOps

Deploying the LLM Twin’s pipelines to the cloud

Adding LLMOps to the LLM Twin

Summary

References

MLOps Principles

1. Automation or operationalization

2. Versioning

3. Experiment tracking

4. Testing

5. Monitoring

6. Reproducibility

Other Books You May Enjoy

Index

Understanding inference deployment types
