RAG from First Principles
Retrieval-augmented generation (RAG) has rapidly become one of the most important approaches for building reliable AI systems. By combining large language models with external knowledge retrieval, RAG lets applications generate responses that are more accurate, more contextual, and grounded in enterprise data. In this book, you will explore the complete RAG pipeline from first principles, beginning with embeddings, vector storage, and vector databases, before moving on to advanced retrieval optimization and response generation strategies. The book explains not only how RAG systems work conceptually but also how they are implemented in practical, production-ready environments.
You will first learn how vector representations are created and managed, including how embeddings are stored, indexed, and retrieved efficiently using vector databases such as Milvus and frameworks like LlamaIndex. The book explains the architecture of vector storage systems, indexing methods such as FLAT and IVF, and the trade-offs involved in similarity search and large-scale retrieval. From there, you will discover pre-retrieval processing techniques, including query construction, query translation, Text-to-SQL workflows, metadata filtering, and query routing, enabling natural language questions to interact seamlessly with structured and unstructured data sources.
Once the retrieval foundations are established, the book focuses on improving retrieval quality through index optimization strategies. You will learn how to design more accurate retrieval pipelines using sentence-window retrieval, parent-child chunking, hierarchical indexing, and context-expansion techniques with both LlamaIndex and LangChain. These chapters emphasize practical engineering decisions that improve retrieval precision while preserving sufficient context for generation. Through detailed code examples and architectural explanations, you will understand how to balance chunk granularity, contextual recall, and scalability when building high-quality RAG applications.
Finally, the book explores the response generation stage of RAG systems, covering prompt engineering, structured output parsing, factuality improvement, and generation control techniques. You will learn how to guide large language models using templates, examples, fact-checking strategies, and structured parsers in LangChain and LlamaIndex. The book also discusses the selection of generation models, the use of APIs and locally deployed models, and advanced optimization strategies such as Self-RAG and iterative refinement approaches. By the end of this book, you will be able to design, optimize, and deploy end-to-end RAG systems that integrate retrieval, reasoning, and generation into scalable AI applications suitable for real-world enterprise use cases.