Retrieval Post-Processing | RAG from First Principles

RAG from First Principles

By: Jia Huang

Overview of this book

Most developers can spin up a RAG pipeline in an afternoon using LangChain or LlamaIndex. Far fewer understand why retrieval fails or how to fix it. This book is for those who want to go deeper. RAG from First Principles dismantles the retrieval-augmented generation stack layer by layer, explaining how documents are ingested and parsed, why chunking strategy directly impacts answer quality, how embedding models encode meaning, what happens inside a vector database, and how sparse and dense retrieval interact in a hybrid system. Written by Jia Huang, a research engineer and bestselling AI author, it brings both research depth and production experience to one of AI's most critical engineering disciplines. Structured as a progressive dialogue between a seasoned engineer and two students, the book surfaces the questions practitioners actually ask. Each chapter builds on the last, covering topics from data import and chunking to embedding selection, index design, hybrid search, and post-retrieval processing, before moving on to response generation, evaluation, and advanced paradigms including GraphRAG, Agentic RAG, and Modular RAG. By the end, you'll have the architectural understanding to optimize, debug, and extend your RAG systems with confidence.

Preface

Who this book is for

What this book covers

Get in touch

Free benefits with your book

How to unlock

Join our Discord and Reddit Space

Data Import

Does the parsing process depend on file type?

Reading simple text with a DataLoader

Parsing specific elements with the JSON loader

Reading text from images

Importing table data in CSV format

Crawling and parsing web documents

Understanding the cultural and story background of Black Myth: Wukong

Markdown file titles and structure

Text formats, layout recognition, and table parsing in PDF files

Summary

Get this book’s PDF version and more

Text Chunking

Why chunking is very important

Context window limits the maximum chunk length

Different chunking strategies

Using the unstructured tool for document structure-based chunking

Using LlamaIndex SemanticSplitterNodeParser for semantic chunking

Advanced indexing techniques related to chunking

Workflow of HyDE technology

Summary

Subscribe for a free eBook

Information Embedding

Embedding is the encoding of external information

Measuring similarity between vectors

From early word embedding models to large model embeddings

Modern embedding models: OpenAI, Jina, Cohere, Voyage

Sparse embedding, dense embedding, and BM25

Multimodal embedding model: visualized_BGE

Using embedding models with frameworks like LangChain and LlamaIndex

Fine-tuning embedding models

Summary

Vector Storage

How are vectors stored?

Components of a vector database

Indexing in vector databases

Vector retrieval (similarity measurement)

Mainstream vector databases

Selection and evaluation of vector databases

Index and search settings in vector databases

Choosing the appropriate index type

Selecting the appropriate metric

Search and query: Two retrieval methods

Using Milvus for hybrid search

Vector databases and multimodal retrieval

Using ResNet-34 to extract image features and perform retrieval

Data maintenance and vector storage CRUD operations in RAG systems

Summary

Pre-Retrieval Processing

Why pre-retrieval processing matters

Techniques included in pre-retrieval processing

Query construction: Asking questions in natural language

Text-to-SQL: Transforming natural language into SQL

Text-to-Cypher: From natural language to graph database queries

Self-query retriever: Automatically generating metadata filter conditions from queries

Query translation: Better explaining user questions

Query routing: Finding the right data source

Index Optimization

From small to large: Node-sentence sliding window and parent-child text chunks

From summary to detail: Building summary-to-detail indexes with IndexNode and RecursiveRetriever

Hierarchical merging: HierarchicalNodeParser and RAPTOR

Forward/backward linking: Connecting related nodes by forward/backward extension

Hybrid retrieval: Improving retrieval accuracy and expanding coverage

Summary

Retrieval Post-Processing

Common retrieval post-processing techniques

Difference between LangChain and LlamaIndex recency weighting

Compression

Remembering long context through prompt caching

Correction

Summary

Response Generation

Improving LLM output quality by refining prompts

Summary

System Evaluation

Evaluation system for RAG systems

Evaluation frameworks for RAG systems

Summary

Complex RAG Paradigms

An overview of complex RAG paradigms

Agentic RAG or agent-driven RAG systems

Multimodal RAG

Summary

Unlock Your Exclusive Benefits

Unlock this Book’s Free Benefits in 3 Easy Steps

Other Books You May Enjoy

Index

Compression

Contextual compression retrievers
