Building Natural Language and LLM Pipelines
Between 2023 and 2025, the release of OpenAI’s large language models (LLMs) as REST endpoints captivated professionals across industries with their ability to understand and respond to natural language. In 2024, we marveled at their ability to produce context-aware answers grounded in a corpus of documents. This approach, known as retrieval-augmented generation (RAG), quickly solidified itself as a cornerstone technique in modern AI.
As we experimented with this technology, we discovered that LLMs could do more than answer questions: they could be extended to use tools to solve problems. This unlocked new possibilities for software engineers, who began developing the tools necessary to enable what we now, in 2025, refer to as agents. While the capabilities introduced by RAG and agents represent an exciting step toward more capable artificial intelligence, the path to integrating agents into real-world systems remains fraught with challenges. Looking ahead to 2026 and beyond, this book argues that the focus will move beyond capability and center on reliability.
A recurring theme throughout the evolution from LLMs to RAG, and from RAG to agents, is the presence of hallucinations: a phenomenon in which an LLM produces coherent-sounding responses that are false. This phenomenon is at the center of what this book refers to as the “agentic reliability crisis of 2025”. In this book, we will demonstrate that an agent powered by an LLM is only as robust as the data, tools, and context it is provided. We will show that reliability and systems integrity within an agentic system are not inherent properties of LLMs, but the result of careful engineering and systems design.
The central argument of this book is simple: the path to production-grade, trustworthy AI hinges on the rigorous application of classic data processing techniques. We introduce the tool vs. orchestration layer pattern, in which high-quality, robust, and scalable tools are developed as microservices, and agents are then equipped to use those tools through disciplined context engineering. This includes four core strategies: write, select, compress, and isolate. These strategies help us carefully manage the information the agent receives at each step of its problem-solving process, allowing it to efficiently and accurately resolve the task it was given.
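To make the pattern concrete, the following is a minimal, purely illustrative sketch; every name in it (the corpus, the /tools/search route, build_context) is hypothetical and not taken from the book's code. The tool runs in isolation as a small service with a strict contract, while the orchestration layer selects relevant passages, writes them into the prompt, and compresses them to a fixed budget.
from fastapi import FastAPI
app = FastAPI()
# --- Tool layer: a small, testable microservice with a strict contract ---
CORPUS = {
    "returns": "Items may be returned within 30 days.",
    "shipping": "Standard shipping takes 3-5 business days.",
}
@app.get("/tools/search")
def search(query: str) -> dict:
    # Select: return only the passages relevant to the query
    docs = [text for key, text in CORPUS.items() if key in query.lower()]
    return {"documents": docs}
# --- Orchestration layer: the agent manages context, not retrieval logic ---
def build_context(question: str, documents: list[str], budget: int = 500) -> str:
    # Write + compress: persist the selected passages into a bounded prompt
    context = "\n".join(documents)[:budget]
    return f"Context:\n{context}\n\nQuestion: {question}"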
By the end of this book, you will master two graph-based architectures and learn how to combine them to build highly dynamic, yet observable and resilient agentic systems: Haystack's explicit pipeline graphs, used to build reliable tools with strict data contracts, and LangGraph's dynamic state graphs, used to orchestrate the agents that call them.
This book has a strong focus on practical examples. You will find plenty of Jupyter notebooks, Python and shell scripts, and Docker container images to help you apply the concepts introduced in each chapter. Through a series of mini-projects, you will progressively build expertise, and you will conclude your journey with a blueprint for creating sovereign agents: fully owned AI that can run locally or on edge devices.
This book is for NLP engineers, LLM application developers, and data scientists looking for stable, testable building blocks for retrieval, summarization, and ranking. It is ideal for technical leads and architects designing production-grade LLM tools, as well as teams tasked with modernizing legacy NLP pipelines into robust RAG and agentic systems.
Chapter 1, Introduction to Natural Language Processing Pipelines, defines the agentic reliability crisis of 2025 and reframes classic data pipelines as the foundational reliability layer required for autonomous agents. It introduces text processing fundamentals like tokenization and embeddings as the prerequisite for building trustworthy AI.
Chapter 2, Diving Deep into Large Language Models, introduces context engineering as a formal discipline for managing information environments and traces the evolution of models from the 2023 baselines to the specialized reasoning engines of 2025.
Chapter 3, Introduction to Haystack by deepset, explores the explicit, graph-based architecture of Haystack 2.0 for building tools with strict data contracts.
Chapter 4, Bringing Components Together – Haystack Pipelines for Different Use Cases, demonstrates how to construct production-grade indexing, multimodal, and hybrid RAG pipelines to solve complex retrieval problems like "vocabulary mismatch".
Chapter 5, Haystack Pipeline Development with Custom Components, focuses on extending the Haystack framework by building specialized components, such as a knowledge graph generator and a synthetic test data generator from PDFs and scraped websites.
Chapter 6, Building Reproducible and Production-Ready RAG Systems, details how to transition from experimentation to engineering using Docker and uv, while implementing quantitative evaluation with RAGAS and observability with Weights & Biases.
Chapter 7, Deploying Haystack-Based Applications, compares deployment strategies using FastAPI for custom control and Hayhooks for rapid REST API generation from serialized YAML pipelines.
Chapter 8, Hands-On Projects, applies the tool vs. orchestration thesis introduced in this book through a series of projects for classic NLP, such as named-entity recognition, text classification, and sentiment analysis, culminating in the Yelp Navigator multi-agent system built with LangGraph and Haystack microservices.
Chapter 9, Future Trends and Beyond, explores the cutting edge of 2026, including hardware optimization with NVIDIA NIMs and emerging protocols like Model Context Protocol (MCP) and Agent-to-Agent (A2A).
Chapter 10, Epilogue: The Architecture of Agentic AI, provides a final synthesis of the book's journey, analyzing the evolution of agentic architectures through case studies that evaluate three architectural patterns against token economics and system integrity when microservices fail.
To get the most out of this book, you should have a solid grasp of Python. Familiarity with core data science projects is recommended but not required. This book is written as a code-heavy, architecture-first guide designed for practitioners ready to transition from experimental scripts to architecting robust, containerized, and stateful agentic applications. It is not intended for non-technical users seeking no-code solutions or prompt hacks.
It is recommended that you use an IDE to interact with the code, such as VSCode, Cursor, or PyCharm. If you’re a Windows user, it is strongly recommended you use WSL (https://code.visualstudio.com/docs/remote/wsl). We will use uv for package management, with a separate virtual environment for the exercises in each chapter. You will need to install Docker Desktop (https://docs.docker.com/desktop/) to complete the exercises for Chapter 6 and Chapter 7. The Jupyter notebooks and scripts use OpenAI models and therefore require an OpenAI API key; however, commented code snippets are provided so that you can work with a local model using Ollama instead. The only exception is Chapter 6, where we use an OpenAI model as an LLM-as-a-judge and measure the token and cost usage of large and small embedding models.
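For instance, the swap to a local model typically looks like the following sketch; it assumes the Haystack 2.x OpenAIGenerator and the separate ollama-haystack integration package with a running Ollama server, so check each chapter's code for the exact components used.
from haystack.components.generators import OpenAIGenerator
# Default: hosted model, requires the OPENAI_API_KEY environment variable
generator = OpenAIGenerator(model="gpt-4o-mini")
# Local alternative: uncomment to use Ollama instead (assumes the
# ollama-haystack package is installed and an Ollama server is running)
# from haystack_integrations.components.generators.ollama import OllamaGenerator
# generator = OllamaGenerator(model="llama3.2", url="http://localhost:11434")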
The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/Building-Natural-Language-and-LLM-Pipelines. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/gbp/9781835467992
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: "Through the .draw() method, we can easily create a Mermaid graph of our data flow."
A block of code is set as follows:
hybrid_rag_pipeline.run({
    "text_embedder": {"text": question},
    "bm25_retriever": {"query": question},
    "ranker": {"query": question},
    "prompt_builder": {"question": question}
})
Any command-line input or output is written as follows:
$ cd ch4/
$ uv sync
$ source .venv/bin/activate
Bold: Indicates a new term, an important word, or words that you see on the screen. For instance, words in menus or dialog boxes appear in the text like this. For example: "The resolution to this crisis does not lie in prompt engineering, the process of refining written instructions to guide an LLM to produce a desired output."
Warnings or important notes appear like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book or have any general feedback, please email us at customercare@packt.com and mention the book's title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you reported this to us. Please visit http://www.packt.com/submit-errata, click Submit Errata, and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packt.com/.