Building Natural Language and LLM Pipelines
In the preceding sections, we architected pipelines that are structurally sound and functionally rich. However, structure alone does not guarantee performance. As we transition from local prototypes to production environments, the latency of our systems becomes a critical constraint. Real-world RAG applications are rarely limited by CPU speed; they are bound by input/output (I/O) operations. They spend the vast majority of their execution time waiting: for an embedding API to return a vector, for a database to execute a search, or for an LLM to generate tokens.
In a standard synchronous pipeline, these wait times are cumulative. If a hybrid search requires a dense retrieval (0.5 s) and a sparse retrieval (0.5 s), the user waits 1.0 s. To solve this, Haystack introduces parallelization and asynchronous pipelines. This advanced capability allows us to decouple independent operations, executing them concurrently to drastically reduce total...
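To make the arithmetic concrete, here is a minimal sketch of the hybrid search timing using only Python's standard `asyncio` module. It is not Haystack's own pipeline API: `dense_retrieval` and `sparse_retrieval` are hypothetical stand-ins, and `asyncio.sleep` simulates the 0.5 s of I/O wait for each retriever.

```python
import asyncio
import time


async def dense_retrieval(query: str, delay: float = 0.5) -> list[str]:
    # Hypothetical stand-in for a dense (embedding) search; sleep simulates I/O wait.
    await asyncio.sleep(delay)
    return [f"dense hit for {query!r}"]


async def sparse_retrieval(query: str, delay: float = 0.5) -> list[str]:
    # Hypothetical stand-in for a sparse (keyword) search; sleep simulates I/O wait.
    await asyncio.sleep(delay)
    return [f"sparse hit for {query!r}"]


async def run_sequential(query: str) -> list[str]:
    # Each await finishes before the next starts, so the wait times add up.
    dense = await dense_retrieval(query)
    sparse = await sparse_retrieval(query)
    return dense + sparse


async def run_concurrent(query: str) -> list[str]:
    # gather schedules both coroutines at once; total wait is roughly the slower one.
    dense, sparse = await asyncio.gather(
        dense_retrieval(query), sparse_retrieval(query)
    )
    return dense + sparse


def timed(coro_fn, query: str) -> tuple[list[str], float]:
    # Run one coroutine function to completion and measure wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(coro_fn(query))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (run_sequential, run_concurrent):
        results, elapsed = timed(fn, "what is hybrid search?")
        print(f"{fn.__name__}: {len(results)} results in {elapsed:.2f} s")
```

Both functions return the same results; only the elapsed time differs. The sequential version waits for the sum of the two delays, while the concurrent version waits for approximately the longer of the two, which is the effect that independent pipeline branches are meant to exploit.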