LLM Design Patterns
To train larger models, we need to employ techniques such as gradient accumulation and mixed precision training.
To train very large language models that might not fit on a single GPU, the code sketched below introduces a LargeScaleLLMTrainer. It relies on two main techniques:
First, gradient accumulation allows us to simulate having access to a larger GPU. Instead of updating the model's parameters after every small batch of data, we process several small batches, accumulating their gradients along the way. Only after a predefined number of batches do we perform an actual update to the model's parameters. This technique enables the model to learn as if it had seen a much larger batch of data, without requiring the memory capacity of an extremely large GPU.
Second, it employs mixed precision training, a technique where most calculations are performed using smaller, lower-precision numbers (which require less memory and can be computed faster on modern hardware), while numerically sensitive operations, such as the weight updates, are kept in full 32-bit precision to preserve training stability.
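Below is a minimal sketch of how such a trainer might combine the two techniques, assuming PyTorch and its torch.amp utilities (available in recent PyTorch releases). The class name LargeScaleLLMTrainer comes from the text above; the constructor arguments, the accumulation_steps parameter, and the train_epoch method are illustrative assumptions rather than the book's actual API.

```python
import torch


class LargeScaleLLMTrainer:
    """Sketch: gradient accumulation plus mixed precision training."""

    def __init__(self, model, optimizer, accumulation_steps=8, device="cuda"):
        self.model = model.to(device)
        self.optimizer = optimizer
        self.accumulation_steps = accumulation_steps
        self.device = device
        # GradScaler rescales the loss so that small half-precision
        # gradients do not underflow to zero during backward().
        self.scaler = torch.amp.GradScaler(device)

    def train_epoch(self, dataloader, loss_fn):
        self.model.train()
        self.optimizer.zero_grad()
        for step, (inputs, targets) in enumerate(dataloader):
            inputs = inputs.to(self.device)
            targets = targets.to(self.device)
            # Mixed precision: run the forward pass and the loss inside
            # a lower-precision autocast region.
            with torch.amp.autocast(device_type=self.device):
                outputs = self.model(inputs)
                # Divide so the accumulated gradient equals the average
                # gradient over one large (virtual) batch.
                loss = loss_fn(outputs, targets) / self.accumulation_steps
            # Gradient accumulation: backpropagate on every micro-batch...
            self.scaler.scale(loss).backward()
            # ...but only update the weights every accumulation_steps batches.
            if (step + 1) % self.accumulation_steps == 0:
                self.scaler.step(self.optimizer)
                self.scaler.update()
                self.optimizer.zero_grad()
```

Dividing the loss by accumulation_steps keeps the accumulated gradient equal to the average over one large batch, so the effective batch size becomes accumulation_steps times the micro-batch size without any increase in activation memory.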