Appendix II — Hardware Constraints for Transformer Models
Transformer models could not exist without optimized hardware. Memory and disk management remain critical design considerations, but computing power is the first prerequisite. It would be nearly impossible to train the original Transformer described in Chapter 2, Getting Started with the Architecture of the Transformer Model, without GPUs. GPUs are at the center of the battle for efficient transformer models.
This appendix to Chapter 3, Fine-Tuning BERT Models, will take you through the importance of GPUs in three steps:
- The architecture and scale of transformers
- CPUs versus GPUs
- Using GPUs in PyTorch, as an example of how an optimized library or language takes advantage of the hardware