Modern Computer Architecture and Organization - Third Edition
This section introduces the high-level structure of the GPT-2 model and breaks its internal operations into a sequence of standard mathematical and tensor operations. We will not attempt to explain the theoretical background that underlies the model's specific architecture. Instead, our focus is on the processing operations involved in its execution and the requirements those operations place on the computational hardware that runs the model.
Like many LLMs, GPT-2 is available in several sizes. The size of an LLM is generally defined by the number of trainable parameters it contains, where each parameter is an artificial neural network weight or related quantity that is optimized during training. The value of a parameter is simply a scalar, such as 0.1.
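To make the idea of a parameter count concrete, the following minimal sketch tallies the trainable scalars in a GPT-2 style transformer. The function name and the default dimensions are assumptions added for illustration and are not taken from the text above: they are the values commonly published for the smallest GPT-2 configuration (a 50,257-token vocabulary, a 1,024-token context, a 768-element embedding, and 12 layers). The sketch also assumes the output layer reuses the token embedding matrix, so that matrix is counted only once.

```python
def count_gpt2_parameters(vocab_size=50257, context_len=1024,
                          d_model=768, n_layers=12):
    """Tally trainable scalars in a GPT-2 style model (illustrative sketch)."""
    d_ff = 4 * d_model  # assumed feed-forward width: four times the embedding size

    # Embedding tables: one row per token and one row per position.
    token_embedding = vocab_size * d_model
    position_embedding = context_len * d_model

    # Each layer normalization holds a scale vector and a bias vector.
    layer_norm = 2 * d_model

    # Attention: a fused query/key/value projection plus an output projection,
    # each with a weight matrix and a bias vector.
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)

    # Feed-forward network: expand to d_ff, then project back to d_model.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Every layer has two layer norms, one attention block, one feed-forward block.
    per_layer = 2 * layer_norm + attention + feed_forward

    # A final layer norm follows the last layer. The output projection is
    # assumed to share the token embedding weights, so it adds nothing here.
    return token_embedding + position_embedding + n_layers * per_layer + layer_norm


total = count_gpt2_parameters()
print(f"{total:,} parameters")  # prints "124,439,808 parameters"
```

Under these assumptions the tally comes to roughly 124 million scalars, with about 85 million of them in the twelve repeated layers and most of the remainder in the token embedding table.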
We will examine the smallest GPT-2 model, which contains 124 million parameters. The relatively small size of this model makes it easier to keep track of the tensor dimensions and the counts of multiplication and addition...