Modern Computer Architecture and Organization - Third Edition
As a first step toward understanding the design of a computing system for LLM processing, we must identify the key drivers of LLM computing requirements.
When performing inference with GPT-2, most of the floating-point operations, ranging from 80 to 95% of the total, occur in matrix multiplications. These operations are concentrated at the following points:
Computation of the Q, K, and V matrices
Multiplication of the attention result matrix by the linear projection matrix at the attention block output
We will work through an example to estimate the number of multiplication operations performed by GPT-2. If we assume the maximum context length, the input to each transformer block is a 1,024 x 768 matrix. This matrix is multiplied by a 768 x 768 weight matrix to produce a 1,024 x 768 product matrix. The number of elements in the product matrix is 786,432 (1,024 multiplied by 768). Computing each element of the product matrix
involves multiplying each of the 768 elements...
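The arithmetic in this example can be checked with a short script. This is a minimal sketch, assuming a standard row-by-column matrix product in which each output element takes one multiplication per element of the shared dimension. It counts multiplications only (not additions) and covers only this single 1,024 x 768 by 768 x 768 product. The function name `matmul_multiplications` and the constant names are hypothetical, chosen for illustration.

```python
def matmul_multiplications(m: int, k: int, n: int) -> int:
    """Count scalar multiplications in a naive (m x k) by (k x n) product.

    Assumption: each of the m * n output elements is a dot product of
    length k, which costs k multiplications.
    """
    return m * k * n


# Dimensions taken from the text: maximum context length and embedding width.
CONTEXT_LENGTH = 1024
EMBEDDING_DIM = 768

# Number of elements in the 1,024 x 768 product matrix.
output_elements = CONTEXT_LENGTH * EMBEDDING_DIM

# Multiplications for (1,024 x 768) times (768 x 768).
multiplications = matmul_multiplications(
    CONTEXT_LENGTH, EMBEDDING_DIM, EMBEDDING_DIM
)

print(f"Output elements: {output_elements:,}")   # 786,432
print(f"Multiplications: {multiplications:,}")   # 603,979,776
```

Under these assumptions, one such product requires about 604 million multiplications. The same helper can be reused for other matrix shapes by changing the three dimensions passed to it.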