The cuBLAS library is a GPU-optimized implementation of the standard Basic Linear Algebra Subprograms (BLAS). Using its APIs, programmers can write compute-intensive code that targets a single GPU or multiple GPUs. cuBLAS functions are organized into three levels: level-1 functions perform vector-vector operations, level-2 functions perform matrix-vector operations, and level-3 functions perform matrix-matrix operations.
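To make the three levels concrete, here is a minimal CPU-only sketch of one representative operation per level. It illustrates the math only and is not the cuBLAS API: the `ref_saxpy`, `ref_sgemv`, and `ref_sgemm` names are hypothetical helpers introduced for this illustration. The matrices are assumed to be stored in column-major order, as BLAS and cuBLAS expect.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference only: these helpers mirror the math of the three BLAS
// levels. They are hypothetical illustrations, not cuBLAS functions.

// Level 1 (vector-vector), SAXPY-style update: y = alpha * x + y
void ref_saxpy(float alpha, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += alpha * x[i];
    }
}

// Level 2 (matrix-vector), SGEMV-style update: y = alpha * A * x + beta * y
// A is m x n in column-major order, so A(i, j) is stored at a[i + j * m].
void ref_sgemv(std::size_t m, std::size_t n, float alpha,
               const std::vector<float>& a, const std::vector<float>& x,
               float beta, std::vector<float>& y) {
    assert(a.size() == m * n && x.size() == n && y.size() == m);
    for (std::size_t i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (std::size_t j = 0; j < n; ++j) {
            acc += a[i + j * m] * x[j];
        }
        y[i] = alpha * acc + beta * y[i];
    }
}

// Level 3 (matrix-matrix), SGEMM-style update: C = alpha * A * B + beta * C
// A is m x k, B is k x n, C is m x n, all column-major.
void ref_sgemm(std::size_t m, std::size_t n, std::size_t k, float alpha,
               const std::vector<float>& a, const std::vector<float>& b,
               float beta, std::vector<float>& c) {
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i + p * m] * b[p + j * k];
            }
            c[i + j * m] = alpha * acc + beta * c[i + j * m];
        }
    }
}
```

The amount of arithmetic per data element grows with each level: level 1 does O(n) work on O(n) data, level 2 does O(n²) work on O(n²) data, and level 3 does O(n³) work on O(n²) data. That higher data reuse is why level-3 operations such as SGEMM tend to benefit most from GPU acceleration.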
Covering every level is beyond the scope of this book. We focus on how to use the cuBLAS APIs and how to scale their performance across multiple GPUs. Specifically, this recipe covers single-precision general matrix multiplication (SGEMM), a level-3 operation.
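For reference, the SGEMM operation can be written as the following update. The symbols follow the usual BLAS convention and are not names defined elsewhere in this recipe:

```latex
C \leftarrow \alpha \, \mathrm{op}(A) \, \mathrm{op}(B) + \beta \, C,
\qquad
\mathrm{op}(A) \in \mathbb{R}^{m \times k}, \;
\mathrm{op}(B) \in \mathbb{R}^{k \times n}, \;
C \in \mathbb{R}^{m \times n}
```

Here, \alpha and \beta are single-precision scalars, and op(X) is either X or its transpose. cuBLAS assumes that the matrices are stored in column-major order.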
The cuBLAS library is part of the CUDA Toolkit, so you can use it without any extra installation. You can also use the .cc or .cpp file extensions rather than .cu, because...