The challenge of calculating such a high number of matrix translations per second is typically too much for an average CPU. Unfortunately, it hasn't been designed to handle such a huge number of requests per second in a highly efficient and parallel manner. This is why we require some dedicated hardware that features thousands of individual cores that are able to handle the thousands of millions of requests that are thrown at it.
In Chapter 2, Parallelize It, we touched briefly on the SIMD architecture style that these graphics cards follow. We looked at how it's excellent for doing this style of work, but we never looked at how it could be used for alternative means such as data science and machine learning.
These GPUs are absolutely phenomenal at handling the high-intensity graphics calculations that get thrown at it, but it's important to note that these can be repurposed very easily to other tasks such as statistical analysis, data mining, cryptography, and more. In this...