Mixed precision is a technique that exploits low-precision computation while still obtaining high-accuracy results. It computes core operations in low precision and produces the output with high-precision operations. Compared with high-precision computation, low-precision operations reduce memory bandwidth and deliver higher computing throughput. If low precision suffices to reach the target accuracy that an application achieves in high precision, this trade-off can improve performance. The NVIDIA Developer Blog introduces this programming model: https://devblogs.nvidia.com/mixed-precision-programming-cuda-8.
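The core idea can be sketched outside CUDA as well. The following is a minimal NumPy illustration (not CUDA code) of why mixing precisions matters: inputs are stored in FP16, but the accumulation is carried out in FP32, so the result stays accurate while the data remains compact. The array size and values are arbitrary choices for the demonstration.

```python
import numpy as np

# Illustration of the mixed-precision trade-off using NumPy (not CUDA):
# many small FP16 values are summed two ways.
vals = np.full(4096, 0.1, dtype=np.float16)

# Pure low precision: an FP16 accumulator. Once the running sum grows,
# the FP16 spacing between representable values exceeds 0.1, so further
# additions are rounded away and the sum stalls.
fp16_sum = np.float16(0.0)
for v in vals:
    fp16_sum = np.float16(fp16_sum + v)

# Mixed precision: FP16 inputs, but an FP32 accumulator, analogous to
# low-precision multiplies with high-precision accumulation on a GPU.
fp32_sum = np.float32(0.0)
for v in vals:
    fp32_sum += np.float32(v)

print(float(fp16_sum))  # stalls well below the true sum
print(float(fp32_sum))  # close to the true sum (~409.5)
```

Here the FP16 data halves the storage and bandwidth relative to FP32, while the FP32 accumulator preserves accuracy; this is the same division of labor that mixed-precision GPU instructions apply to core arithmetic.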
In this context, CUDA extends its support to data types narrower than 32 bits, such as 8- and 16-bit integers (INT8/INT16) and 16-bit floating point (FP16). For these low-precision data types, a GPU can use...