Hardware optimization with FPGAs
In the previous section, we exported a model to ONNX to take advantage of an inference-optimized and hardware-accelerated runtime to improve the scoring performance. In this section, we will take this approach one step further and deploy to even faster inference hardware: field-programmable gate arrays (FPGAs).
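To recap the starting point, scoring against such a runtime might look like the following minimal sketch using ONNX Runtime. The model path, input name, and input shape are placeholders for illustration; the execution provider list is an assumption showing how a hardware-accelerated backend is selected, with a CPU fallback:

```python
# Minimal ONNX Runtime scoring sketch (model path and input shape are placeholders)
import numpy as np
import onnxruntime as ort

# Prefer a hardware-accelerated execution provider when one is available,
# falling back to the default CPU provider otherwise.
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
session = ort.InferenceSession("model.onnx", providers=providers)

# The input name and shape depend on how the model was exported.
input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Run inference; passing None returns all model outputs.
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)
```

Swapping the execution provider is all it takes to retarget the same exported model to different hardware, which is the idea we now push further with FPGAs.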
But before we talk about how to deploy a model to an FPGA, let's first understand what an FPGA is and why we would choose one, instead of a GPU, as a target for DL inference.
Understanding FPGAs
The variety of integrated circuit (IC) that most people come across is the application-specific integrated circuit (ASIC). ASICs are purpose-built ICs, such as the processor in your laptop, the GPU cores on your graphics card, or the microcontroller in your washing machine. What these chips have in common is a fixed hardware design optimized for a specific task. Often, like any general-purpose processor, they operate on a fixed instruction set.