There are two types of attention mechanisms. They are as follows:
- Hard attention
- Soft attention
Let's now take a look at each one in detail in the following sections.
In reality, in our recent image caption example, several more pictures would be selected, but because we trained with the handwritten captions, those would never be weighted higher. The essential thing to understand, however, is how the system decides which pixels (or, more precisely, which of their CNN representations) to focus on in order to draw these high-resolution images of different aspects, and then how it chooses the next point at which to repeat the process.
In the preceding example, the points are sampled at random from a distribution, and the process is repeated. The attention network also decides which pixels around each sampled point get a higher resolution. This type of attention is known as hard attention.
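To make the sampling step concrete, here is a minimal sketch of hard attention in NumPy. It is an illustration under stated assumptions, not the implementation used in the example above: the function names `softmax` and `hard_attention`, the array shapes, and the randomly generated `features` and `scores` are all hypothetical stand-ins for the CNN representations and the attention network's output.

```python
import numpy as np


def softmax(scores):
    """Turn raw attention scores into a probability distribution."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def hard_attention(features, scores, rng):
    """Sample a single location to attend to.

    features: (num_locations, feature_dim) array, standing in for the CNN
              representation of each image location (assumed shape).
    scores:   (num_locations,) array, standing in for the attention
              network's raw score per location (assumed shape).
    """
    probs = softmax(scores)
    # Hard attention: draw one location at random from the distribution
    # instead of blending all locations together.
    index = rng.choice(len(probs), p=probs)
    return index, features[index], probs


rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))  # 6 locations, 4-dimensional features
scores = rng.normal(size=6)         # placeholder attention scores

index, glimpse, probs = hard_attention(features, scores, rng)
print("attended location:", index)
print("selected feature vector:", glimpse)
```

Repeating the call with updated scores would correspond to choosing the next point and repeating the process. Note that the model only ever sees the one sampled feature vector at each step, which is what makes this form of attention "hard".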
Hard attention suffers from what is known as the differentiability problem. Let's spend some...