A very popular domain of computer science is that of image processing. It deals with the manipulation of images and the various methods by which we can extract information from them. Another popular domain, Natural Language Processing (NLP), deals with how we can make machines that can understand and produce meaningful natural languages. Image captioning defines a mixture of the two topics, which attempts to first extract the information of objects appearing in any image and then to generate a caption describing the objects.
The caption should be generated in such a way that it is a meaningful string of words and is expressed in the form of a natural language sentence.
Consider the following image:
The objects that can be detected in the image are as follows: spoon, glass, coffee, and table.
However, do we have answers to the following...