Deep Learning for Computer Vision

By: Rajalingappaa Shanmugamani

Overview of this book

Deep learning has proven effective in several application areas of Artificial Intelligence, especially Computer Vision. Computer Vision is the science of understanding and manipulating images, and it has many applications in areas such as robotics and automation. This book shows you, with practical examples, how to develop Computer Vision applications using deep learning. You will learn techniques for object classification, object detection, image segmentation, captioning, image generation, face analysis, and more. You will also explore their applications using popular Python libraries such as TensorFlow and Keras. This book will help you master state-of-the-art deep learning algorithms and their implementation.
Table of Contents (17 chapters)
Title Page
Copyright and Credits
Packt Upsell
Foreword
Contributors
Preface

Approaches for image captioning and related problems


Several approaches have been proposed for captioning images. Intuitively, an image is converted to visual features, and text is generated from those features. The generated text is represented as word embeddings. The predominant approaches for generating text involve LSTM and attention. Let's begin with an approach that uses an older way of generating text.
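To make the word-embedding representation concrete, here is a minimal sketch of turning a predicted embedding vector back into a word by picking the closest entry in a vocabulary. The vocabulary, the 3-dimensional vectors, and the `nearest_word` helper are all hypothetical and chosen only for illustration; real captioning models typically learn much larger embeddings and may instead output a probability distribution over the vocabulary.

```python
import math

# Hypothetical toy vocabulary with hand-written 3-d embeddings
# (illustrative values, not learned from data).
embeddings = {
    "dog": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "ball": [0.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_word(vector, embeddings):
    """Return the word whose embedding is most similar to the given vector."""
    return max(embeddings, key=lambda word: cosine(vector, embeddings[word]))


# A vector that a caption generator might emit for one time step (assumed values).
predicted = [0.9, 0.2, 0.1]
word = nearest_word(predicted, embeddings)
print(word)
```

Repeating this lookup for each generated vector would yield a sequence of words, which is one simple way to read text out of an embedding-based output.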

Using a conditional random field for linking image and text

Kulkarni et al., in the paper http://www.tamaraberg.com/papers/generation_cvpr11.pdf, proposed a method that finds the objects and attributes in an image and uses them to generate text with a conditional random field (CRF). A CRF is traditionally used for structured prediction tasks such as text generation. The flow of generating text is shown here:

Figure illustrating the process of text generation using CRF [Reproduced from Kulkarni et al.]
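The idea of jointly choosing labels with a CRF can be sketched on a toy scale. The example below is not the method from the paper: it assumes a graph with only two nodes (one object, one attribute), uses invented scores, finds the highest-scoring joint labeling by brute force, and fills a fixed sentence template. The names `object_scores`, `attribute_scores`, `pair_scores`, and `map_labeling` are hypothetical.

```python
from itertools import product

# Hypothetical scores, invented for illustration (not taken from the paper).
# Unary scores: how confident a detector or classifier is in each label.
object_scores = {"dog": 0.6, "sofa": 0.4}
attribute_scores = {"furry": 0.4, "wooden": 0.6}

# Pairwise scores: how well an object and an attribute go together,
# for example as estimated from text statistics.
pair_scores = {
    ("dog", "furry"): 0.9,
    ("dog", "wooden"): 0.1,
    ("sofa", "furry"): 0.2,
    ("sofa", "wooden"): 0.7,
}


def map_labeling(object_scores, attribute_scores, pair_scores):
    """Return the (object, attribute) pair with the highest total score.

    The total is the sum of both unary scores and the pairwise score,
    evaluated for every combination (brute force, fine for tiny graphs).
    """
    def total(labels):
        obj, attr = labels
        return object_scores[obj] + attribute_scores[attr] + pair_scores[(obj, attr)]

    return max(product(object_scores, attribute_scores), key=total)


best_object, best_attribute = map_labeling(object_scores, attribute_scores, pair_scores)

# Fill a fixed sentence template with the chosen labels.
caption = f"a {best_attribute} {best_object}"
print(caption)
```

Note that "wooden" has the higher attribute score on its own, yet the joint labeling picks "furry" because the pairwise term favors it alongside "dog". This interaction between nodes is what distinguishes structured prediction from choosing each label independently.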

The use of CRF has limitations in generating text in a coherent manner with proper...