Book Image

TensorFlow 2.0 Computer Vision Cookbook

By : Jesús Martínez
Book Image

TensorFlow 2.0 Computer Vision Cookbook

By: Jesús Martínez

Overview of this book

Computer vision is a scientific field that enables machines to identify and process digital images and videos. This book focuses on independent recipes to help you perform various computer vision tasks using TensorFlow. The book begins by taking you through the basics of deep learning for computer vision, along with covering TensorFlow 2.x’s key features, such as the Keras and tf.data.Dataset APIs. You’ll then learn about the ins and outs of common computer vision tasks, such as image classification, transfer learning, image enhancing and styling, and object detection. The book also covers autoencoders in domains such as inverse image search indexes and image denoising, while offering insights into various architectures used in the recipes, such as convolutional neural networks (CNNs), region-based CNNs (R-CNNs), VGGNet, and You Only Look Once (YOLO). Moving on, you’ll discover tips and tricks to solve any problems faced while building various computer vision applications. Finally, you’ll delve into more advanced topics such as Generative Adversarial Networks (GANs), video processing, and AutoML, concluding with a section focused on techniques to help you boost the performance of your networks. By the end of this TensorFlow book, you’ll be able to confidently tackle a wide range of computer vision problems using TensorFlow 2.x.
Table of Contents (14 chapters)

Implementing a reusable image caption feature extractor

The first step of creating an image captioning, deep learning-based solution is to transform the data into a format that can be used by certain networks. This means we must encode images as vectors, or tensors, and the text as embeddings, which are vectorial representations of sentences.

In this recipe, we will implement a customizable and reusable component that will allow us to preprocess the data we'll need to implement an image captioner beforehand, thus saving us tons of time later on in the process.

Let's begin!

Getting ready

The dependencies we need are tqdm (to display a nice progress bar) and Pillow (to load and manipulate images using TensorFlow's built-in functions):

$> pip install Pillow tqdm

We will use the Flickr8k dataset, which is available on Kaggle: https://www.kaggle.com/adityajn105/flickr8k. Log in or sign up, download it, and decompress it in a directory of your choosing...