Artificial Vision and Language Processing for Robotics

By: Álvaro Morena Alberola, Gonzalo Molina Gallego, Unai Garay Maestre

Overview of this book

Artificial Vision and Language Processing for Robotics begins by discussing the theory behind robots. You'll compare different methods used to work with robots and explore computer vision, its algorithms, and its limits. You'll then learn how to control a robot with natural language commands. You'll study the Word2Vec and GloVe embedding techniques, non-numeric data, recurrent neural networks (RNNs), and their advanced models. You'll create a simple Word2Vec model with Keras, build a convolutional neural network (CNN), and improve it with data augmentation and transfer learning. You'll study the Robot Operating System (ROS) and build a conversational agent to manage your robot. You'll also integrate your agent with ROS and convert an image to text and text to speech. You'll learn to build an object recognition system using video. By the end of this book, you'll have the skills you need to build a functional application that can integrate with ROS to extract useful information about your environment.
Table of Contents (12 chapters)

YOLO


YOLO is a real-time object detection system based on deep learning and is included in the Darknet framework. Its name is an acronym for You Only Look Once, which refers to how fast YOLO works. On the website (https://pjreddie.com/darknet/yolo/), the author has added an image comparing this system to others with the same purpose:

Figure 9.1: A comparison of object detection systems

In the preceding graphic, the y-axis represents the mAP (mean average precision), and the x-axis represents the time in milliseconds. You can see that YOLO achieves a higher mAP in less time than the other systems.
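
To make the mAP metric more concrete, here is a minimal sketch of how it could be computed. The exact evaluation protocol behind the figure is not stated here, so this sketch assumes all-point interpolation of the precision-recall curve, and it assumes that each detection has already been marked as a true or false positive. The function names `average_precision` and `mean_average_precision` and the sample data are hypothetical and chosen only for illustration.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Average precision for one class.

    is_true_positive: list of booleans, one per detection, already sorted
    by descending confidence (assumed to be matched to ground truth earlier).
    num_ground_truth: number of real objects of this class in the dataset.
    """
    if num_ground_truth == 0:
        return 0.0

    true_positives = 0
    false_positives = 0
    precisions = []
    recalls = []
    for hit in is_true_positive:
        if hit:
            true_positives += 1
        else:
            false_positives += 1
        precisions.append(true_positives / (true_positives + false_positives))
        recalls.append(true_positives / num_ground_truth)

    # Make precision non-increasing from left to right (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the area under the precision-recall curve, step by step.
    area = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


def mean_average_precision(per_class):
    """Mean of the per-class AP values.

    per_class maps a class name to (is_true_positive, num_ground_truth).
    """
    scores = [average_precision(flags, count) for flags, count in per_class.values()]
    return sum(scores) / len(scores)


# Illustrative data only: two classes with two ground-truth objects each.
sample = {
    "dog": ([True, False, True], 2),
    "cat": ([True, True], 2),
}
print(round(mean_average_precision(sample), 4))
```

A higher value means that the detector ranks correct detections ahead of incorrect ones across all classes, which is what the y-axis of the figure summarizes.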

It is also important to understand how YOLO works. It applies a single neural network to the entire image, splits the image into different parts, and predicts the bounding boxes. These bounding boxes are rectangles that mark off certain objects, which will be identified later in the process. YOLO is fast because it is able to make predictions with only an evaluation...
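
The idea of splitting the image into parts and describing objects with bounding boxes can be sketched as follows. This is not Darknet code. It is a simplified illustration that assumes the scheme of the original YOLO formulation, in which the image is divided into a square grid and the cell containing the center of an object is responsible for that object. The names `Box`, `responsible_cell`, and `encode_box`, the 7 x 7 grid, and the 448 x 448 image size are assumptions for the example, and later YOLO versions encode boxes differently.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def responsible_cell(box, image_w, image_h, grid_size):
    """Return (row, col) of the grid cell that contains the box center."""
    center_x = (box.x_min + box.x_max) / 2
    center_y = (box.y_min + box.y_max) / 2
    # Clamp so that a center on the right or bottom edge stays in the grid.
    col = min(grid_size - 1, int(center_x / image_w * grid_size))
    row = min(grid_size - 1, int(center_y / image_h * grid_size))
    return row, col


def encode_box(box, image_w, image_h, grid_size):
    """Describe a box relative to its grid cell.

    Returns (row, col, (x_offset, y_offset, width, height)), where the
    offsets are measured inside the cell and the size is a fraction of
    the whole image, so every value lies between 0 and 1.
    """
    row, col = responsible_cell(box, image_w, image_h, grid_size)
    center_x = (box.x_min + box.x_max) / 2
    center_y = (box.y_min + box.y_max) / 2
    x_offset = center_x / image_w * grid_size - col
    y_offset = center_y / image_h * grid_size - row
    width = (box.x_max - box.x_min) / image_w
    height = (box.y_max - box.y_min) / image_h
    return row, col, (x_offset, y_offset, width, height)


# Illustrative values: one object in a 448 x 448 image with a 7 x 7 grid.
example = Box(x_min=100, y_min=50, x_max=300, y_max=250)
print(responsible_cell(example, 448, 448, 7))  # (2, 3)
print(encode_box(example, 448, 448, 7))
```

Because every cell produces its predictions in the same network evaluation, the whole image is processed at once rather than region by region.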