Artificial Vision and Language Processing for Robotics

By: Álvaro Morena Alberola, Gonzalo Molina Gallego, Unai Garay Maestre

Overview of this book

Artificial Vision and Language Processing for Robotics begins by discussing the theory behind robots. You'll compare different methods used to work with robots and explore computer vision, its algorithms, and its limits. You'll then learn how to control the robot with natural language commands. You'll study the Word2Vec and GloVe embedding techniques, non-numeric data, recurrent neural networks (RNNs), and their advanced models. You'll create a simple Word2Vec model with Keras, and you'll build a convolutional neural network (CNN) and improve it with data augmentation and transfer learning. You'll study ROS and build a conversational agent to manage your robot. You'll also integrate your agent with ROS and convert an image to text and text to speech. You'll learn to build an object recognition system using a video. By the end of this book, you'll have the skills you need to build a functional application that can integrate with ROS to extract useful information about your environment.

Darknet


Darknet is an open source neural network framework written in C and CUDA. It is fast, in part because it can run computations on the GPU as well as on the CPU. It was developed by Joseph Redmon, a computer scientist focused on artificial vision.

Darknet includes many interesting applications, although we are not going to study all of them in this chapter. As we mentioned earlier, we are going to use YOLO, but the following is a list of other Darknet functionalities:

  • ImageNet Classification: This is an image classifier that uses well-known models such as AlexNet, ResNet, and ResNeXt. After some ImageNet images are classified with each of these models, the models are compared on criteria such as time, accuracy, and weights.

  • RNNs: Recurrent neural networks are used for generating and managing natural language. Darknet's RNNs use an architecture called a vanilla RNN with three recurrent modules, which achieves good results in tasks such as speech recognition and natural language...
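
To make the idea of a vanilla RNN more concrete, the following is a minimal sketch of a single recurrent module in plain Python. It is a generic illustration, not Darknet's C implementation: the function names (rnn_step, run_rnn), the toy weights, and the tanh activation are all assumptions chosen for the example. Each step combines the current input with the previous hidden state to produce the next hidden state.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One vanilla RNN update: h_t = tanh(W_xh * x + W_hh * h_prev + b).

    Hypothetical helper for illustration; weights are lists of rows.
    """
    h_next = []
    for i in range(len(b)):
        total = b[i]
        # Contribution of the current input vector
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        # Contribution of the previous hidden state (the recurrence)
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_next.append(math.tanh(total))
    return h_next


def run_rnn(inputs, h0, W_xh, W_hh, b):
    """Feed a sequence through the recurrent step and collect every hidden state."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states


# Toy values, assumed purely for illustration: 2 inputs, 2 hidden units
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [0.0, 0.2]]
b = [0.0, 0.0]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

states = run_rnn(inputs, [0.0, 0.0], W_xh, W_hh, b)
for t, h in enumerate(states):
    print(t, h)
```

Because the hidden state is passed from one step to the next, the output at each position depends on everything the network has seen so far in the sequence, which is what makes this kind of model suitable for language-like data.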