Book Image

Artificial Vision and Language Processing for Robotics

By : Álvaro Morena Alberola, Gonzalo Molina Gallego, Unai Garay Maestre
Book Image

Artificial Vision and Language Processing for Robotics

By: Álvaro Morena Alberola, Gonzalo Molina Gallego, Unai Garay Maestre

Overview of this book

Artificial Vision and Language Processing for Robotics begins by discussing the theory behind robots. You'll compare different methods used to work with robots and explore computer vision, its algorithms, and limits. You'll then learn how to control the robot with natural language processing commands. You'll study Word2Vec and GloVe embedding techniques, non-numeric data, recurrent neural network (RNNs), and their advanced models. You'll create a simple Word2Vec model with Keras, as well as build a convolutional neural network (CNN) and improve it with data augmentation and transfer learning. You'll study the ROS and build a conversational agent to manage your robot. You'll also integrate your agent with the ROS and convert an image to text and text to speech. You'll learn to build an object recognition system using a video. By the end of this book, you'll have the skills you need to build a functional application that can integrate with a ROS to extract useful information about your environment.
Table of Contents (12 chapters)
Artificial Vision and Language Processing for Robotics
Preface

Fundamentals of CNNs


In this topic, we will see how CNNs work and explain the process of convolving an image.

We know that images are made up of pixels, and if the image is in RGB, for example, it will have three channels where each letter/color (Red-Green-Blue) has its own channel with a set of pixels of the same size. Fully-connected neural networks do not represent this depth in an image in every layer. Instead, they have a single dimension to represent this depth, which is not enough. Furthermore, they connect every single neuron of one layer to every single neuron of the next layer, and so on. This in turn results in lower performance, meaning you would have to train a network for longer and would still not get good results.

CNNs are a category of neural networks that has ended up being very effective for tasks such as classification and image recognition. Although, they also work very well for sound and text data. CNNs consist of an input, hidden layers, and an output layer, just like...