Artificial Vision and Language Processing for Robotics

By: Álvaro Morena Alberola, Gonzalo Molina Gallego, Unai Garay Maestre

Overview of this book

Artificial Vision and Language Processing for Robotics begins by discussing the theory behind robots. You'll compare different methods used to work with robots and explore computer vision, its algorithms, and its limits. You'll then learn how to control a robot with natural language processing commands. You'll study Word2Vec and GloVe embedding techniques, non-numeric data, recurrent neural networks (RNNs), and their advanced models. You'll create a simple Word2Vec model with Keras, as well as build a convolutional neural network (CNN) and improve it with data augmentation and transfer learning. You'll study the Robot Operating System (ROS) and build a conversational agent to manage your robot. You'll also integrate your agent with ROS and convert an image to text and text to speech. You'll learn to build an object recognition system using video. By the end of this book, you'll have the skills you need to build a functional application that can integrate with ROS to extract useful information about your environment.

Summary


AI and deep learning are making huge advances in images and artificial vision thanks to convolutional networks, but RNNs also have a lot of power.

In this chapter, we reviewed how a neural network can predict the values of a sine function using temporal sequences. If you change the training data, the same architecture can learn other distributions, such as stock price movements. There are many RNN architectures, each optimized for a certain task, but RNNs suffer from the vanishing gradient problem. A solution to this problem is a newer model, the long short-term memory (LSTM) network, which changes the structure of the neuron so it can memorize timesteps.
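As a minimal sketch (not the book's exact code), the following Keras snippet trains an LSTM on sliding windows of a sampled sine wave; the window length, layer sizes, and training settings are illustrative assumptions:

# A minimal sketch: an LSTM learning a sine wave from sliding temporal
# windows. Window length and layer sizes are illustrative assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

WINDOW = 20  # timesteps the network sees before predicting the next value

# Build (samples, timesteps, features) windows from a sampled sine wave
wave = np.sin(np.linspace(0, 20 * np.pi, 2000))
X = np.array([wave[i:i + WINDOW] for i in range(len(wave) - WINDOW)])
y = wave[WINDOW:]
X = X.reshape(-1, WINDOW, 1)

model = Sequential([
    LSTM(32, input_shape=(WINDOW, 1)),  # LSTM gates mitigate vanishing gradients
    Dense(1),                           # regression output: the next sample
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, verbose=0)

next_value = model.predict(X[-1:])  # forecast the value after the last window

Retraining the same model on a different series, such as stock prices, only requires swapping the data used to build the windows; the architecture itself is unchanged.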

Focusing on linguistics, statistical LMs have many problems related to computational load and distribution probabilities. To control the sparsity problem, the size of the n-gram model is lowered to 4-grams or 3-grams, but that is an insufficient number of steps back to predict an upcoming word; if we raise n to capture more context, the sparsity problem appears...
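To illustrate the sparsity problem, here is a toy count-based n-gram model over a hypothetical one-sentence corpus (an assumption for demonstration); any n-gram absent from the training data receives a probability of zero:

# A toy illustration of n-gram sparsity: as n grows, more contexts are
# unseen in training, so their estimated probabilities collapse to zero.
from collections import Counter

corpus = "the robot moves the arm and the robot grips the object".split()

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

unigrams = ngram_counts(corpus, 1)
bigrams = ngram_counts(corpus, 2)
trigrams = ngram_counts(corpus, 3)

# P(word | context) from raw counts; unseen n-grams get 0 -- sparsity
def prob(word, context, counts, context_counts):
    if context_counts[context] == 0:
        return 0.0
    return counts[context + (word,)] / context_counts[context]

print(prob("robot", ("the",), bigrams, unigrams))        # 0.5  (seen bigram)
print(prob("arm", ("grips", "the"), trigrams, bigrams))  # 0.0  (unseen trigram)

The zero estimate for the unseen trigram is exactly the sparsity issue the chapter describes: longer contexts predict better in principle, but real corpora never contain most of them.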