Artificial Vision and Language Processing for Robotics

By: Álvaro Morena Alberola, Gonzalo Molina Gallego, Unai Garay Maestre

Overview of this book

Artificial Vision and Language Processing for Robotics begins by discussing the theory behind robots. You'll compare different methods used to work with robots and explore computer vision, its algorithms, and its limits. You'll then learn how to control the robot with natural language processing commands. You'll study Word2Vec and GloVe embedding techniques, non-numeric data, recurrent neural networks (RNNs), and their advanced models. You'll create a simple Word2Vec model with Keras, as well as build a convolutional neural network (CNN) and improve it with data augmentation and transfer learning. You'll study the Robot Operating System (ROS) and build a conversational agent to manage your robot. You'll also integrate your agent with ROS and convert an image to text and text to speech. You'll learn to build an object recognition system using video. By the end of this book, you'll have the skills you need to build a functional application that can integrate with ROS to extract useful information about your environment.

Neural Language Models


Chapter 3, Fundamentals of Natural Language Processing, introduced us to statistical language models (LMs), which define a probability distribution over a sequence of words. We know LMs can be used to predict the next word in a sentence, that is, to compute the probability distribution of the next word.

Figure 4.20: LM formula to compute the probability distribution of an upcoming word

The sequence of words is x1, x2, …, xt, and the next word is xt+1. V is the vocabulary, j is the position of a word within V, and wj is the word located at position j in V.
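
As a sketch of what Figure 4.20 expresses, using the notation above, the formula can be written as follows (the exact typesetting in the book may differ):

```latex
% Hedged reconstruction of the formula in Figure 4.20: the LM assigns, to every
% word w_j in the vocabulary V, the probability that it is the next word x_{t+1},
% given the sequence of words observed so far.
P\left(x_{t+1} = w_j \mid x_t, x_{t-1}, \ldots, x_1\right), \qquad w_j \in V
```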

You use LMs every day. The keyboard on your cell phone uses this technology to predict the next word of a sentence, and search engines such as Google use it to predict what you want to search for.

We talked about the n-gram model and counting bigrams in a corpus, but that solution has some limitations, such as handling long-range dependencies. Deep NLP and neural LMs will help...
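
To make the counting idea from Chapter 3 concrete, here is a minimal sketch of a bigram model in Python; the toy corpus and the function name next_word_distribution are illustrative only and do not come from the book:

```python
from collections import Counter, defaultdict

# Toy corpus; a real model would be estimated from a much larger corpus.
corpus = "the robot moves the arm and the robot stops".split()

# Count how often each word follows each previous word (bigram counts).
bigram_counts = defaultdict(Counter)
for prev_word, word in zip(corpus, corpus[1:]):
    bigram_counts[prev_word][word] += 1

def next_word_distribution(prev_word):
    """Estimate P(next word | previous word) by normalizing bigram counts."""
    counts = bigram_counts[prev_word]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# After "the", the toy model predicts "robot" (2/3) or "arm" (1/3).
print(next_word_distribution("the"))
```

Because a bigram model conditions only on the single previous word, it cannot capture dependencies that span many words; this is the kind of limitation that neural LMs address.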