Artificial Vision and Language Processing for Robotics

By: Álvaro Morena Alberola, Gonzalo Molina Gallego, Unai Garay Maestre


Preface


About

This section briefly introduces the authors, the coverage of this book, the technical skills you'll need to get started, and the hardware and software requirements needed to complete all of the included activities and exercises.

About the Book

Artificial Vision and Language Processing for Robotics begins by discussing the theory behind robots. You'll compare different methods used to work with robots and explore computer vision, its algorithms, and limits. You'll then learn how to control the robot with natural language processing commands. As you make your way through this book, you'll study Word2Vec and GloVe embedding techniques, non-numeric data, as well as recurrent neural networks (RNNs) and their advanced models. You'll create a simple Word2Vec model with Keras, build a convolutional neural network (CNN), and improve it with data augmentation and transfer learning. You'll walk through ROS and build a conversational agent to manage your robot. You'll also integrate your agent with ROS and convert an image to text and text to speech. You'll learn how to build an object recognition system with the help of a video clip.

By the end of this book, you'll have the skills you need to build a functional application that can integrate with ROS to extract useful information from your environment.

About the Author

Álvaro Morena Alberola is a computer engineer who loves robotics and artificial intelligence. Currently, he is working as a software developer. He is extremely interested in the core part of AI, which is based on artificial vision. Álvaro likes working with new technologies and learning how to use advanced tools. He sees robotics as a way of easing human lives, helping people perform tasks that they cannot do on their own.

Gonzalo Molina Gallego is a computer science graduate who specializes in artificial intelligence and natural language processing. He has experience working on text-based dialog systems, creating conversational agents, and advising on good methodologies. Currently, he is researching new techniques for hybrid-domain conversational systems. Gonzalo believes that conversational user interfaces are the future.

Unai Garay Maestre is a computer science graduate who specializes in artificial intelligence and computer vision. He contributed a paper to the 2018 CIARP conference that takes a new approach to data augmentation using variational autoencoders. He also works as a machine learning developer using deep neural networks applied to images.

Objectives

  • Explore ROS and build a basic robotic system

  • Identify conversation intents with NLP techniques

  • Learn and use word embedding with Word2Vec and GloVe

  • Use deep learning to implement artificial intelligence (AI) and object recognition

  • Develop a simple object recognition system using CNNs

  • Integrate AI with ROS to enable your robot to recognize objects

Audience

Artificial Vision and Language Processing for Robotics is for robotics engineers who want to learn how to integrate computer vision and deep learning techniques to create complete robotic systems. It will be beneficial if you have a working knowledge of Python and a background in deep learning. Knowledge of ROS is a plus.

Approach

Artificial Vision and Language Processing for Robotics takes a practical approach to equip you with tools for creating systems that integrate computer vision and NLP to control a robot. The book is divided into three parts: NLP, computer vision, and robotics. It introduces advanced topics after a detailed introduction to the basics. It also contains multiple activities for you to practice and apply your new skills in a highly relevant context.

Minimum Hardware Requirements

For the optimal student experience, we recommend the following hardware configuration:

  • Processor: 2 GHz dual-core processor or better

  • Memory: 8 GB RAM

  • Storage: 5 GB available hard disk space

  • A good internet connection

To train neural networks, we recommend using Google Colab. If you want to train these networks on your own computer instead, you will need:

  • An NVIDIA GPU

Software Requirements

We recommend using Ubuntu 16.04 for this book, because ROS Kinetic targets that release and later Ubuntu versions have compatibility issues with it. If you want to use Ubuntu 18.04 instead, there is a ROS version that supports it, named Melodic. During the project, you will need to install several libraries to complete all of the exercises, such as NLTK (<=3.4), spaCy (<=2.0.18), gensim (<=3.7.0), NumPy (<=1.15.4), scikit-learn (<=0.20.1), Matplotlib (<=3.0.2), OpenCV (<=4.0.0.21), Keras (<=2.2.4), and TensorFlow (<=1.5; versions 2.0 and above are not supported). The installation process for each library is explained in the exercises.
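If you would rather set up a local environment in one go, the pinned versions above can be installed with pip, as in the following sketch (this assumes Python 3 with pip available; the per-exercise instructions remain the authoritative source):

    pip install "nltk<=3.4" "spacy<=2.0.18" "gensim<=3.7.0" "numpy<=1.15.4" \
        "scikit-learn<=0.20.1" "matplotlib<=3.0.2" "opencv-python<=4.0.0.21" \
        "keras<=2.2.4" "tensorflow<=1.5"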

To use YOLO in your Ubuntu system, you will need to install the NVIDIA drivers of your GPU and the NVIDIA CUDA toolkit.

Conventions

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "With the TfidfVectorizer method, we can convert the collection of documents in our corpus to a matrix of TF-IDF features"

A block of code is set as follows:

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # corpus: an iterable of document strings

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Morphological analysis: Focused on the words of a sentence and analyzing its morphemes"

Installation and Setup

Before you start this book, you need to install the following software. You will find the steps to install these here:

Installing Git LFS

In order to download all the resources from this book's GitHub repository and be able to use images to train your neural network model, you will need to install Git LFS (Git Large File Storage). Git LFS replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git.

If you have not cloned the repository:

  1. Install Git LFS

  2. Clone the Git repository

  3. From the repository folder, execute git lfs pull

  4. Done

If the repository is already cloned:

  1. Install Git LFS

  2. From the repository folder, execute git lfs pull

  3. Done

Installing Git LFS: https://github.com/git-lfs/git-lfs/wiki/Installation
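Put together, a typical first-time setup looks something like the following sketch (it assumes a Debian/Ubuntu system where the git-lfs package is available through apt, and uses this book's repository URL):

    # Install Git LFS and activate it for your user
    # (Debian/Ubuntu; see the installation wiki above for other systems)
    sudo apt-get install git-lfs
    git lfs install

    # Clone the book repository and fetch the large files tracked by Git LFS
    git clone https://github.com/PacktPublishing/Artificial-Vision-and-Language-Processing-for-Robotics.git
    cd Artificial-Vision-and-Language-Processing-for-Robotics
    git lfs pull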

[Recommended] Google Colaboratory

If you have the option, use Google Colaboratory. It is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud. You can also take advantage of running it on a GPU.

The steps for using it are as follows:

  1. Upload the chapter folder or the entire GitHub repository to your Google Drive account, so that you can use the files stored in the repository. Make sure you have used Git LFS first to fetch all the files.

  2. Go to the folder where you want to open a new Google Colab Notebook, click New > More > Colaboratory. Now, you have a Google Colab Notebook opened and saved in the corresponding folder, and you are ready to use Python, Keras, or any other library that is already installed.

  3. If you want to install a specific library, you can do so using pip or any other command-line installer, adding "!" at the beginning. For instance, !pip install sklearn installs scikit-learn.

  4. If you want to be able to load files from your Google Drive, you need to execute these two lines of code in a Google Colab cell:

    from google.colab import drive
    drive.mount('drive')
  5. Then, open the link that appears in the output and log in with the Google account that you used to create the Google Colab Notebook.

  6. You can now navigate to where the files were uploaded, using ls to list the files in the current directory and cd to navigate to a specific folder (see the sketch after this list).

  7. Now, the Google Colab Notebook is capable of loading any file and performing any task, just like a Jupyter notebook opened in that folder would do.
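For example, a first cell in a fresh notebook might look like the following sketch (the folder name after My Drive is hypothetical; it depends on where you uploaded the repository):

    from google.colab import drive
    drive.mount('drive')

    # List the mounted Drive, then move into the uploaded repository folder
    !ls "drive/My Drive"
    %cd "drive/My Drive/Artificial-Vision-and-Language-Processing-for-Robotics"
    !ls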

Installing ROS Kinetic

These are the steps you must follow to install the framework in your Ubuntu system:

  1. Prepare Ubuntu for accepting the ROS software:

    sudo sh -c 'echo "deb http://packages.ros.org/ros/ubuntu $(lsb_release -sc) main" > /etc/apt/sources.list.d/ros-latest.list'
  2. Configure the download keys:

    sudo apt-key adv --keyserver hkp://ha.pool.sks-keyservers.net:80 --recv-key 421C365BD9FF1F717815A3895523BAEEB01FA116
  3. Ensure that the system is updated:

    sudo apt-get update
  4. Install the full framework so as not to miss any functionality:

    sudo apt-get install ros-kinetic-desktop-full
  5. Initialize and update rosdep:

    sudo rosdep init
    rosdep update
  6. Add environment variables to the bashrc file if you want to avoid declaring them each time you work with ROS:

    echo "source /opt/ros/kinetic/setup.bash" >> ~/.bashrc
    source ~/.bashrc

    Note

    It might be appropriate to reboot your computer after this process for the system to implement the new configuration.

  7. Check that the framework is correctly working by starting it:

    roscore
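Once roscore is running, you can optionally confirm that the ROS master is reachable from a second terminal; both of the following commands ship with the desktop install performed above:

    # Run these in a second terminal while roscore is still running
    source /opt/ros/kinetic/setup.bash
    rosnode list     # should print /rosout
    rostopic list    # should include /rosout and /rosout_agg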

Configuring TurtleBot

Note

It may happen that TurtleBot is not compatible with your ROS distribution (we are using Kinetic Kame), but don't worry: there are lots of robots that you can simulate in Gazebo. You can look up different robots and try to use them with your ROS distribution.

This is the configuration process for TurtleBot:

  1. Install its dependencies:

    sudo apt-get install ros-kinetic-turtlebot ros-kinetic-turtlebot-apps ros-kinetic-turtlebot-interactions ros-kinetic-turtlebot-simulator ros-kinetic-kobuki-ftdi ros-kinetic-ar-track-alvar-msgs
  2. Download the TurtleBot simulator package in your catkin workspace:

    cd ~/catkin_ws/src
    git clone https://github.com/turtlebot/turtlebot_simulator
  3. After that, you should be able to use TurtleBot with Gazebo.

    If you get an error trying to visualize TurtleBot in Gazebo, download the turtlebot_simulator folder from our GitHub and replace it.

    Start ROS services:

    roscore

    Launch TurtleBot World:

    cd ~/catkin_ws
    catkin_make
    source devel/setup.bash
    roslaunch turtlebot_gazebo turtlebot_world.launch
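With the world running, you can optionally drive the simulated robot from the keyboard. This sketch assumes the turtlebot_teleop package, which is part of the TurtleBot stack installed in step 1:

    # In a new terminal, with the workspace environment sourced
    source ~/catkin_ws/devel/setup.bash
    roslaunch turtlebot_teleop keyboard_teleop.launch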

Basic Installation of Darknet

Follow these steps for installing Darknet:

  1. Download the framework:

    git clone https://github.com/pjreddie/darknet
  2. Switch to the downloaded folder and run the compilation command:

    cd darknet
    make

    You should see an output like the following if the compilation process completed correctly:

    [Figure: The Darknet compilation output]
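You can also sanity-check the binary directly; run with no arguments, Darknet simply prints its usage line:

    # From the darknet folder
    ./darknet
    # Expected output:
    # usage: ./darknet <function>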

Advanced Installation of Darknet

This is the installation process that you must complete in order to achieve the chapter objectives. It will allow you to use GPU computation to detect and recognize objects in real time. Before performing this installation, you must have some dependencies installed on your Ubuntu system, such as:

  • NVIDIA drivers: Drivers that will allow your system to correctly work with your GPU. As you may know, it must be an NVIDIA model.

  • CUDA: This is an NVIDIA toolkit that provides a development environment for building applications that need GPU usage.

  • OpenCV: This is a free artificial vision library, which is very useful for working with images.

    Note

    It is important to consider that all these dependencies are available in several versions. You must find the version of each tool that is compatible with your specific GPU and system.
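A quick way to check which versions you currently have is sketched below (the exact commands depend on how each tool was installed):

    nvidia-smi                        # NVIDIA driver version and visible GPUs
    nvcc --version                    # CUDA toolkit version
    pkg-config --modversion opencv    # OpenCV version, if pkg-config is configured for it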

Once your system is ready, you can perform the advanced installation:

  1. Download the framework if you didn't do the basic installation:

    git clone https://github.com/pjreddie/darknet
  2. Modify the first lines of the Makefile to enable OpenCV and CUDA. They should be as follows:

    GPU=1
    CUDNN=0
    OPENCV=1
    OPENMP=0
    DEBUG=0
  3. Save the Makefile changes, switch to the darknet directory, and run the compilation command:

    cd darknet
    make

    Now, you should see an output similar to this one:

    [Figure: The Darknet compilation with CUDA and OpenCV]

Installing YOLO

Before performing this installation, you must have some dependencies installed on your Ubuntu system, as mentioned in the advanced installation of Darknet.

Note

It is important to take into account that all these dependencies are available in several versions. You must find the version of each tool that is compatible with your specific GPU and system.

Once your system is ready, you can perform the advanced installation:

  1. Download the framework:

    git clone https://github.com/pjreddie/darknet
  2. Modify the first lines of the Makefile to enable OpenCV and CUDA. They should be as follows:

    GPU=1
    CUDNN=0
    OPENCV=1
    OPENMP=0
    DEBUG=0
  3. Save the Makefile changes, switch to the darknet directory, and run the compilation command:

    cd darknet
    make
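After the build finishes, you can try YOLO on one of the sample images bundled with Darknet. The weights URL below is the one published on the YOLO project page; the file is large (around 240 MB):

    # Download the pre-trained YOLOv3 weights
    wget https://pjreddie.com/media/files/yolov3.weights

    # Run detection on a sample image shipped with the repository
    ./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg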

Additional Resources

The code bundle for this book is also hosted on GitHub at: https://github.com/PacktPublishing/Artificial-Vision-and-Language-Processing-for-Robotics.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Links to documentation:

ROS Kinetic - http://wiki.ros.org/kinetic/Installation

Git Large File Storage - https://git-lfs.github.com/