Natural Language Processing with TensorFlow

By: Motaz Saad, Thushan Ganegedara

Overview of this book

Natural language processing (NLP) supplies the majority of data available to deep learning applications, while TensorFlow is the most important deep learning framework currently available. Natural Language Processing with TensorFlow brings TensorFlow and NLP together to give you tools for working with the immense volume of unstructured data in today's data streams, and shows how to apply these tools to specific NLP tasks. Thushan Ganegedara starts by giving you a grounding in NLP and TensorFlow basics. You'll then learn how to use Word2vec, including advanced extensions, to create word embeddings that turn sequences of words into vectors accessible to deep learning algorithms. Chapters on classical deep learning algorithms, like convolutional neural networks (CNN) and recurrent neural networks (RNN), demonstrate important NLP tasks such as sentence classification and language generation. You will learn how to apply high-performance RNN models, like long short-term memory (LSTM) cells, to NLP tasks. You will also explore neural machine translation and implement a neural machine translator. After reading this book, you will have an understanding of NLP, the skills to apply TensorFlow in deep learning NLP applications, and the ability to perform specific NLP tasks.

Introduction to the technical tools


In this section, you will be introduced to the technical tools used in the exercises of the following chapters. First, we briefly describe each of the main tools. Next, we give a short guide to installing each tool, along with links to the detailed guides on the official websites. Finally, we share tips on how to verify that each tool was installed properly.

Description of the tools

We will use Python as the coding/scripting language. Python is a versatile, easy-to-set-up language that is heavily used by the scientific community. It also has a large ecosystem of scientific libraries, covering areas ranging from deep learning to probabilistic inference to data visualization. TensorFlow is one such library, well known in the deep learning community, that provides many basic and advanced operations useful for deep learning. We will use Jupyter Notebook in all our exercises, as it provides a more interactive coding environment than an IDE. We will also use scikit-learn, another popular machine learning toolkit for Python, for miscellaneous purposes such as data preprocessing. For text-related operations, we will use NLTK, the Python Natural Language Toolkit. Finally, we will use matplotlib for data visualization.

Installing Python and scikit-learn

Python is easy to install on any of the commonly used operating systems, such as Windows, macOS, or Linux. We will use Anaconda to set up Python, as it takes care of installing both Python and the essential libraries.

To install Anaconda, follow these steps:

  1. Download Anaconda from https://www.continuum.io/downloads

  2. Select the appropriate OS and download Python 3.5

  3. Install Anaconda by following the instructions at https://docs.continuum.io/anaconda/install/

To check whether Anaconda was properly installed, follow these steps:

  1. Open a Terminal window (Command Prompt in Windows)

  2. Now, run the following command:

    conda --version
    

If Anaconda was installed properly, the conda version should be shown in the Terminal. Next, install scikit-learn by following the instructions at http://scikit-learn.org/stable/install.html, NLTK from https://www.nltk.org/install.html, and Matplotlib from https://matplotlib.org/users/installing.html.
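As a quick check that these libraries are visible to Python, something like the following minimal sketch can be run from the Python prompt or a notebook cell. The helper name `installed_versions` and the `REQUIRED_MODULES` list are illustrative and not part of the book's code; the sketch assumes each library exposes a `__version__` attribute and reports "unknown" when it does not.

```python
import importlib

# Import names, not package names: scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["sklearn", "nltk", "matplotlib"]


def installed_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module is missing from the current environment.
            versions[name] = None
        else:
            # Fall back to "unknown" for modules without a __version__ attribute.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED_MODULES).items():
        print("{}: {}".format(name, version if version else "NOT INSTALLED"))
```

Any library reported as not installed can then be set up by following the corresponding installation guide.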

Installing Jupyter Notebook

You can install Jupyter Notebook by following the instructions at http://jupyter.readthedocs.io/en/latest/install.html.

To check whether Jupyter Notebook is properly installed, follow these steps:

  1. Open a Terminal window

  2. Run this command:

    jupyter notebook
    

    You should be presented with a new browser window that looks like Figure 1.6:

    Figure 1.6. Jupyter Notebook installed successfully
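If a command is not recognized, it may simply be missing from the PATH. The following is a small sketch, using only the Python standard library, that reports where the `conda` and `jupyter` executables are found. The helper name `locate_commands` is hypothetical and introduced here for illustration only.

```python
import shutil


def locate_commands(commands):
    """Return {command: full path or None} by searching the PATH."""
    return {cmd: shutil.which(cmd) for cmd in commands}


if __name__ == "__main__":
    for cmd, path in locate_commands(["conda", "jupyter"]).items():
        print("{}: {}".format(cmd, path or "not found on PATH"))
```

A command reported as not found usually means that the installation did not complete or that the Terminal needs to be reopened so that the updated PATH takes effect.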

Installing TensorFlow

Follow the instructions given at https://www.tensorflow.org/install/ under the Installing with Anaconda subsection to install TensorFlow. We will use TensorFlow 1.8.x throughout all the exercises.

When providing the tfBinaryURL as asked in the instructions, make sure that it points to a TensorFlow 1.8.x version. We stress this because the API has changed considerably compared to previous TensorFlow versions.

To check whether TensorFlow was installed properly, follow these steps:

  1. Open Command Prompt in Windows or Terminal in Linux or macOS.

  2. Type python to enter the Python environment. The Python version is printed when the interpreter starts. Make sure that you are using Python 3.

  3. Next, enter the following commands:

    import tensorflow as tf
    print(tf.__version__)

If all went well, you should not have any errors (there might be warnings if your computer does not have a dedicated GPU, but you can ignore them) and the TensorFlow version 1.8.x should be shown.
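The version check above can also be automated. The following is a minimal sketch, assuming that any version string starting with `1.8.` is acceptable for the exercises. The names `EXPECTED_PREFIX` and `matches_expected` are illustrative and are not part of TensorFlow.

```python
# Assumption: any 1.8.x release is acceptable for the exercises.
EXPECTED_PREFIX = "1.8."


def matches_expected(version, prefix=EXPECTED_PREFIX):
    """Return True if a version string such as '1.8.0' starts with the expected prefix."""
    return version.startswith(prefix)


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        status = "OK" if matches_expected(tf.__version__) else "unexpected version"
        print("TensorFlow {}: {}".format(tf.__version__, status))
```

Matching on the prefix `1.8.` rather than `1.8` avoids accidentally accepting a version such as 1.80.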

Note

Many cloud-based computational platforms are also available, where you can set up your own machine with various customizations (operating system, GPU card type, number of GPU cards, and so on). Many users are migrating to such cloud-based services because of the following benefits:

  • More customization options

  • Less maintenance effort

  • No infrastructure requirements

Several popular cloud-based computational platforms are as follows:

  • Google Cloud Platform (GCP): https://cloud.google.com/

  • Amazon Web Services (AWS): https://aws.amazon.com/

  • TensorFlow Research Cloud (TFRC): https://www.tensorflow.org/tfrc/