
Natural Language Processing with TensorFlow

By: Motaz Saad, Thushan Ganegedara

Overview of this book

Natural language processing (NLP) deals with text and speech, which make up a large share of the data available to deep learning applications, and TensorFlow is one of the most widely used deep learning frameworks. Natural Language Processing with TensorFlow brings TensorFlow and NLP together to give you invaluable tools for working with the immense volume of unstructured data in today's data streams, and shows you how to apply these tools to specific NLP tasks. Thushan Ganegedara starts by giving you a grounding in NLP and TensorFlow basics. You'll then learn how to use Word2vec, including advanced extensions, to create word embeddings that turn sequences of words into vectors accessible to deep learning algorithms. Chapters on classical deep learning algorithms, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), demonstrate important NLP tasks such as sentence classification and language generation. You will learn how to apply high-performance RNN models, such as long short-term memory (LSTM) cells, to NLP tasks. You will also explore neural machine translation and implement a neural machine translator. After reading this book, you will understand NLP, have the skills to apply TensorFlow in deep learning NLP applications, and know how to perform specific NLP tasks.
Table of Contents (16 chapters)
Natural Language Processing with TensorFlow
Contributors
Preface
Index

What is Natural Language Processing?


According to IBM, 2.5 exabytes (1 exabyte = 1,000,000,000 gigabytes) of data were generated every day in 2017, and this figure was still growing as this book was being written. To put that into perspective, if this data were divided equally among all the people in the world, each of us would have roughly 300 MB to process every day. A large fraction of this data is unstructured text and speech, as millions of emails and social media posts are created, and phone calls made, every day.
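The "roughly 300 MB per person" figure is simple division, and a few lines make the arithmetic explicit. This is only a back-of-the-envelope sketch: the world population of about 7.5 billion in 2017 is an assumption that the text does not state, and decimal units (1 GB = 1,000 MB) are used to match the exabyte definition above.

```python
# Back-of-the-envelope check of the "roughly 300 MB per person per day" figure.
# Assumption (not stated in the text): world population of about 7.5 billion in 2017.

EXABYTE_IN_GB = 1_000_000_000      # 1 EB = 10^9 GB, as defined in the text
GB_IN_MB = 1_000                   # decimal units: 1 GB = 10^3 MB

daily_data_eb = 2.5                # exabytes generated per day (IBM figure)
world_population = 7_500_000_000   # assumed 2017 world population

daily_data_mb = daily_data_eb * EXABYTE_IN_GB * GB_IN_MB
per_person_mb = daily_data_mb / world_population

print(f"{per_person_mb:.0f} MB per person per day")  # prints: 333 MB per person per day
```

With this population estimate each person's share comes out at about 333 MB, which is consistent with the rounded "roughly 300 MB" in the text.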

These statistics provide a good basis for defining what NLP is. Simply put, the goal of NLP is to make machines understand our spoken and written languages. Moreover, NLP is ubiquitous and is already a large part of human life. Virtual assistants (VAs), such as Google Assistant, Cortana, and Apple's Siri, are largely NLP systems. Numerous NLP tasks take place when one asks a VA, "Can you show me a good Italian restaurant nearby?" First, the VA needs to convert the utterance to text (that is, speech-to-text). Next, it must understand the semantics of the request (for example, that the user is looking for a good restaurant serving Italian cuisine) and formulate a structured request (for example, cuisine = Italian, rating = 3-5, distance < 10 km). Then, the VA must search for restaurants, filter them by location and cuisine, and sort the results by rating. To calculate an overall rating for a restaurant, a good NLP system may look at both the numerical rating and the text review provided by each user. Finally, once the user is at the restaurant, the VA might help by translating menu items from Italian to English. This example shows how deeply NLP is already woven into everyday life.
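Once the VA has turned the utterance into a structured request, the search, filter, and sort steps described above are ordinary data processing. The sketch below shows only those later steps; the speech-to-text and semantic-parsing stages are assumed to have already produced the request. The `Restaurant` and `StructuredRequest` types, the `search` function, and the restaurant data are all hypothetical illustrations, not part of any real assistant's API.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    # Hypothetical record for a candidate restaurant.
    name: str
    cuisine: str
    rating: float        # average user rating on a 1-5 scale
    distance_km: float   # distance from the user


@dataclass
class StructuredRequest:
    # Hypothetical structured form of the user's spoken request.
    cuisine: str
    min_rating: float
    max_rating: float
    max_distance_km: float


def search(request, restaurants):
    """Filter restaurants by cuisine, rating range, and distance, then sort best-rated first."""
    matches = [
        r for r in restaurants
        if r.cuisine == request.cuisine
        and request.min_rating <= r.rating <= request.max_rating
        and r.distance_km < request.max_distance_km
    ]
    return sorted(matches, key=lambda r: r.rating, reverse=True)


# Structured form of "Can you show me a good Italian restaurant nearby?"
request = StructuredRequest(cuisine="Italian", min_rating=3, max_rating=5, max_distance_km=10)

# Hypothetical candidates; the comments note why some are filtered out.
candidates = [
    Restaurant("Trattoria Uno", "Italian", 4.5, 2.0),
    Restaurant("Pasta Due", "Italian", 3.8, 12.0),      # too far away
    Restaurant("Sushi Tre", "Japanese", 4.9, 1.0),      # wrong cuisine
    Restaurant("Pizzeria Quattro", "Italian", 4.1, 5.5),
    Restaurant("Osteria Cinque", "Italian", 2.5, 3.0),  # rating too low
]

for r in search(request, candidates):
    print(r.name, r.rating)
# prints:
# Trattoria Uno 4.5
# Pizzeria Quattro 4.1
```

The hard NLP work lies upstream: producing `request` from raw speech is where speech recognition and semantic understanding come in, and a real system would also fold the text of user reviews into each restaurant's rating rather than using a single number.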

NLP is an extremely challenging field of research because words and their meanings (semantics) have a highly complex, nonlinear relationship, and it is even more difficult to capture this information as a robust numerical representation. To make matters worse, each language has its own grammar, syntax, and vocabulary. Therefore, processing textual data involves various complex tasks, such as text parsing (for example, tokenization and stemming), morphological analysis, word sense disambiguation, and understanding the underlying grammatical structure of a language. For example, in the two sentences "I went to the bank" and "I walked along the river bank", the word bank has two entirely different meanings. To distinguish (or disambiguate) the word bank, we need to understand the context in which it is used. Machine learning has become a key enabler for NLP, helping machines accomplish the aforementioned tasks.
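To make tokenization and context-based disambiguation concrete, the sketch below applies a crude regular-expression tokenizer and a simplified Lesk-style heuristic to the two "bank" sentences: each sense is described by a few signature words, and the sense sharing the most words with the sentence wins. The sense inventory and its signature words are hypothetical and hand-written for illustration; this is a toy rule-based technique, not the machine learning approach the book develops.

```python
import re

# Hypothetical, hand-written sense inventory for "bank". Each sense is described
# by a few signature words, in the spirit of a dictionary gloss. The financial
# sense is listed first so it becomes the default when the context gives no clue
# (a most-frequent-sense fallback).
BANK_SENSES = {
    "financial institution": {"money", "deposit", "loan", "account", "cash", "teller"},
    "sloping land beside water": {"river", "water", "stream", "shore", "slope"},
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def disambiguate(sentence, target="bank", senses=BANK_SENSES):
    """Simplified Lesk: pick the sense whose signature shares the most words with the context."""
    context = set(tokenize(sentence)) - {target}
    best_sense, best_overlap = None, -1
    for sense, signature in senses.items():
        overlap = len(context & signature)
        # Strict '>' keeps the earlier (default) sense when overlaps tie.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


for s in ["I went to the bank", "I walked along the river bank"]:
    print(f"{s!r}: {tokenize(s)} -> {disambiguate(s)}")
```

In the second sentence the word "river" tips the decision toward the riverside sense. The first sentence contains no clue at all, so the sketch simply falls back to the default financial sense; this illustrates why richer context, and learned numerical representations of it, matter so much for real disambiguation.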