Natural Language Processing Fundamentals

By: Sohom Ghosh, Dwight Gunning


About the Book

If Natural Language Processing (NLP) isn't really your forte, Natural Language Processing Fundamentals will make sure you get off to a steady start in the realm of NLP. This comprehensive guide will show you how to effectively use Python libraries and NLP concepts to solve various problems.

You'll be introduced to NLP and its applications through examples and exercises. This will be followed by an introduction to the initial stages of solving a problem, which includes problem definition, getting text data, and preparing text data for modeling. With exposure to concepts such as advanced NLP algorithms and visualization techniques, you'll learn how to create applications that can extract information from unstructured data and present it as impactful visuals. Although you will continue to learn NLP-based techniques, the focus will gradually shift to developing useful applications. In those sections, you'll gain an understanding of how to apply NLP techniques to answer questions, as can be used for chatbots.

By the end of this book, you'll be able to accomplish a varied range of assignments, ranging from identifying the most suitable type of NLP task for solving a problem, to using a tool such as spaCy or Gensim to perform sentiment analysis. The book will equip you with the knowledge you need to build applications that interpret human language.

About the Authors

Sohom Ghosh is a passionate data detective with expertise in Natural Language Processing. He has publications in several international conferences and journals.

Dwight Gunning is a data scientist at FINRA, a financial services regulator in the US. He has extensive experience in Python-based machine learning and hands-on experience with the most popular NLP tools, such as NLTK, Gensim, and spaCy.

Learning Objectives

By the end of this book, you will be able to:

  • Obtain, verify, and clean data before transforming it into a correct format for use
  • Perform data analysis and machine learning tasks using Python
  • Gain an understanding of the basics of computational linguistics
  • Build models for general NLP tasks
  • Evaluate the performance of a model with the right metrics
  • Visualize, quantify, and perform exploratory analysis from any text data

Audience

Natural Language Processing Fundamentals is designed for novice and mid-level data scientists and machine learning developers who want to gather and analyze text data to build an NLP-powered product. Prior experience of coding in Python (using data types, writing functions, and importing libraries) will help. Some experience with linguistics and probability is useful but not necessary.

Approach

This book starts with the very basics of reading text into Python code and progresses through the required pipeline of cleaning, stemming, and tokenizing text into a form suitable for NLP. The book then proceeds to the fundamentals of NLP statistical methods, vector representation, and model building, using the most common NLP libraries. Finally, the book gives students hands-on practice in using NLP models and code in applications.
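As a rough illustration of the preprocessing pipeline mentioned above, the following sketch cleans, tokenizes, and stems a sentence using only the Python standard library. It is not code from the book: the function names (clean, tokenize, crude_stem, preprocess) are hypothetical, and crude_stem is a deliberately naive suffix stripper standing in for a real stemmer such as the ones the NLP libraries provide.

```python
import re


def clean(text):
    """Lowercase the text and replace every non-letter character with a space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    """Split the cleaned text on runs of whitespace."""
    return text.split()


def crude_stem(token, suffixes=("ing", "ed", "s")):
    """Toy stemmer (an assumption, not a real algorithm): strip the first
    matching suffix, as long as at least three characters remain."""
    for suffix in suffixes:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the three stages in order: clean, tokenize, stem."""
    return [crude_stem(token) for token in tokenize(clean(text))]


print(preprocess("The knights were battling; the dragons roared!"))
# ['the', 'knight', 'were', 'battl', 'the', 'dragon', 'roar']
```

Note that the toy stemmer turns "battling" into "battl", which is not a real word; stems produced this way are only meant to group related word forms together.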

Hardware Requirements

For the optimal student experience, we recommend the following hardware configuration:

  • Any entry-level PC/Mac with Windows, Linux, or macOS is sufficient
  • Processor: Dual core or equivalent
  • Memory: 4 GB RAM
  • Storage: 10 GB available space

Software Requirements

You'll also need the following software installed in advance:

  • Operating system: Windows 7 SP1 32/64-bit, Windows 8.1 32/64-bit or Windows 10 32/64-bit, Ubuntu 14.04 or later, or macOS Sierra or later
  • Browser: Google Chrome or Mozilla Firefox
  • Anaconda
  • Jupyter Notebook
  • Python 3.x

Conventions

Code words in the text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Find out the index value of the word fox using the following code."

A block of code is set as follows:

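sentence = "The quick brown fox jumps over the lazy dog"  # example input (assumed; not defined in the original snippet)
words = sentence.split()  # split the sentence on whitespace
first_word = words[0]
last_word = words[-1]  # a negative index selects the last element
concat_word = first_word + last_word
print(concat_word)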

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Stemming leads to inappropriate results such as 'battling' getting transformed into 'battl', which has no meaning."

Installation and Setup

Before you start this book, you'll need to install Python 3.6, pip, scikit-learn, and the other libraries used in this book. The steps to install them are as follows:

Installing Python

Install Python 3.6 by following the instructions at this link: https://realpython.com/installing-python/.

Installing pip

  1. To install pip, go to this link and download the get-pip.py file: https://pip.pypa.io/en/stable/installing/.
  2. Then, use the following command to install it:
    python get-pip.py

    If an earlier version of Python on your computer already uses the python command, you might need to run python3 get-pip.py instead.

Installing libraries

Using the pip command, install the following libraries:

python -m pip install --user numpy scipy matplotlib pandas scikit-learn nltk
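After running the command, a quick check like the one below can confirm that each library is visible to your Python interpreter. This is a minimal sketch rather than part of the book's setup: the REQUIRED mapping and the missing_packages helper are hypothetical names, and the check only looks for each module without importing it. Note that scikit-learn is installed under that name but imported as sklearn.

```python
import importlib.util
import sys

# pip package name -> import name (the two differ for scikit-learn)
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "nltk": "nltk",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


print("Python version:", sys.version.split()[0])
missing = missing_packages()
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All required libraries were found.")
```

If any names are reported as missing, rerunning the pip command for just those packages should be enough.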

Working with the Jupyter Notebook

You'll be working on different exercises and activities in a Jupyter notebook. These exercises and activities can be downloaded from the associated GitHub repository:

  1. Download the repository from here: https://github.com/TrainingByPackt/Natural-Language-Processing-Fundamentals.

    You can either clone it with Git or download it as a zipped folder by clicking the green Clone or download button on the upper-right side.

  2. To open the Jupyter notebooks, navigate to the lesson directory in your terminal. To do that, type:
    cd Natural-Language-Processing-Fundamentals/<your current lesson>

    For example:

    cd Natural-Language-Processing-Fundamentals/Lesson_01/ 
  3. To reach each activity and exercise, you have to use cd once more to go into each folder, like so:
    cd Activity01
  4. Once you are in the folder of your choice, run the jupyter notebook command.

Importing Python Libraries

Every exercise and activity in this book will make use of various libraries. Importing libraries into Python is very simple and here's how we do it:

  1. To import a library such as NumPy or pandas, run an import statement like the following. This imports the whole numpy library into the current file:
    import numpy  # import numpy
  2. In the first cells of the exercises and activities of this book, you will see the following code. It lets us use np instead of numpy when calling functions from numpy:
    import numpy as np  # import numpy and assign alias np
  3. In later chapters, partial imports will be present, as shown in the following code. This makes only the mean function from the library available by name:
    from numpy import mean  # only import the mean function of numpy
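The three import forms above differ only in the name you type afterwards; they all refer to the same underlying function. The sketch below illustrates this with the standard-library statistics module as a stand-in for numpy (an assumption made so that the example runs without any third-party packages); the same pattern applies to numpy, np, and mean.

```python
import sys

import statistics             # form 1: whole module, referenced by its full name
import statistics as st       # form 2: whole module, referenced by a short alias
from statistics import mean   # form 3: a single function, referenced directly

data = [1, 2, 3, 4]

# All three calls run the same function on the same data.
print(statistics.mean(data), st.mean(data), mean(data))

# The three names point at one and the same function object.
same_function = statistics.mean is st.mean is mean
print("Same function object:", same_function)

# Even the partial import loads the module; it just binds fewer names.
module_loaded = "statistics" in sys.modules
print("Module loaded:", module_loaded)
```

Which form to use is mostly a readability choice: an alias keeps calls short while still showing where a function comes from, whereas a partial import is convenient when only one or two names are needed.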

Installing the Code Bundle

Copy the code bundle for the class to the C:/Code folder.

Additional Resources

The code bundle for this book is also hosted on GitHub at https://github.com/TrainingByPackt/Natural-Language-Processing-Fundamentals.

The high-quality color images used in this book can be found at: https://github.com/TrainingByPackt/Natural-Language-Processing-Fundamentals/tree/master/Graphics.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!