Python Natural Language Processing

Overview of this book

This book starts by laying the foundation for Natural Language Processing (NLP) and explaining why Python is one of the best options for building an NLP-based expert system, with advantages such as community support and the availability of frameworks. It then gives you a better understanding of freely available corpora and the different types of datasets. After this, you will know how to choose a dataset for NLP applications and how to find the right NLP techniques to process the sentences in a dataset and understand their structure. You will also learn how to tokenize different parts of sentences and how to analyze them. Over the course of the book, you will explore both the semantic and the syntactic analysis of text. You will understand how to resolve various ambiguities in processing human language and will come across various scenarios while performing text analysis. You will learn the basics of getting the environment ready for natural language processing, move on to the initial setup, and then quickly come to understand sentences and parts of language. You will learn how to use machine learning and deep learning to extract information from text data. By the end of the book, you will have a clear understanding of natural language processing and will have worked on multiple examples that implement NLP in the real world.

Preface

What this book covers

What you need for this book

Introduction

Understanding natural language processing

Understanding basic applications

Advantages of togetherness - NLP and Python

Environment setup for NLTK

Tips for readers

Summary

Practical Understanding of a Corpus and Dataset

What is a corpus?

Why do we need a corpus?

Understanding corpus analysis

Understanding types of data attributes

Exploring different file formats for corpora

Resources for accessing free corpora

Preparing a dataset for NLP applications

Web scraping

Summary

Understanding the Structure of Sentences

Understanding components of NLP

Natural language understanding

Defining context-free grammar

Morphological analysis

Syntactic analysis

Semantic analysis

Handling ambiguity

Discourse integration

Pragmatic analysis

Summary

Preprocessing

Handling corpus-raw text

Handling corpus-raw sentences

Basic preprocessing

Practical and customized preprocessing

Summary

Feature Engineering and NLP Algorithms

Understanding feature engineering

Basic feature of NLP

Basic statistical features for NLP

Advantages of features engineering

Challenges of features engineering

Summary

Advanced Feature Engineering and NLP Algorithms

Recall word embedding

Understanding the basics of word2vec

Converting the word2vec model from black box to white box

Understanding the components of the word2vec model

Understanding the logic of the word2vec model

Understanding algorithmic techniques and the mathematics behind the word2vec model

Algorithms used by neural networks

Some of the facts related to word2vec

Applications of word2vec

Implementation of simple examples

Advantages of word2vec

Challenges of word2vec

How is word2vec used in real-life applications?

When should you use word2vec?

Developing something interesting

Extension of the word2vec concept

Importance of vectorization in deep learning

Summary

Rule-Based System for NLP

Understanding of the rule-based system

Purpose of having the rule-based system

Architecture of the RB system

Understanding the RB system development life cycle

Applications

Developing NLP applications using the RB system

Comparing the rule-based approach with other approaches

Advantages of the rule-based system

Disadvantages of the rule-based system

Challenges for the rule-based system

Understanding word-sense disambiguation basics

Discussing recent trends for the rule-based system

Summary

Machine Learning for NLP Problems

Understanding the basics of machine learning

Development steps for NLP applications

Understanding ML algorithms and other concepts

Hybrid approaches for NLP applications

Summary

Deep Learning for NLU and NLG Problems

An overview of artificial intelligence

Comparing NLU and NLG

A brief overview of deep learning

Basics of neural networks

Implementation of ANN

Deep learning and deep neural networks

Deep learning techniques and NLU

Deep learning techniques and NLG

Gradient descent-based optimization

Artificial intelligence versus human intelligence

Summary

Advanced Tools

Apache Hadoop as a storage framework

Apache Spark as a processing framework

Apache Flink as a real-time processing framework

Visualization libraries in Python

Summary

How to Improve Your NLP Skills

Beginning a new career journey with NLP

Cheat sheets

Choose your area

Agile way of working to achieve success

Useful blogs for NLP and data science

Grab public datasets

Mathematics needed for data science

Summary

Installation Guide

Installing Python, pip, and NLTK

Installing the PyCharm IDE

Installing dependencies

Framework installation guides

Drop your queries

Summary

Handling corpus-raw sentences

In the previous section, we processed raw text and looked at concepts at the sentence level. In this section, we are going to look at word-level concepts such as tokenization, lemmatization, and so on.

Word tokenization

Word tokenization is the process of chopping a stream of text into words, phrases, and other meaningful strings. Each unit that the process produces as output is called a token.
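
To make the definition concrete, here is a minimal sketch of word tokenization. It is not the code from the book's figure: it assumes a simple regular-expression rule and uses only Python's standard `re` module so that it runs without downloading any tokenizer models. The names `TOKEN_PATTERN` and `simple_word_tokenize` are hypothetical and chosen only for illustration. In an NLTK setup you would more commonly call `nltk.word_tokenize`, which handles many more cases.

```python
import re

# Hypothetical pattern for illustration: a token is either a run of word
# characters (optionally joined by an internal apostrophe or hyphen) or a
# single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")


def simple_word_tokenize(text):
    """Split raw text into a list of word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


sentence = "Word tokenization chops a stream of text into tokens."
tokens = simple_word_tokenize(sentence)
print(tokens)
# ['Word', 'tokenization', 'chops', 'a', 'stream', 'of', 'text', 'into', 'tokens', '.']
```

Treating punctuation as its own token is a design choice in this sketch; real tokenizers differ in how they handle contractions, hyphenated words, and symbols.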

Let's look at the code snippet given in Figure 4.11, which tokenizes text into words:

Figure 4.11: Word tokenized code snippet

The output of the code given in Figure 4.11 is as follows:

The input for word tokenization...
