
Python Natural Language Processing Cookbook

By: Zhenya Antić

Overview of this book

Python is the most widely used language for natural language processing (NLP), thanks to its extensive tools and libraries for analyzing text and extracting computer-usable data. This book takes you through a range of text processing techniques, from basics such as identifying parts of speech to complex topics such as topic modeling, text classification, and visualization. Starting with an overview of NLP, the book presents recipes for dividing text into sentences, stemming and lemmatization, removing stopwords, and part-of-speech tagging to help you prepare your data. You'll then learn ways of extracting and representing grammatical information, such as dependency parsing and anaphora resolution. You'll discover different ways of representing semantics using bag-of-words, TF-IDF, word embeddings, and BERT, and develop skills for text classification using keywords, SVMs, LSTMs, and other techniques. As you advance, you'll also see how to extract information from text, implement unsupervised and supervised techniques for topic modeling, and perform topic modeling of short texts, such as tweets. Additionally, the book shows you how to develop chatbots using NLTK and Rasa and how to visualize text data. By the end of this NLP book, you'll have developed the skills to use a powerful set of tools for text processing.

What this book covers

Chapter 1, Learning NLP Basics, is an introductory chapter covering basic preprocessing steps for working with text. It includes recipes for dividing text into sentences, stemming and lemmatization, removing stopwords, and part-of-speech tagging. You will learn about different approaches to part-of-speech tagging, as well as two options for removing stopwords.

Chapter 2, Playing with Grammar, shows how to obtain and use grammatical information about text. We will create a dependency parse and then use it to split a sentence into clauses. We will also use the dependency parse and noun chunks to extract entities and relations in the text. Some recipes show how to extract grammatical information in both English and Spanish.

Chapter 3, Representing Text – Capturing Semantics, starts from the fact that working with words and semantics is easy for people but difficult for computers, so text must be represented in some form other than words before computers can work with it. This chapter presents different ways of representing text, from a simple bag of words to BERT. It also discusses a basic implementation of semantic search that uses these semantic representations.

Chapter 4, Classifying Texts, covers text classification, one of the most important techniques in NLP. It is used in many industries for different types of text, such as tweets, long documents, and sentences. In this chapter, you will learn how to do both supervised and unsupervised text classification with a variety of techniques and tools, including K-means, SVMs, and LSTMs.

Chapter 5, Getting Started with Information Extraction, starts from the point that one of the main goals of NLP is extracting information from text for later use. This chapter shows different ways of pulling information from text, from simple regular expressions that find emails and URLs to neural network tools that extract sentiment.

Chapter 6, Topic Modeling, explains how determining the topics of texts is an important NLP tool that can help with text classification and with discovering new topics in texts. This chapter introduces different techniques for topic modeling, including unsupervised and supervised techniques, and topic modeling of short texts, such as tweets.

Chapter 7, Building Chatbots, covers chatbots, which have emerged in the last few years as an important marketing tool. In this chapter, you will learn how to build a chatbot using two different frameworks: NLTK for keyword-matching chatbots and Rasa for more sophisticated chatbots with a deep learning model under the hood.

Chapter 8, Visualizing Text Data, shows how visualizing the results of different NLP analyses can be a useful tool for presentation and evaluation. This chapter introduces visualization techniques for different NLP tasks, including named entity recognition (NER), topic modeling, and word clouds.