Mastering Natural Language Processing with Python

By: Deepti Chopra, Nisheeth Joshi, Iti Mathur
Overview of this book

Natural Language Processing is a field of computational linguistics and artificial intelligence concerned with human-computer interaction. It provides seamless interaction between computers and human beings and gives computers the ability to understand human language with the help of machine learning.

This book will give you expertise in employing various NLP tasks in Python, along with an insight into best practices for designing and building NLP-based applications in Python. It will help you become an expert in no time and assist you in creating your own NLP projects using NLTK.

You will be guided, step by step, through applying machine learning tools to develop various models. We'll give you clarity on how to create training data and how to implement major NLP applications such as Named Entity Recognition, Question Answering Systems, Discourse Analysis, Transliteration, Word Sense Disambiguation, Information Retrieval, Sentiment Analysis, Text Summarization, and Anaphora Resolution.
Table of Contents (17 chapters)

Applying interpolation on data to get mix and match


A limitation of the additive-smoothed bigram model is that it backs off to a state of ignorance when we deal with rare text. For example, suppose the word captivating occurs five times in the training data: thrice followed by by and twice followed by the. With additive smoothing, the unseen bigrams captivating a and captivating new receive the same probability. Both occurrences are plausible, but the former is more probable than the latter, since a is a far more frequent word than new. This problem can be rectified using unigram probabilities: we can develop an interpolation model in which the unigram and bigram probabilities are combined.
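As a rough illustration (not code from the book), the interpolation idea can be sketched in plain Python: an additive-smoothed bigram estimate is mixed with a unigram estimate. The weight lam, the smoothing constant k, and the toy corpus below are all illustrative choices:

```python
from collections import Counter

def train(tokens):
    """Collect unigram and bigram counts from a list of tokens."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def interp_prob(word, prev, unigrams, bigrams, lam=0.7, k=0.0001):
    """P(word | prev) = lam * add-k bigram + (1 - lam) * unigram."""
    vocab_size = len(unigrams)
    total = sum(unigrams.values())
    # additive (add-k) smoothed bigram probability
    p_bigram = (bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab_size)
    # maximum-likelihood unigram probability
    p_unigram = unigrams[word] / total
    return lam * p_bigram + (1 - lam) * p_unigram

tokens = ("a new film is captivating by design a a a "
          "captivating by nature captivating the crowd").split()
uni, bi = train(tokens)
# Neither "captivating a" nor "captivating new" was seen, so their smoothed
# bigram terms are equal; the unigram term makes the former more probable.
print(interp_prob("a", "captivating", uni, bi) >
      interp_prob("new", "captivating", uni, bi))
```

Because both the smoothed bigram distribution and the unigram distribution sum to one over the vocabulary, their convex combination is still a valid probability distribution.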

In SRILM, we perform interpolation by first training a unigram model with -order 1; -order 2 is used for the bigram model:

ngram-count -text /home/linux/ieng6/ln165w/public/data/engandhintrain.txt \
  -vocab /home/linux/ieng6/ln165w/public/data/engandhinlexicon.txt \
  -order 1 -addsmooth 0.0001 -lm wsj1.lm
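The command above trains only the unigram model. Under the same assumptions, a second ngram-count run with -order 2 would produce the bigram model, and SRILM's ngram tool can then mix the two with its -mix-lm and -lambda options. The output file names wsj2.lm and mix.lm and the weight 0.5 here are illustrative, not taken from the book:

```
ngram-count -text /home/linux/ieng6/ln165w/public/data/engandhintrain.txt \
  -vocab /home/linux/ieng6/ln165w/public/data/engandhinlexicon.txt \
  -order 2 -addsmooth 0.0001 -lm wsj2.lm

ngram -lm wsj1.lm -mix-lm wsj2.lm -lambda 0.5 -write-lm mix.lm
```

Here -lambda gives the weight of the main model (-lm) in the interpolation, and -write-lm saves the resulting mixed model.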