Mastering Natural Language Processing with Python

By: Deepti Chopra, Nisheeth Joshi, Iti Mathur

Overview of this book

Natural Language Processing is one of the fields of computational linguistics and artificial intelligence that is concerned with human-computer interaction. It provides a seamless interaction between computers and human beings and gives computers the ability to understand human speech with the help of machine learning.

This book will give you expertise on how to employ various NLP tasks in Python, giving you an insight into the best practices when designing and building NLP-based applications using Python. It will help you become an expert in no time and assist you in creating your own NLP projects using NLTK.

You will sequentially be guided through applying machine learning tools to develop various models. We'll give you clarity on how to create training data and how to implement major NLP applications such as Named Entity Recognition, Question Answering System, Discourse Analysis, Transliteration, Word Sense Disambiguation, Information Retrieval, Sentiment Analysis, Text Summarization, and Anaphora Resolution.

Developing a stemmer for non-English language


Polyglot is a software package that provides trained models, called morfessor models, for obtaining morphemes from tokens. The goal of the Morpho project is to develop unsupervised, data-driven methods for discovering morphemes, which are the primitive units of syntax and the smallest meaningful elements of a language. Morphemes play an important role in natural language processing, particularly in the automatic recognition and generation of language. The morfessor models were trained using the vocabulary dictionaries of Polyglot, on the 50,000 most frequent tokens of each language.
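
To make the idea of splitting a token into morphemes concrete, here is a minimal sketch in plain Python. It is not Morfessor and it does not use Polyglot: the morph inventory `MORPHS` and the function `segment` are hypothetical names chosen for illustration, and the simple lookup stands in for the statistical model that Morfessor learns from data.

```python
# Toy illustration only: this is NOT Morfessor and does not call Polyglot.
# MORPHS is a small, hand-picked (hypothetical) inventory of morphs.
MORPHS = {"un", "break", "able", "walk", "ing", "ed", "re", "play"}


def segment(token, morphs=MORPHS):
    """Split a token into known morphs, or return None if it cannot be split.

    Longer prefixes are tried first; if the remainder cannot be
    segmented, the function backtracks and tries a shorter prefix.
    """
    if not token:
        return []
    for end in range(len(token), 0, -1):
        prefix = token[:end]
        if prefix in morphs:
            rest = segment(token[end:], morphs)
            if rest is not None:
                return [prefix] + rest
    return None


for word in ["unbreakable", "walking", "replayed"]:
    print(word, segment(word))
```

A trained morfessor model differs from this sketch in that it learns its morph inventory from unannotated text instead of relying on a hand-written set.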

Let's see the code for obtaining the language table using polyglot:

from polyglot.downloader import downloader
print(downloader.supported_languages_table("morph2"))

The preceding code prints the languages for which morph2 models are available.

The necessary models can be downloaded using the following code:

%%bash
polyglot download morph2.en morph2.ar

[polyglot_data] Downloading package...
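
The download command above names each model as `morph2.` followed by a language code. As a small sketch of that naming convention, the following Python builds the same command for a list of language codes. The helper names `morph_package_names` and `download_command` are hypothetical and are not part of Polyglot; the sketch only assembles the command and does not run it.

```python
# Hypothetical helpers (not part of Polyglot) that assemble the
# "polyglot download" command shown above for a list of language codes.

def morph_package_names(language_codes):
    """Return package names such as 'morph2.en' for each language code."""
    return ["morph2." + code for code in language_codes]


def download_command(language_codes):
    """Return the command as a list of arguments, without executing it."""
    return ["polyglot", "download"] + morph_package_names(language_codes)


print(" ".join(download_command(["en", "ar"])))
```

The resulting argument list could be passed to `subprocess.run` if the download should be started from a Python script instead of a shell cell.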