Book Image

Natural Language Processing: Python and NLTK

By : Jacob Perkins, Nitin Hardeniya, Deepti Chopra, Iti Mathur, Nisheeth Joshi
Book Image

Natural Language Processing: Python and NLTK

By: Jacob Perkins, Nitin Hardeniya, Deepti Chopra, Iti Mathur, Nisheeth Joshi

Overview of this book

Natural Language Processing is a field of computational linguistics and artificial intelligence that deals with human-computer interaction. It provides a seamless interaction between computers and human beings and gives computers the ability to understand human speech with the help of machine learning. The number of human-computer interaction instances are increasing so it’s becoming imperative that computers comprehend all major natural languages. The first NLTK Essentials module is an introduction on how to build systems around NLP, with a focus on how to create a customized tokenizer and parser from scratch. You will learn essential concepts of NLP, be given practical insight into open source tool and libraries available in Python, shown how to analyze social media sites, and be given tools to deal with large scale text. This module also provides a workaround using some of the amazing capabilities of Python libraries such as NLTK, scikit-learn, pandas, and NumPy. The second Python 3 Text Processing with NLTK 3 Cookbook module teaches you the essential techniques of text and language processing with simple, straightforward examples. This includes organizing text corpora, creating your own custom corpus, text classification with a focus on sentiment analysis, and distributed text processing methods. The third Mastering Natural Language Processing with Python module will help you become an expert and assist you in creating your own NLP projects using NLTK. You will be guided through model development with machine learning tools, shown how to create training data, and given insight into the best practices for designing and building NLP-based applications using Python. This Learning Path combines some of the best that Packt has to offer in one complete, curated package and is designed to help you quickly learn text processing with Python and NLTK. It includes content from the following Packt products: ? NTLK essentials by Nitin Hardeniya ? Python 3 Text Processing with NLTK 3 Cookbook by Jacob Perkins ? Mastering Natural Language Processing with Python by Deepti Chopra, Nisheeth Joshi, and Iti Mathur
Table of Contents (6 chapters)

Chapter 4. Parts-of-Speech Tagging – Identifying Words

Parts-of-speech (POS) tagging is one of the many tasks in NLP. It is defined as the process of assigning a particular parts-of-speech tag to individual words in a sentence. The parts-of-speech tag identifies whether a word is a noun, verb, adjective, and so on. There are numerous applications of parts-of-speech tagging, such as information retrieval, machine translation, NER, language analysis, and so on.

This chapter will include the following topics:

  • Creating POS tagged corpora
  • Selecting a machine learning algorithm
  • Statistical modeling involving the n-gram approach
  • Developing a chunker using POS tagged data

Introducing parts-of-speech tagging

Parts-of-speech tagging is the process of assigning a category (for example, noun, verb, adjective, and so on) tag to individual tokens in a sentence. In NLTK, taggers are present in the nltk.tag package and it is inherited by the TaggerIbase class.

Consider an example to implement...