Book Image

Mastering Natural Language Processing with Python

By : Deepti Chopra, Nisheeth Joshi, Iti Mathur
Book Image

Mastering Natural Language Processing with Python

By: Deepti Chopra, Nisheeth Joshi, Iti Mathur

Overview of this book

<p>Natural Language Processing is one of the fields of computational linguistics and artificial intelligence that is concerned with human-computer interaction. It provides a seamless interaction between computers and human beings and gives computers the ability to understand human speech with the help of machine learning.</p> <p>This book will give you expertise on how to employ various NLP tasks in Python, giving you an insight into the best practices when designing and building NLP-based applications using Python. It will help you become an expert in no time and assist you in creating your own NLP projects using NLTK.</p> <p>You will sequentially be guided through applying machine learning tools to develop various models. We’ll give you clarity on how to create training data and how to implement major NLP applications such as Named Entity Recognition, Question Answering System, Discourse Analysis, Transliteration, Word Sense disambiguation, Information Retrieval, Sentiment Analysis, Text Summarization, and Anaphora Resolution.</p>
Table of Contents (17 chapters)
Mastering Natural Language Processing with Python
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Preface
Index

Text summarization


Text summarization is the process of generating summaries from a given long text. Based on the Luhn work, The Automatic Creation of Literature Abstracts (1958), a naïve summarization approach known as NaiveSumm is developed. It makes use of a word's frequencies for the computation and extraction of sentences that consist of the most frequent words. Using this approach, text summarization can be performed by extracting a few specific sentences.

Let's see the following code in NLTK that can be used for performing text summarization:

from nltk.tokenize import sent_tokenize,word_tokenize
from nltk.corpus import stopwords
from collections import defaultdict
from string import punctuation
from heapq import nlargest

class Summarize_Frequency:
  def __init__(self, cut_min=0.2, cut_max=0.8):
    """
     Initilize the text summarizer.
     Words that have a frequency term lower than cut_min
     or higer than cut_max will be ignored.
    """
    self._cut_min = cut_min
    self...