Python Natural Language Processing Cookbook - Second Edition

By: Zhenya Antić, Saurabh Chakravarty

Overview of this book

Harness the power of Natural Language Processing (NLP) to overcome real-world text analysis challenges with this recipe-based roadmap written by two seasoned NLP experts who have used NLP to transform a range of industries. You’ll be able to make the most of the latest NLP advancements, including large language models (LLMs), and leverage their capabilities through Hugging Face transformers. Through a series of hands-on recipes, you’ll master essential techniques such as extracting entities and visualizing text data. The authors will guide you through building pipelines for sentiment analysis, topic modeling, and question answering using popular libraries like spaCy, Gensim, and NLTK. You’ll also learn to implement RAG pipelines to draw out precise answers from a text corpus using LLMs. This second edition expands your skillset with new chapters on cutting-edge LLMs like GPT-4, Natural Language Understanding (NLU), and Explainable AI (XAI), which helps foster trust in your NLP models. By the end of this book, you'll be equipped with the skills to apply advanced text processing techniques, use pre-trained transformer models, and build custom NLP pipelines that extract valuable insights from text data to drive informed decision-making.

Getting the dependency parse

A dependency parse shows the grammatical relationships between the words in a sentence: each word is linked to the head word it depends on. For example, in the sentence The cat wore a hat, the root of the sentence is the verb, wore, and both the subject, the cat, and the object, a hat, are its dependents. Because it exposes the grammatical structure of the sentence (the subject, the main verb, the object, and so on), the dependency parse is useful in many NLP tasks and can feed into downstream processing.

The spaCy NLP engine produces the dependency parse as part of its overall analysis of a text. Each word receives a dependency tag that describes its grammatical role in the sentence. ROOT marks the main word of the sentence, usually the main verb, on which all other words depend, directly or indirectly.
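
To show what a dependency parse amounts to as a data structure, here is a minimal pure-Python sketch that does not use spaCy. It stores a hand-written parse of The cat wore a hat as (text, label, head index) triples; the labels are the ones we assume an English spaCy model would typically assign, not actual model output. As in spaCy, the ROOT token is its own head, so finding the root means finding the token whose head index points back to itself. The names PARSE, find_root, and dependents_of are hypothetical.

```python
# Hand-written parse of "The cat wore a hat" (hypothetical, not produced by spaCy).
# Each entry is (text, dependency label, index of the head token).
# Following spaCy's convention, the ROOT token is its own head.
PARSE = [
    ("The", "det", 1),
    ("cat", "nsubj", 2),
    ("wore", "ROOT", 2),
    ("a", "det", 4),
    ("hat", "dobj", 2),
]


def find_root(parse):
    """Return the index of the token that is its own head (the ROOT)."""
    for i, (_, _, head) in enumerate(parse):
        if head == i:
            return i
    raise ValueError("parse has no root")


def dependents_of(parse, head_index):
    """Return the texts of the tokens that depend directly on head_index."""
    return [text for i, (text, _, head) in enumerate(parse)
            if head == head_index and i != head_index]


root = find_root(PARSE)
print(PARSE[root][0])              # wore
print(dependents_of(PARSE, root))  # ['cat', 'hat']
```

In spaCy itself, the same information is available on each Token as token.dep_ and token.head.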

Getting ready

We will use spaCy to create the dependency parse. The required packages are part of the Poetry environment.

How to do it…

We will use a sentence from the sherlock_holmes1.txt file to illustrate the dependency parse. The steps are as follows:

  1. Run the file and language utility notebooks:
    %run -i "../util/file_utils.ipynb"
    %run -i "../util/lang_utils.ipynb"
  2. Define the sentence we will be parsing:
    sentence = 'I have seldom heard him mention her under any other name.'
  3. Define a function that prints each word, its grammatical function, and an explanation of that function. The dep_ attribute of the Token object holds the grammatical function of the word in the sentence, and spacy.explain returns a short description of a tag:
    import spacy
    def print_dependencies(sentence, model):
        # Run the model's pipeline, including the dependency parser
        doc = model(sentence)
        for token in doc:
            # Print the word, its dependency tag, and the tag's description
            print(token.text, "\t", token.dep_, "\t",
                spacy.explain(token.dep_))
  4. Now, let’s use this function on our sentence. We can see that the verb heard is the ROOT word of the sentence, with all other words depending on it, directly or indirectly:
    print_dependencies(sentence, small_model)

    The result should be as follows:

    I    nsubj    nominal subject
    have    aux    auxiliary
    seldom    advmod    adverbial modifier
    heard    ROOT    root
    him    nsubj    nominal subject
    mention    ccomp    clausal complement
    her    dobj    direct object
    under    prep    prepositional modifier
    any    det    determiner
    other    amod    adjectival modifier
    name    pobj    object of preposition
    .    punct    punctuation
  5. To explore the structure of the dependency parse, we can use the attributes of the Token class. The ancestors attribute yields the tokens that this token depends on, from its immediate head up to the root, and the children attribute yields the tokens that depend directly on it. The function to print the ancestors is as follows:
    def print_ancestors(sentence, model):
        doc = model(sentence)
        for token in doc:
            # token.ancestors yields the head, the head's head, and so on up to the root
            print(token.text, [t.text for t in token.ancestors])
  6. Now, let’s use this function on our sentence:
    print_ancestors(sentence, small_model)

    The output will be as follows. In the result, we see that heard has no ancestors since it is the main word in the sentence. All other words depend on it and, in fact, contain heard in their ancestor lists.

    The dependency chain can be seen by following the ancestor links for each word. For example, if we look at the word name, we see that its ancestors are under, mention, and heard. The immediate parent of name is under, the parent of under is mention, and the parent of mention is heard. A dependency chain will always lead to the root, or the main word, of the sentence:

    I ['heard']
    have ['heard']
    seldom ['heard']
    heard []
    him ['mention', 'heard']
    mention ['heard']
    her ['mention', 'heard']
    under ['mention', 'heard']
    any ['name', 'under', 'mention', 'heard']
    other ['name', 'under', 'mention', 'heard']
    name ['under', 'mention', 'heard']
    . ['heard']
  7. To see the children of each word, that is, the words that depend on it directly, use the following function:
    def print_children(sentence, model):
        doc = model(sentence)
        for token in doc:
            # token.children yields only the immediate dependents of the token
            print(token.text, [t.text for t in token.children])
  8. Now, let’s use this function on our sentence:
    print_children(sentence, small_model)

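    The result should be as follows. Now the word heard, the main word of the sentence, has a non-empty list of children: the words that attach directly to it. Words further down the tree, such as any and other, appear as children of name rather than of heard: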

    I []
    have []
    seldom []
    heard ['I', 'have', 'seldom', 'mention', '.']
    him []
    mention ['him', 'her', 'under']
    her []
    under ['name']
    any []
    other []
    name ['any', 'other']
    . []
  9. We can also see left and right children in separate lists. In the following function, we print the children as two lists: those that precede the token (lefts) and those that follow it (rights). This can be useful when doing grammatical transformations on the sentence:
    def print_lefts_and_rights(sentence, model):
        doc = model(sentence)
        for token in doc:
            # token.lefts: children before the token; token.rights: children after it
            print(token.text,
                [t.text for t in token.lefts],
                [t.text for t in token.rights])
  10. Let’s use this function on our sentence:
    print_lefts_and_rights(sentence, small_model)

    The result should be as follows:

    I [] []
    have [] []
    seldom [] []
    heard ['I', 'have', 'seldom'] ['mention', '.']
    him [] []
    mention ['him'] ['her', 'under']
    her [] []
    under [] ['name']
    any [] []
    other [] []
    name ['any', 'other'] []
    . [] []
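  11. We can also see the subtree rooted at each token, that is, the token itself together with all the words that depend on it directly or indirectly, by using this function:
    def print_subtree(sentence, model):
        doc = model(sentence)
        for token in doc:
            # token.subtree yields the token and all of its descendants
            print(token.text, [t.text for t in token.subtree])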
  12. Let’s use this function on our sentence:
    print_subtree(sentence, small_model)

    The result should be as follows. From the subtree rooted at each word, we can see the grammatical phrases that appear in the sentence, such as the noun phrase any other name and the prepositional phrase under any other name:

    I ['I']
    have ['have']
    seldom ['seldom']
    heard ['I', 'have', 'seldom', 'heard', 'him', 'mention', 'her', 'under', 'any', 'other', 'name', '.']
    him ['him']
    mention ['him', 'mention', 'her', 'under', 'any', 'other', 'name']
    her ['her']
    under ['under', 'any', 'other', 'name']
    any ['any']
    other ['other']
    name ['any', 'other', 'name']
    . ['.']
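
The ancestors and children attributes used in steps 5 to 8 are both derived from a single piece of information: the head of each token. The following minimal pure-Python sketch, which does not use spaCy, makes this concrete. The head indices in HEADS were hand-copied from the children lists printed in step 8; TOKENS, HEADS, children, and ancestors are hypothetical names, and the functions only approximate spaCy's Token.children and Token.ancestors.

```python
# Sentence tokens and the index of each token's head, hand-copied from the
# children lists in step 8 (hypothetical data, not computed by spaCy).
# As in spaCy, the root ("heard", index 3) is its own head.
TOKENS = ["I", "have", "seldom", "heard", "him", "mention",
          "her", "under", "any", "other", "name", "."]
HEADS = [3, 3, 3, 3, 5, 3, 5, 5, 10, 10, 7, 3]


def children(i):
    """Indices of the tokens that depend directly on token i."""
    return [j for j, h in enumerate(HEADS) if h == i and j != i]


def ancestors(i):
    """Indices of token i's head, the head's head, and so on up to the root."""
    chain = []
    while HEADS[i] != i:  # the root is its own head, so the walk stops there
        i = HEADS[i]
        chain.append(i)
    return chain


for i, text in enumerate(TOKENS):
    print(text,
          [TOKENS[a] for a in ancestors(i)],
          [TOKENS[c] for c in children(i)])
```

Walking up HEADS until a token is its own head is the dependency chain described in step 6: every walk ends at heard, the root.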

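Similarly, lefts, rights, and subtree (steps 9 to 12) can be sketched from head indices alone. In this hypothetical pure-Python version, again not spaCy code and using head indices hand-copied from the step 8 output, lefts and rights split a token's children by position, and subtree collects the token plus all of its descendants, sorted into sentence order. The phrase helper is also hypothetical; it joins a subtree into a string, which recovers phrases such as under any other name.

```python
# Sentence tokens and the index of each token's head, hand-copied from the
# children lists in step 8 (hypothetical data, not computed by spaCy).
# The root, "heard" (index 3), is its own head.
TOKENS = ["I", "have", "seldom", "heard", "him", "mention",
          "her", "under", "any", "other", "name", "."]
HEADS = [3, 3, 3, 3, 5, 3, 5, 5, 10, 10, 7, 3]


def children(i):
    """Indices of the tokens whose head is token i, in sentence order."""
    return [j for j, h in enumerate(HEADS) if h == i and j != i]


def lefts(i):
    """Children of token i that come before it in the sentence."""
    return [c for c in children(i) if c < i]


def rights(i):
    """Children of token i that come after it in the sentence."""
    return [c for c in children(i) if c > i]


def subtree(i):
    """Token i plus all of its descendants, sorted into sentence order."""
    nodes = [i]
    for c in children(i):
        nodes.extend(subtree(c))
    return sorted(nodes)


def phrase(i):
    """Hypothetical helper: the words of token i's subtree joined by spaces."""
    return " ".join(TOKENS[j] for j in subtree(i))


print([TOKENS[j] for j in lefts(3)], [TOKENS[j] for j in rights(3)])
# ['I', 'have', 'seldom'] ['mention', '.']
print(phrase(7))   # under any other name
print(phrase(10))  # any other name
```

In spaCy itself, a comparable span for a token can be taken with doc[token.left_edge.i : token.right_edge.i + 1].
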
See also

The dependency parse can be visualized graphically using displaCy, the visualizer that is part of spaCy. Please see the Visualizing Text Data chapter for a detailed recipe on how to do the visualization.
