Python Natural Language Processing Cookbook - Second Edition
A dependency parse is a tool that shows dependencies in a sentence. For example, in the sentence The cat wore a hat, the root of the sentence is the verb, wore, and both the subject, the cat, and the object, a hat, are dependents. The dependency parse can be very useful in many NLP tasks since it shows the grammatical structure of the sentence, with the subject, the main verb, the object, and so on. It can then be used in downstream processing.
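To make the idea of a root and its dependents concrete, here is a minimal sketch in plain Python. The parse below is written by hand for illustration and is not spaCy output; the tuple layout and the helper names find_root and direct_dependents are assumptions made for this example. It follows spaCy's convention that the root token is its own head.

```python
# Hand-written, hypothetical parse of "The cat wore a hat" (not produced by spaCy).
# Each entry is (word, index of its head, dependency label).
# Following spaCy's convention, the root token is its own head.
parse = [
    ("The", 1, "det"),
    ("cat", 2, "nsubj"),
    ("wore", 2, "ROOT"),
    ("a", 4, "det"),
    ("hat", 2, "dobj"),
]

def find_root(parse):
    """Return the index of the token that is its own head."""
    for index, (word, head, label) in enumerate(parse):
        if head == index:
            return index
    return None

def direct_dependents(parse, head_index):
    """Return the words whose head is the token at head_index."""
    return [
        word
        for index, (word, head, label) in enumerate(parse)
        if head == head_index and index != head_index
    ]

root = find_root(parse)
print(parse[root][0])                  # wore
print(direct_dependents(parse, root))  # ['cat', 'hat']
```

The determiners The and a attach to their nouns rather than to the verb, which is why only cat and hat appear as direct dependents of wore.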
The spaCy NLP engine does the dependency parse as part of its overall analysis. The dependency parse tags explain the role of each word in the sentence. ROOT is the main word that all other words depend on, usually the verb.
We will use spaCy to create the dependency parse. The required packages are part of the Poetry environment.
We will take a few sentences from the sherlock_holmes1.txt file to illustrate the dependency parse. The steps are as follows:
%run -i "../util/file_utils.ipynb"
%run -i "../util/lang_utils.ipynb"
sentence = 'I have seldom heard him mention her under any other name.'
Define a function that prints each word, its dep_ attribute, and the explanation of that attribute. The dep_ attribute of the Token object shows the grammatical function of the word in the sentence:
def print_dependencies(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, "\t", token.dep_, "\t", spacy.explain(token.dep_))
Run the function on the sentence. The verb heard is the ROOT word of the sentence, with all other words depending on it:
print_dependencies(sentence, small_model)
The result should be as follows:
I        nsubj    nominal subject
have     aux      auxiliary
seldom   advmod   adverbial modifier
heard    ROOT     root
him      nsubj    nominal subject
mention  ccomp    clausal complement
her      dobj     direct object
under    prep     prepositional modifier
any      det      determiner
other    amod     adjectival modifier
name     pobj     object of preposition
.        punct    punctuation
To explore the structure of the dependency parse, we can use the attributes of the Token class. Using the ancestors and children attributes, we can get the tokens that this token depends on and the tokens that depend on it, respectively. The function to print the ancestors is as follows:
def print_ancestors(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, [t.text for t in token.ancestors])
print_ancestors(sentence, small_model)
The output will be as follows. In the result, we see that heard has no ancestors since it is the main word in the sentence. All other words depend on it, and in fact, contain heard in their ancestor lists.
The dependency chain can be seen by following the ancestor links for each word. For example, if we look at the word name, we see that its ancestors are under, mention, and heard. The immediate parent of name is under, the parent of under is mention, and the parent of mention is heard. A dependency chain will always lead to the root, or the main word, of the sentence:
I        ['heard']
have     ['heard']
seldom   ['heard']
heard    []
him      ['mention', 'heard']
mention  ['heard']
her      ['mention', 'heard']
under    ['mention', 'heard']
any      ['name', 'under', 'mention', 'heard']
other    ['name', 'under', 'mention', 'heard']
name     ['under', 'mention', 'heard']
.        ['heard']
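The dependency chain described above can be sketched without spaCy by following head links until the root is reached. In this minimal sketch, the heads list is transcribed by hand from the ancestor output above (the first ancestor of each word is its head), and the function name ancestors is an assumption made for this example, not a spaCy API.

```python
words = ["I", "have", "seldom", "heard", "him", "mention",
         "her", "under", "any", "other", "name", "."]
# Index of each word's head, transcribed from the ancestor lists above.
# The root ("heard", index 3) points to itself, as in spaCy.
heads = [3, 3, 3, 3, 5, 3, 5, 5, 10, 10, 7, 3]

def ancestors(index):
    """Follow head links from a token up to the root, nearest ancestor first."""
    chain = []
    while heads[index] != index:   # stop once the root is reached
        index = heads[index]
        chain.append(words[index])
    return chain

print(ancestors(words.index("name")))   # ['under', 'mention', 'heard']
print(ancestors(words.index("heard")))  # []
```

Every chain ends in heard, which matches the observation that a dependency chain always leads to the root.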
The function to print the children is as follows:
def print_children(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, [t.text for t in token.children])
print_children(sentence, small_model)
The result should be as follows. Now, the word heard has a list of words that depend on it since it is the main word in the sentence:
I        []
have     []
seldom   []
heard    ['I', 'have', 'seldom', 'mention', '.']
him      []
mention  ['him', 'her', 'under']
her      []
under    ['name']
any      []
other    []
name     ['any', 'other']
.        []
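Children are the inverse of the head relation: a word's children are exactly the tokens whose head it is. The following is a small sketch under that assumption, using a heads list transcribed by hand from the outputs in this recipe. The function name children is hypothetical and is not spaCy's implementation.

```python
words = ["I", "have", "seldom", "heard", "him", "mention",
         "her", "under", "any", "other", "name", "."]
# Index of each word's head; the root ("heard", index 3) points to itself.
heads = [3, 3, 3, 3, 5, 3, 5, 5, 10, 10, 7, 3]

def children(index):
    """Return the direct dependents of a token, in sentence order."""
    return [
        words[i]
        for i, head in enumerate(heads)
        if head == index and i != index   # skip the root's link to itself
    ]

print(children(words.index("heard")))    # ['I', 'have', 'seldom', 'mention', '.']
print(children(words.index("mention")))  # ['him', 'her', 'under']
```

Only direct dependents are listed, so name appears under under but not under heard.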
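The lefts and rights attributes split the children of a token into those that precede it and those that follow it in the sentence:
def print_lefts_and_rights(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, [t.text for t in token.lefts], [t.text for t in token.rights])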
print_lefts_and_rights(sentence, small_model)
The result should be as follows:
I        []                       []
have     []                       []
seldom   []                       []
heard    ['I', 'have', 'seldom']  ['mention', '.']
him      []                       []
mention  ['him']                  ['her', 'under']
her      []                       []
under    []                       ['name']
any      []                       []
other    []                       []
name     ['any', 'other']         []
.        []                       []
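The left and right lists in the output above are the children of each word, split by position. As a rough illustration, the sketch below reproduces that split from a hand-transcribed heads list. The function names lefts and rights mirror the attribute names but are plain Python written for this example.

```python
words = ["I", "have", "seldom", "heard", "him", "mention",
         "her", "under", "any", "other", "name", "."]
# Index of each word's head; the root ("heard", index 3) points to itself.
heads = [3, 3, 3, 3, 5, 3, 5, 5, 10, 10, 7, 3]

def lefts(index):
    """Direct dependents that appear before the token."""
    return [words[i] for i in range(index) if heads[i] == index]

def rights(index):
    """Direct dependents that appear after the token."""
    return [words[i] for i in range(index + 1, len(words)) if heads[i] == index]

heard = words.index("heard")
print(lefts(heard), rights(heard))  # ['I', 'have', 'seldom'] ['mention', '.']
```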
The subtree attribute returns the token together with all of its descendants:
def print_subtree(sentence, model):
    doc = model(sentence)
    for token in doc:
        print(token.text, [t.text for t in token.subtree])
print_subtree(sentence, small_model)
The result should be as follows. From the subtrees that each word is part of, we can see the grammatical phrases that appear in the sentence, such as the noun phrase, any other name, and the prepositional phrase, under any other name:
I        ['I']
have     ['have']
seldom   ['seldom']
heard    ['I', 'have', 'seldom', 'heard', 'him', 'mention', 'her', 'under', 'any', 'other', 'name', '.']
him      ['him']
mention  ['him', 'mention', 'her', 'under', 'any', 'other', 'name']
her      ['her']
under    ['under', 'any', 'other', 'name']
any      ['any']
other    ['other']
name     ['any', 'other', 'name']
.        ['.']
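One way to see why subtrees correspond to phrases is to collect each word together with every token that has it as an ancestor, then join the result in sentence order. This is a sketch in plain Python with a hand-transcribed heads list. The helper names is_in_subtree and subtree are assumptions made for this example, not spaCy functions.

```python
words = ["I", "have", "seldom", "heard", "him", "mention",
         "her", "under", "any", "other", "name", "."]
# Index of each word's head; the root ("heard", index 3) points to itself.
heads = [3, 3, 3, 3, 5, 3, 5, 5, 10, 10, 7, 3]

def is_in_subtree(index, top):
    """True if the token at index is top itself or has top among its ancestors."""
    while index != top:
        if heads[index] == index:   # reached the root without meeting top
            return False
        index = heads[index]
    return True

def subtree(top):
    """Return the token at top and all of its descendants, in sentence order."""
    return [words[i] for i in range(len(words)) if is_in_subtree(i, top)]

print(" ".join(subtree(words.index("name"))))   # any other name
print(" ".join(subtree(words.index("under"))))  # under any other name
```

Joining the subtree of name gives the noun phrase, and joining the subtree of under gives the prepositional phrase that contains it.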
The dependency parse can be visualized graphically using the displaCy visualizer, which is part of spaCy. Please see Chapter 7, Visualizing Text Data, for a detailed recipe on how to do the visualization.