Here are the answers to the questions posed in the above sections:
Can we remove stop words before POS tagging?
No. If we remove the stop words, we lose context, and some POS taggers (pre-trained models) use the surrounding words as features when assigning a tag to a given word.
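The effect can be seen on a small scale. The following is a minimal sketch, not the pre-trained tagger itself: it trains NLTK's `BigramTagger` (with a `UnigramTagger` backoff) on a hypothetical three-sentence corpus, and uses a hand-picked stop word set in place of a real stop word list. Both the corpus and the stop word set are assumptions made for illustration.

```python
from nltk.tag import UnigramTagger, BigramTagger

# Hypothetical toy training corpus: "run" is a noun twice and a verb once.
train_data = [
    [('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('run', 'VB')],
    [('the', 'DT'), ('run', 'NN'), ('was', 'VBD'), ('long', 'JJ')],
    [('a', 'DT'), ('run', 'NN'), ('helps', 'VBZ')],
]

# The bigram tagger looks at the previous tag; the unigram tagger is its fallback.
unigram_tagger = UnigramTagger(train_data)
bigram_tagger = BigramTagger(train_data, backoff=unigram_tagger)

# Hand-picked stop words (an assumption, not NLTK's stop word corpus).
stop_words = {'i', 'to', 'the', 'a', 'was'}

tokens = ['I', 'want', 'to', 'run']
filtered = [t for t in tokens if t.lower() not in stop_words]

with_context = dict(bigram_tagger.tag(tokens))
without_context = dict(bigram_tagger.tag(filtered))

# With "to" present, the tagger has seen the context (TO, run).
print(with_context['run'])
# Without it, the tagger falls back to the most frequent tag for "run".
print(without_context['run'])
```

In this toy setting, the preceding "to" is what lets the tagger choose the verb reading of "run"; once it is removed, only the word's overall frequency is left to go on.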
How can we get all the verbs in the sentence?
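We can get all the verbs in the sentence by using pos_tag and keeping the words whose tag is one of the Penn Treebank verb tags:
>>> import nltk
>>> from nltk import word_tokenize
>>> tagged = nltk.pos_tag(word_tokenize(s))
>>> allverbs = [word for word, pos in tagged if pos in ['VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ']]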
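The Penn Treebank verb tags (VB, VBD, VBG, VBN, VBP, VBZ) all share the prefix VB, so a prefix test is one compact way to catch every verb form. The sketch below assumes a hand-tagged list standing in for the output of `nltk.pos_tag`, so that it runs without downloading a tagger model; the helper name `verbs` is hypothetical.

```python
# Hand-tagged stand-in for the output of nltk.pos_tag (hypothetical sample).
tagged = [
    ('She', 'PRP'), ('has', 'VBZ'), ('been', 'VBN'), ('running', 'VBG'),
    ('and', 'CC'), ('wants', 'VBZ'), ('to', 'TO'), ('rest', 'VB'),
]


def verbs(tagged_tokens):
    """Return the words whose Penn Treebank tag starts with 'VB'."""
    return [word for word, pos in tagged_tokens if pos.startswith('VB')]


allverbs = verbs(tagged)
print(allverbs)  # ['has', 'been', 'running', 'wants', 'rest']
```

Modal verbs such as "can" or "will" carry the tag MD and are not matched by this test; add that tag explicitly if they should count as verbs.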
Can you modify the code of the hybrid tagger in the N-gram tagger section to work with Regex tagger? Does that improve performance?
Yes. We can modify the code of the hybrid tagger in the N-gram tagger section to work with the Regex tagger:
>>> unigram_tagger = UnigramTagger(train_data, backoff=regexp_tagger)
>>> print(unigram_tagger.evaluate(test_data))
>>> bigram_tagger = BigramTagger(train_data, backoff=unigram_tagger)
>>> print(bigram_tagger.evaluate(test_data))
>>> trigram_tagger=TrigramTagger...
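As a rough illustration of the full backoff chain, the sketch below wires a `RegexpTagger` in as the last fallback and compares it with a plain unigram tagger. The toy corpus, the regex patterns, and the hand-written `accuracy` helper are all assumptions made for this example; they are not the data or patterns used in the N-gram tagger section, and results on a real corpus will differ.

```python
from nltk.tag import RegexpTagger, UnigramTagger, BigramTagger, TrigramTagger

# Hypothetical toy data; a real experiment would use a tagged corpus.
train_data = [
    [('the', 'DT'), ('dog', 'NN'), ('barked', 'VBD')],
    [('a', 'DT'), ('cat', 'NN'), ('sleeps', 'VBZ')],
    [('the', 'DT'), ('cat', 'NN'), ('jumped', 'VBD')],
]
test_data = [
    [('the', 'DT'), ('dog', 'NN'), ('jumped', 'VBD')],
    [('a', 'DT'), ('dog', 'NN'), ('walked', 'VBD')],  # "walked" is unseen
]

# Illustrative suffix rules; the final pattern is a catch-all default.
patterns = [
    (r'.*ing$', 'VBG'),
    (r'.*ed$', 'VBD'),
    (r'.*', 'NN'),
]

# Each tagger falls back to the next one when it has no answer.
regexp_tagger = RegexpTagger(patterns)
unigram_tagger = UnigramTagger(train_data, backoff=regexp_tagger)
bigram_tagger = BigramTagger(train_data, backoff=unigram_tagger)
trigram_tagger = TrigramTagger(train_data, backoff=bigram_tagger)


def accuracy(tagger, gold):
    """Fraction of tokens in `gold` that `tagger` tags correctly."""
    correct = total = 0
    for sent in gold:
        words = [word for word, _ in sent]
        for (_, predicted), (_, expected) in zip(tagger.tag(words), sent):
            correct += predicted == expected
            total += 1
    return correct / total


plain_unigram = UnigramTagger(train_data)  # no backoff: unseen words get None
print(accuracy(plain_unigram, test_data))
print(accuracy(trigram_tagger, test_data))
```

On this toy data the regex fallback helps because it supplies a tag for a word never seen in training. Whether it helps on a real corpus depends on how well the patterns fit the unknown words there.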