We previously used NLP entity recognition to identify persons in raw text extracted from HTML. In this chapter, we move to a lower level by trying to infer relations between these entities and to detect the communities that may surround them.
Within the context of news articles, we first need to ask ourselves a fundamental question: what defines a relation between two entities? The most elegant answer would probably be to study how words relate to one another within a sentence, using the Stanford NLP libraries described in Chapter 6, Scraping Link-Based External Data. Consider the following input sentence, taken from http://www.ibtimes.co.uk/david-bowie-yoko-ono-says-starmans-death-has-left-big-empty-space-1545160:
"Yoko Ono said she and late husband John Lennon shared a close relationship with David Bowie"
We could easily extract its syntactic tree, a structure that linguists use to model how sentences are built grammatically, and in which each element is labeled with its type, such as a noun...
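To make the idea of a syntactic tree more concrete, here is a minimal sketch in plain Python. The tree is hand-written for illustration, using Penn Treebank style labels; it is a simplified assumption, not the actual output of the Stanford parser. The names `TREE`, `tagged_leaves`, and `proper_noun_runs` are hypothetical helpers introduced for this example only.

```python
# A hand-annotated, simplified constituency tree for the example sentence.
# ASSUMPTION: this parse is written by hand for illustration; a real parser
# may produce a different structure.
# Each node is a tuple: (label, child, child, ...); a leaf is (tag, word).
TREE = (
    "S",
    ("NP", ("NNP", "Yoko"), ("NNP", "Ono")),
    ("VP",
        ("VBD", "said"),
        ("SBAR",
            ("S",
                ("NP",
                    ("NP", ("PRP", "she")),
                    ("CC", "and"),
                    ("NP", ("JJ", "late"), ("NN", "husband"),
                           ("NNP", "John"), ("NNP", "Lennon")),
                ),
                ("VP",
                    ("VBD", "shared"),
                    ("NP", ("DT", "a"), ("JJ", "close"), ("NN", "relationship")),
                    ("PP",
                        ("IN", "with"),
                        ("NP", ("NNP", "David"), ("NNP", "Bowie")),
                    ),
                ),
            ),
        ),
    ),
)


def tagged_leaves(node):
    """Walk the tree depth-first and return (word, tag) pairs in sentence order."""
    label, *children = node
    # A leaf holds exactly one child, and that child is the word itself.
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    leaves = []
    for child in children:
        leaves.extend(tagged_leaves(child))
    return leaves


def proper_noun_runs(leaves):
    """Group consecutive proper nouns (tag NNP) into candidate entity names."""
    runs, current = [], []
    for word, tag in leaves:
        if tag == "NNP":
            current.append(word)
        elif current:
            runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs


if __name__ == "__main__":
    leaves = tagged_leaves(TREE)
    print(leaves)
    print(proper_noun_runs(leaves))
```

Reading the leaves back in order reconstructs the sentence with a type attached to each word, and grouping adjacent proper nouns gives a rough list of candidate names. This is only a crude stand-in for proper entity recognition, but it shows the kind of information a syntactic tree exposes.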