Having discussed how to download a list of notes and activities for a given page or user, we now shift our focus to the textual analysis of the content.
For each post published by a given user, we want to extract the most interesting keywords, which could be used to summarize the post itself.
While this is intuitively a simple exercise, there are a few subtleties to consider. On the practical side, the content of each post is not always a clean piece of text: it can include HTML tags. Before we can carry out our computation, we need to extract the clean text. While the JSON object returned by the Google+ API has a clear structure, the content itself is not necessarily a well-formed structured document. Fortunately, there's a nice Python package that comes to the rescue: Beautiful Soup, which can parse HTML and XML documents, including malformed markup. It is compatible with Python 3 and can be...
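To make the cleaning step concrete, the following is a minimal sketch of how Beautiful Soup could be used to strip markup from the content of a post. The function name `clean_text` and the sample string are illustrative assumptions rather than part of the original code, and the built-in `html.parser` backend is chosen here only to avoid extra dependencies.

```python
from bs4 import BeautifulSoup


def clean_text(html_content):
    """Return the plain text of an HTML fragment (hypothetical helper)."""
    # html.parser is Python's built-in backend; it tolerates unclosed tags
    soup = BeautifulSoup(html_content, 'html.parser')
    # The separator keeps the text of adjacent tags from running together
    text = soup.get_text(separator=' ')
    # Collapse runs of whitespace into single spaces
    return ' '.join(text.split())


# Made-up post content with an unclosed <i> tag, for illustration only
sample = '<b>Hello</b> from <a href="https://example.com">Google+</a><br>second line<i> unclosed'
print(clean_text(sample))
```

The same function could be applied to the content field of each downloaded activity before any keyword extraction takes place.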