With more and more people interacting together and communicating, exchanging information, or simply sharing a common interest in different topics, most data science use cases can be addressed using graph representations. Although very large graphs were, for a long time, only used by the Internet giants, government, and national security agencies, it is becoming more common place to work with large graphs containing millions of vertices. Hence, the main challenge of a data scientist will not necessarily be to detect communities and find influencers on graphs, but rather to do so in a fully distributed and efficient way in order to overcome the constraint of scale. This chapter progresses through building a graph example, at scale, using the persons we identified using NLP extraction described in Chapter 6, Scraping Link-Based External Data.
In this chapter, we will cover the following topics:
Use Spark to extract content from Elasticsearch, build a Graph of person...