In this chapter, you will learn about graph mining and network analysis algorithms written in R.
We will cover the following topics:
- Graph mining
- Mining frequent subgraph patterns
- Social network mining
- Social influence mining
Grouping, messaging, dating, and many other activities are the major forms of social communication, the classic social behaviors in a social network. All of these are modeled with graphs; that is, with nodes, edges, and their attributes. Graph mining was developed to mine this information, and it applies equally to other graph-structured information, such as biological information.
A graph G consists of a set of nodes (vertices) V and a set of edges E, and is written as G = (V, E). Before turning to graph mining, one distinction needs to be clarified. There are two types of graphs: directed graphs, in which each edge in the edge set E is an ordered pair of vertices, and undirected graphs, in which each edge is an unordered pair.
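As a minimal sketch of the definition G = (V, E), the following base R code stores a small graph as a vertex vector and an edge matrix, then builds its adjacency matrix in both the directed and the undirected reading. The vertex names, the edges, and the helper `adjacency_matrix` are hypothetical and chosen only for illustration.

```r
# Hypothetical toy graph: vertex names and edges are illustrative only.
V <- c("a", "b", "c", "d")
E <- matrix(c("a", "b",
              "b", "c",
              "c", "a",
              "c", "d"),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("from", "to")))

# Build an adjacency matrix from G = (V, E).
# directed = TRUE keeps each edge as an ordered pair (from -> to);
# directed = FALSE also records the reverse direction of every edge.
adjacency_matrix <- function(V, E, directed = TRUE) {
  A <- matrix(0L, nrow = length(V), ncol = length(V),
              dimnames = list(V, V))
  # A two-column character matrix indexes A by (row name, column name).
  A[E[, c("from", "to"), drop = FALSE]] <- 1L
  if (!directed) {
    A[E[, c("to", "from"), drop = FALSE]] <- 1L
  }
  A
}

A_directed   <- adjacency_matrix(V, E, directed = TRUE)
A_undirected <- adjacency_matrix(V, E, directed = FALSE)

# Out-degree and in-degree differ in a directed graph;
# an undirected graph has a single degree per vertex.
out_degree <- rowSums(A_directed)
in_degree  <- colSums(A_directed)
degree     <- rowSums(A_undirected)
```

In the directed matrix, the entry for (a, b) is set but the entry for (b, a) is not, whereas the undirected matrix is symmetric. An edge list is usually the more compact choice for sparse graphs; the dense matrix is used here only because it makes the ordered versus unordered distinction easy to see.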
Although graph data is very different from the other data types that we saw earlier in this book, graph-mining algorithms still include frequent pattern (subgraph) mining, classification, and clustering.
In the following sections, we will look at frequent subgraph pattern mining algorithms, link mining, and clustering.
Mining subgraph patterns, or graph patterns, is an important application of data mining; it is used in bioinformatics, social network analysis, and so on. Frequent subgraph patterns are patterns that occur frequently in a set of graphs or in a large graph.
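To make the notion of "occurring frequently in a set of graphs" concrete, here is a minimal sketch in base R that counts the support of the smallest possible patterns, single labeled edges, across a tiny graph database. It is only the first level of frequent subgraph mining, not one of the algorithms listed in this chapter; the data, the labels, and the helpers `edge_key`, `edge_support`, and `frequent_edges` are hypothetical.

```r
# Hypothetical graph database: each graph is a data frame of undirected
# edges, described by the labels of their two endpoints.
graphs <- list(
  g1 = data.frame(from = c("C", "C", "N"), to = c("O", "N", "O"),
                  stringsAsFactors = FALSE),
  g2 = data.frame(from = c("C", "O"), to = c("N", "C"),
                  stringsAsFactors = FALSE),
  g3 = data.frame(from = c("N", "C"), to = c("O", "C"),
                  stringsAsFactors = FALSE)
)

# Canonical key for an undirected labeled edge: order the two labels so
# that C-O and O-C map to the same pattern.
edge_key <- function(from, to) {
  paste(pmin(from, to), pmax(from, to), sep = "-")
}

# Support of a pattern = number of graphs that contain it at least once,
# so repeated occurrences inside one graph are counted only once.
edge_support <- function(graphs) {
  keys_per_graph <- lapply(graphs, function(g) unique(edge_key(g$from, g$to)))
  table(unlist(keys_per_graph))
}

# Keep only the patterns whose support reaches the threshold.
frequent_edges <- function(graphs, min_support) {
  support <- edge_support(graphs)
  support[support >= min_support]
}

support  <- edge_support(graphs)
frequent <- frequent_edges(graphs, min_support = 2)
```

With a minimum support of 2, the edges C-N, C-O, and N-O are frequent, while C-C appears in only one graph and is dropped. Full frequent subgraph miners grow such patterns edge by edge and need a canonical form to avoid counting the same subgraph twice, which is what the sorted label pair stands in for here.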
The gPLS algorithm
The GraphSig algorithm
Rightmost path extensions and their supports
The subgraph isomorphism enumeration algorithm
The canonical checking algorithm
In its most classical definition, a social network is based on human interactions. The data collected from a social network has graph-like and temporal characteristics. There are two major strategies for data mining tasks on social networks: one is linkage-based or structure-based, and the other is content-based. The data itself also comes in two kinds: static data, and dynamic or time-series data, such as the tweets on Twitter. Because of these characteristics of graph data, a wide variety of algorithms has been developed to address the challenges.
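The difference between the two strategies can be sketched on toy data in base R. The linkage-based view below uses only who is connected to whom, while the content-based view uses only the text that users produce. The user names, the `follows` edges, and the `posts` are made up for illustration.

```r
# Hypothetical toy data: users, follow edges, and post texts are made up.
follows <- data.frame(
  from = c("ann", "bob", "cat", "dan", "bob"),
  to   = c("bob", "cat", "bob", "bob", "ann"),
  stringsAsFactors = FALSE
)
posts <- data.frame(
  user = c("ann", "bob", "cat"),
  text = c("graph mining in R", "mining social graphs", "R for graph analysis"),
  stringsAsFactors = FALSE
)

# Linkage-based (structure-based) view: rank users by in-degree, that is,
# by number of followers, using only the edges of the network.
in_degree <- sort(table(follows$to), decreasing = TRUE)

# Content-based view: count word frequencies across all posts,
# ignoring the links between users entirely.
words <- unlist(strsplit(tolower(posts$text), "[^a-z]+"))
words <- words[nzchar(words)]
word_counts <- sort(table(words), decreasing = TRUE)
```

Here `in_degree` puts "bob" first because three users follow him, and `word_counts` shows which terms dominate the posts. Real analyses often combine both views, for example by weighting links with content similarity.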
Community detection and the shingling algorithm
Here are some practice questions for you to check whether you have understood the concepts:
- What is a graph?
- What graph opportunities are used?
- What is the PageRank algorithm, and what is its application in web search?
In this chapter, we looked at:
- Graph mining, whose tasks on graph data can be divided into frequent pattern mining, classification, and clustering
- Mining frequent subgraph patterns, which finds the frequent patterns in a set of graphs or in a single massive graph
- Social network analysis, which under its broad definition covers a wide range of web applications, such as Facebook, LinkedIn, Google+, StackOverflow, and so on
In the next chapter, we will focus on the major topics related to web mining and algorithms and look at some examples based on them.