Mastering Gephi Network Visualization

Essential plugins


Plugins used in Gephi do not abide by the primary and secondary workspaces model that we just covered. Instead, they logically wind up where they are designed to be used; layout plugins are placed in the layout tab, formatting plugins can be found at the perimeter of the graph window, and so on.

The basic idea with Gephi plugins, as with plugins for other software, is to add features that are not readily available in the core software. In some cases, these are functions that help users better format their graphs; in other cases, the plugins are full-fledged layout algorithms or graph generators that give users additional choices for graph creation and analysis.

There are a number of Gephi plugins that we will use later in this text, so it is best to download and install them early on so that you can follow along with the examples. The Gephi Marketplace hosts some excellent plugins that extend the core Gephi functionality, but their number is very manageable. If you wish, download and install them all; the installation process is very simple, and the space requirement is minimal.

Here are the most essential plugins that I refer to and use throughout this book, along with a brief description of their category and functionality. For more detailed information, visit the Gephi Marketplace site.

I'm going to walk you through these plugins by category, providing brief descriptions of each. We will go into greater detail as each plugin is used within subsequent chapters.

Clustering – Chinese Whispers

Gephi provides several options for partitioning and/or clustering graph data, including this useful plugin. The goal of this clustering approach is to partition your network data into individual clusters, which can then be used to color or size the graph nodes, creating a more intuitive and easily interpreted visualization. While it is possible to color nodes in Gephi manually or through partitioning, Chinese Whispers clustering provides another option, one based on an analysis of the network's connection patterns.
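To make the idea concrete, here is a minimal pure-Python sketch of the Chinese Whispers procedure as it is generally described. Every node starts in its own class, and on each pass it adopts the class with the highest total edge weight among its neighbors. This is not the plugin's code. The function name `chinese_whispers` is hypothetical. The rule of keeping a node's current class on ties is an assumption added here so that the sketch settles quickly.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected graph with Chinese Whispers-style label propagation.

    edges: iterable of (u, v) or (u, v, weight) tuples.
    Returns a dict mapping each node to a cluster label (one of the node ids).
    Hypothetical helper for illustration; not the Gephi plugin's API.
    """
    rng = random.Random(seed)
    neighbors = defaultdict(dict)
    for edge in edges:
        u, v = edge[0], edge[1]
        weight = edge[2] if len(edge) > 2 else 1.0
        neighbors[u][v] = neighbors[u].get(v, 0.0) + weight
        neighbors[v][u] = neighbors[v].get(u, 0.0) + weight

    labels = {node: node for node in neighbors}  # every node starts in its own class
    nodes = list(neighbors)
    for _ in range(iterations):
        changed = False
        rng.shuffle(nodes)  # visit nodes in a random order on each pass
        for node in nodes:
            # Total edge weight pointing at each class among this node's neighbors.
            scores = defaultdict(float)
            for nbr, weight in neighbors[node].items():
                scores[labels[nbr]] += weight
            best = max(scores.values())
            candidates = [label for label, score in scores.items() if score == best]
            # Assumption of this sketch: keep the current class when it is tied for best.
            if labels[node] in candidates:
                continue
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:  # no node switched class, so the clustering is stable
            break
    return labels
```

For two separate triangles, for example, the sketch assigns one shared label per triangle, and the two labels differ.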

Data laboratory

The data laboratory is where all the data manipulation takes place. While the base installation provides several helpful functions, others can be added using plugins.

Data laboratory helper

For users frustrated with the limited ability to edit data in Gephi, this plugin provides the ability to recast the column type (from string to numeric, for example) as well as to create new columns based on existing values.
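The same two operations can be sketched in plain Python: recasting a text column to numbers, and deriving a new column from existing values. This is useful, for example, when cleaning a node table before importing it into Gephi. It is not the plugin's implementation. The function names `recast_column` and `add_derived_column`, and the sample `weight` column, are hypothetical.

```python
def recast_column(rows, column, cast=float):
    """Return copies of `rows` with `column` converted by `cast`.

    Empty or unparseable cells become None instead of raising an error.
    Hypothetical helper for illustration; not the plugin's API.
    """
    converted = []
    for row in rows:
        new_row = dict(row)  # copy so the original table is left untouched
        value = new_row.get(column)
        if value is None or value == "":
            new_row[column] = None
        else:
            try:
                new_row[column] = cast(value)
            except (TypeError, ValueError):
                new_row[column] = None
        converted.append(new_row)
    return converted


def add_derived_column(rows, new_column, compute):
    """Return copies of `rows` with `new_column` set to compute(row)."""
    return [{**row, new_column: compute(row)} for row in rows]
```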

Exports

The ability to export data and graphs from Gephi to other formats is highly useful, as it makes Gephi a very flexible tool for further interaction with or deployment of network data. We'll provide a brief overview of a few plugins that can be used to display network graphs beyond Gephi.

Sigma.js Exporter

One of the best ways to make your network graph even more powerful is to deploy it on the Web and make it interactive using Sigma.js, complete with zooming, grouping, and filtering capabilities. This easy-to-use tool provides a template approach to publishing interactive graphs, making it especially easy to replicate a series of graphs in a consistent way. Once the graph has been exported to Sigma, further customization is possible using CSS, JavaScript, and HTML. We will learn more about Sigma.js and explore actual examples in Chapter 9, Taking Your Graph Beyond Gephi.

Seadragon Web Export

Another option for deploying a graph to the Web is Seadragon, which lets viewers zoom in and out of your graph; this is especially useful for large or very dense networks. While this option does not provide the full range of capabilities found in Sigma.js, it is a quick way to make your graph accessible on the Web.

Graph Streaming

One of the most powerful aspects of network analysis is its ability to show how a network evolves over time, rather than presenting a static graph. There are a couple of ways to approach this, with the Graph Streaming plugin providing perhaps the most powerful one. All that is required to use this tool is a JSON dataset with time elements.
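As a rough illustration of preparing such data, the sketch below converts timestamped edges into one JSON event per line, in time order. The `an` (add node) and `ae` (add edge) event keys follow the Graph Streaming plugin's JSON event format as we understand it, so verify them against the plugin's documentation. The `time` attribute name and the `stream_events` function are assumptions of this example.

```python
import json


def stream_events(timed_edges):
    """Convert (timestamp, source, target) tuples into JSON event lines, ordered by time.

    "an" adds a node the first time it appears and "ae" adds an edge.
    Assumption: the "time" attribute name is a hypothetical choice for this sketch.
    """
    seen = set()
    lines = []
    for timestamp, source, target in sorted(timed_edges):
        for node in (source, target):
            if node not in seen:
                seen.add(node)
                lines.append(json.dumps({"an": {node: {"label": node, "time": timestamp}}}))
        edge_id = f"{source}-{target}@{timestamp}"  # hypothetical id scheme
        lines.append(json.dumps({"ae": {edge_id: {
            "source": source, "target": target, "directed": False, "time": timestamp}}}))
    return lines
```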

ExportToEarth

Users with geography-based datasets are able to use Gephi to create their initial graph before exporting the network in the .kmz format used by Google Earth and other GIS programs. All that is required to leverage this tool is some geocoded information in your data file, such as latitude and longitude data.
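A .kmz file is a zip archive that wraps a KML document. The simplified sketch below writes geocoded nodes, and the edges between them, as KML placemarks and then packages them as .kmz. It does not reproduce ExportToEarth's actual output, and the function names are hypothetical. Note that KML lists coordinates as longitude first, then latitude.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def network_to_kml(nodes, edges):
    """nodes: {node_id: (latitude, longitude)}; edges: [(source, target)]. Returns KML text.

    Hypothetical helper for illustration; not ExportToEarth's own output format.
    """
    parts = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>']
    for node_id, (lat, lon) in nodes.items():
        # KML writes coordinates as longitude,latitude (not latitude first).
        parts.append(f"<Placemark><name>{escape(str(node_id))}</name>"
                     f"<Point><coordinates>{lon},{lat}</coordinates></Point></Placemark>")
    for source, target in edges:
        (lat1, lon1), (lat2, lon2) = nodes[source], nodes[target]
        name = escape(f"{source} - {target}")
        parts.append(f"<Placemark><name>{name}</name><LineString>"
                     f"<coordinates>{lon1},{lat1} {lon2},{lat2}</coordinates>"
                     f"</LineString></Placemark>")
    parts.append("</Document></kml>")
    return "\n".join(parts)


def kml_to_kmz(kml_text):
    """Package KML text as KMZ bytes (a zip archive whose main file is doc.kml)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("doc.kml", kml_text)
    return buffer.getvalue()
```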

Generator – the Complex Generators plugin

A wide range of fundamental network graph types can be generated using this tool, including Erdős-Rényi (random graphs), Barabási-Albert (preferential attachment), and Watts-Strogatz (small world) graphs. These generators provide a quick visual understanding of several classic theories of network growth, and can ultimately help us understand network behavior when viewing existing graphs.

Here are three simple examples created in Gephi using the Random Graph, Barabási-Albert scale-free model, and Watts-Strogatz small world model Alpha generators, each with a 20-node specification.

First, let's take a look at the random graph example, which is as follows:

Random graph model

Next, a scale-free model is as follows:

Barabási-Albert scale-free model

Lastly, a small world graph is as shown here:

Watts-Strogatz small world model Alpha

Note the dramatic differences in network structure among the three models, each based on different assumptions about network growth. As mentioned earlier, the generators are very useful for understanding and visualizing network structures under different assumptions, which can then provide insight when we create graphs from real-world datasets.
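If you want to experiment outside Gephi, the three growth models can be sketched in standard-library Python. These are simplified textbook versions, not the Complex Generators plugin's code. The function and parameter names (`p`, `m`, `k`, `beta`) are conventional choices made for this example.

```python
import random


def erdos_renyi(n, p, seed=None):
    """Random graph G(n, p): each possible edge appears independently with probability p."""
    rng = random.Random(seed)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}


def barabasi_albert(n, m, seed=None):
    """Preferential attachment: each new node links to m existing nodes,
    chosen with probability proportional to their degree (requires 1 <= m < n)."""
    rng = random.Random(seed)
    edges, endpoints = set(), []  # endpoints lists each node once per edge end
    for i in range(m + 1):  # start from a complete core of m + 1 nodes
        for j in range(i + 1, m + 1):
            edges.add((i, j))
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # degree-proportional pick
        for t in targets:
            edges.add((t, new))
            endpoints += [t, new]
    return edges


def watts_strogatz(n, k, beta, seed=None):
    """Small world: ring where each node links to its k nearest neighbors (k even, k < n),
    then each edge is rewired to a random new endpoint with probability beta."""
    rng = random.Random(seed)
    edges = set()
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            edges.add((min(i, j), max(i, j)))
    for i, j in sorted(edges):  # iterate over a snapshot while rewiring
        if rng.random() < beta:
            # Keep endpoint i; choose a new partner that avoids self-loops and duplicates.
            options = [t for t in range(n) if t != i and (min(i, t), max(i, t)) not in edges]
            if options:
                t = rng.choice(options)
                edges.remove((i, j))
                edges.add((min(i, t), max(i, t)))
    return edges
```

With 20 nodes, `barabasi_albert(20, 2)` always yields 37 edges: 3 in the starting triangle plus 2 for each of the 17 added nodes. `watts_strogatz(20, 4, beta)` always yields 40, because rewiring moves edges rather than adding them.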

Layout

One of the most powerful ways to extend Gephi is through the use of a wide array of layout algorithms available through plugins. These layouts, when paired with the multiple layout options already available in base Gephi, will provide you with a wealth of choices to map your networks. Some of these choices will be useful for very specific use cases, while others are much more generally used for a variety of networks. Let's take a quick look at a handful of highly useful layout algorithms, and the situations where we might find them most appropriate.

The Multipartite layout

A multipartite graph is a network whose nodes (vertices) are divided into several groups, or partitions, such that every edge connects members of different partitions and no edge connects two members of the same partition. One can think of this in terms of members of a category (a sports team, for instance) who are connected to the top level of the category but not to one another. If the team and its members are the only two partitions, we have a bipartite graph, but many cases involve more than two partitions. This becomes especially useful with temporal networks, where players are associated with Team A initially but are later traded to Team B or Team C.
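The defining rule is easy to check in code. This minimal sketch reports whether any edge joins two nodes of the same partition, and how many partitions are in use. The `partition_summary` name and the team/player data are hypothetical.

```python
def partition_summary(partition_of, edges):
    """Check the multipartite rule: no edge may join two nodes of the same partition.

    partition_of: {node: partition label}; edges: [(u, v)].
    Hypothetical helper for illustration.
    """
    violations = [(u, v) for u, v in edges if partition_of[u] == partition_of[v]]
    count = len(set(partition_of.values()))
    return {
        "multipartite": not violations,
        "partitions": count,
        "kind": "bipartite" if count == 2 else f"{count}-partite",
        "violations": violations,
    }
```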

The primary purpose of the multipartite layout in Gephi is to minimize edge crossings, thus making it easier to view and interpret the graph.
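Consider two adjacent partitions drawn as parallel columns. The sketch below counts their edge crossings and applies the barycenter heuristic, a standard technique that orders each node by the average position of its neighbors in the other column. Whether Gephi's Multipartite layout uses this particular heuristic is not stated here, so treat it as an illustrative assumption. The function names are hypothetical.

```python
def count_crossings(left_order, right_order, edges):
    """Count crossings between two partitions drawn as parallel columns.

    edges are (left_node, right_node) pairs; two edges cross when their
    endpoints appear in opposite orders in the two columns.
    """
    left = {node: i for i, node in enumerate(left_order)}
    right = {node: i for i, node in enumerate(right_order)}
    segments = [(left[u], right[v]) for u, v in edges]
    crossings = 0
    for a in range(len(segments)):
        for b in range(a + 1, len(segments)):
            (a_left, a_right), (b_left, b_right) = segments[a], segments[b]
            if (a_left - b_left) * (a_right - b_right) < 0:
                crossings += 1
    return crossings


def barycenter_order(left_order, right_nodes, edges):
    """Reorder the right column by the mean left-column position of each node's neighbors.

    Assumption: barycenter ordering is used here for illustration only.
    """
    left = {node: i for i, node in enumerate(left_order)}
    neighbor_positions = {node: [] for node in right_nodes}
    for u, v in edges:
        neighbor_positions[v].append(left[u])

    def barycenter(node):
        positions = neighbor_positions[node]
        return sum(positions) / len(positions) if positions else 0.0

    return sorted(right_nodes, key=barycenter)
```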

The Hiveplot layout

A Hiveplot is a graph layout that attempts to overcome the so-called hairball effect produced by large, highly connected networks. It does this by placing nodes along multiple radial axes according to network structure. This approach is particularly appealing when there are three or more definable levels, as it positions nodes to avoid some of the unintended or misleading effects that can appear when using other algorithms. We'll examine this approach further in Chapter 4, Network Patterns.
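A common way to build a hive plot is to assign each node to an axis by a structural rule, then position it along that axis by a numeric measure. The sketch below assumes one such rule for directed edges: sources, sinks, and nodes with both incoming and outgoing links each get their own axis. It uses total degree for the radius. The axis rule, the radius scaling, and the `hive_positions` name are choices made for this example, not the plugin's settings.

```python
import math
from collections import defaultdict


def hive_positions(edges, inner=1.0, outer=10.0):
    """Place nodes of a directed graph on three axes; returns {node: (axis, x, y)}.

    Assumed axis rule: 0 = sources (only outgoing links), 1 = sinks (only incoming),
    2 = nodes with both. Radius grows with total degree between `inner` and `outer`.
    """
    out_degree, in_degree = defaultdict(int), defaultdict(int)
    for u, v in edges:
        out_degree[u] += 1
        in_degree[v] += 1
    nodes = set(out_degree) | set(in_degree)
    if not nodes:
        return {}
    max_degree = max(out_degree[n] + in_degree[n] for n in nodes)
    positions = {}
    for node in nodes:
        if in_degree[node] == 0:
            axis = 0
        elif out_degree[node] == 0:
            axis = 1
        else:
            axis = 2
        angle = 2 * math.pi * axis / 3  # three axes, 120 degrees apart
        degree = out_degree[node] + in_degree[node]
        radius = inner + (outer - inner) * degree / max_degree
        positions[node] = (axis, radius * math.cos(angle), radius * math.sin(angle))
    return positions
```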

The Concentric layout

Networks are often most easily viewed using familiar visual forms such as circles. Concentric layouts take advantage of this, particularly when working with small to medium datasets. Nodes are arranged in a series of concentric circles based on their distance from a central node: nodes with direct connections to it form the first circle, nodes two steps away form the second, and so on. Arranged this way, small network structures are easier to navigate, and viewers can see how closely nodes relate to the central node and to each other.
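The ring assignment described above amounts to a breadth-first search from the central node. This sketch places nodes at hop distance d on a circle of radius d, spaced evenly, and simply leaves out nodes that are not connected to the center. The `concentric_layout` name and its `ring_gap` spacing parameter are hypothetical.

```python
import math
from collections import defaultdict, deque


def concentric_layout(edges, center, ring_gap=1.0):
    """Return ({node: hop distance}, {node: (x, y)}) with rings around `center`.

    Hypothetical helper for illustration; unreachable nodes are omitted.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Breadth-first search gives each node's hop distance from the center.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    rings = defaultdict(list)
    for node, hops in distance.items():
        rings[hops].append(node)
    positions = {}
    for hops, members in rings.items():
        members.sort(key=str)  # stable, readable order around each ring
        radius = hops * ring_gap
        for index, node in enumerate(members):
            angle = 2 * math.pi * index / len(members)
            positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return distance, positions
```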

The OpenOrd layout

The OpenOrd plugin generates network graphs very rapidly and is best suited to very large networks, since it sacrifices some precision in the interest of speed. The approach is based on the classic, but much slower, Fruchterman-Reingold algorithm provided with Gephi. When you are dealing with hundreds of thousands of rows of data, this algorithm enables a rapid look at the network structure.
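OpenOrd's multi-stage internals are beyond a short example, but the force model it builds on can be sketched. Below is a compact, simplified version of the classic Fruchterman-Reingold loop. All node pairs repel, linked nodes attract, and a cooling "temperature" limits how far nodes move per iteration. The all-pairs repulsion step is quadratic in the number of nodes, which is why faster approximations matter for very large networks. The starting temperature, the cooling factor, and the function name are assumptions of this sketch.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=50, width=1.0, seed=0):
    """Simplified Fruchterman-Reingold layout; returns {node: (x, y)}.

    Hypothetical helper for illustration; not OpenOrd's algorithm.
    """
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-width / 2, width / 2), rng.uniform(-width / 2, width / 2)]
           for n in nodes}
    k = width / math.sqrt(len(nodes))  # ideal distance between linked nodes
    temperature = width / 10  # assumption: starting step limit
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion k^2 / d between every pair of nodes: O(n^2) per iteration.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                force = k * k / d
                disp[u][0] += dx / d * force
                disp[u][1] += dy / d * force
                disp[v][0] -= dx / d * force
                disp[v][1] -= dy / d * force
        # Attraction d^2 / k along each edge pulls linked nodes together.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            force = d * d / k
            disp[u][0] -= dx / d * force
            disp[u][1] -= dy / d * force
            disp[v][0] += dx / d * force
            disp[v][1] += dy / d * force
        # Move each node along its net displacement, capped by the temperature.
        for n in nodes:
            length = math.hypot(*disp[n])
            if length > 0:
                step = min(length, temperature)
                pos[n][0] += disp[n][0] / length * step
                pos[n][1] += disp[n][1] / length * step
        temperature *= 0.95  # assumption: geometric cooling
    return {n: (p[0], p[1]) for n, p in pos.items()}
```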

The Circular layout

The circular layout plugin actually provides three distinct layout types: the circular layout, the dual circle layout, and the radial axis layout. A variety of options allow users to order nodes by degree, ID, attribute sort, or randomly. This can be especially useful for arranging a network based on predefined characteristics, as opposed to calculated relationships within the network.
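Ordering nodes around a circle is simple to express: sort the nodes by the chosen criterion, then space them evenly by angle. In this sketch, `order` may be `"degree"`, `"id"`, `"random"`, or a function that returns an attribute to sort by. The `circular_layout` name and its `order` argument are hypothetical, not the plugin's options.

```python
import math
import random
from collections import Counter


def circular_layout(nodes, edges, order="degree", radius=1.0, seed=None):
    """Place nodes evenly on a circle; returns (ordered_nodes, {node: (x, y)}).

    Hypothetical helper for illustration; not the plugin's API.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if order == "degree":
        ordered = sorted(nodes, key=lambda n: (-degree[n], str(n)))  # highest degree first
    elif order == "id":
        ordered = sorted(nodes, key=str)
    elif order == "random":
        ordered = list(nodes)
        random.Random(seed).shuffle(ordered)
    elif callable(order):
        ordered = sorted(nodes, key=order)  # attribute sort via a caller-supplied key
    else:
        raise ValueError(f"unknown order: {order!r}")
    step = 2 * math.pi / len(ordered)
    positions = {n: (radius * math.cos(i * step), radius * math.sin(i * step))
                 for i, n in enumerate(ordered)}
    return ordered, positions
```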

The Layered layout

The layered layout is a useful layout for cases where we wish to visualize a small world phenomenon using numerical values to assign the layer, or orbit, that each node resides in. Stronger relationships to a key node will occupy inner orbits, with more distant connections occupying the perimeter of the graph. This approach is similar to the one used by the concentric layout.

The ARF layout

The Attractive and Repulsive Forces (ARF) layout affords considerable flexibility through its attraction and repulsion settings. ARF output tends toward a more circular appearance than many other spring-based algorithms, such as Fruchterman-Reingold and Force Atlas.

Additional plugins

A number of additional plugins are available for Gephi, with new ones added on a regular basis. Here are a couple of tools that provide even more utility as you create and analyze your network graphs.

Link Communities – metrics

Link Communities is a clustering approach for undirected, unweighted networks that assesses the similarity of links (edges) and groups similar links into communities; each node then belongs to the communities of its links. Because a node can therefore be placed into multiple communities, this approach differs from most other clustering approaches. Once the metric has been computed, users can select a layout algorithm of their choice to display the network.
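The link-clustering formulation by Ahn, Bagrow, and Lehmann scores two links that share a node. The score is the Jaccard overlap of the inclusive neighborhoods (neighbors plus the node itself) of the links' other endpoints. The sketch below computes that score and, as a simplification, merges links whose similarity meets a fixed threshold. The published method instead builds a full hierarchy and chooses where to cut it by partition density, and the plugin may differ again. The function names and the 0.5 threshold are assumptions of this example.

```python
from collections import defaultdict


def edge_similarities(edges):
    """Similarity for each pair of edges sharing exactly one node.

    Returns {(i, j): score} keyed by positions in `edges`. Assumes a simple
    undirected graph (no self-loops or duplicate edges).
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    inclusive = {n: nbrs | {n} for n, nbrs in adjacency.items()}  # neighbors plus the node itself
    scores = {}
    for a in range(len(edges)):
        for b in range(a + 1, len(edges)):
            shared = set(edges[a]) & set(edges[b])
            if len(shared) != 1:
                continue
            keystone = shared.pop()
            i = edges[a][0] if edges[a][1] == keystone else edges[a][1]
            j = edges[b][0] if edges[b][1] == keystone else edges[b][1]
            scores[(a, b)] = len(inclusive[i] & inclusive[j]) / len(inclusive[i] | inclusive[j])
    return scores


def link_communities(edges, threshold=0.5):
    """Group edges with similarity >= threshold; return {node: set of community ids}.

    Assumption: a fixed threshold replaces the published partition-density cut.
    """
    edges = [tuple(e) for e in edges]
    parent = list(range(len(edges)))  # union-find forest over edge indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in edge_similarities(edges).items():
        if score >= threshold:
            parent[find(a)] = find(b)
    membership = defaultdict(set)
    for index, (u, v) in enumerate(edges):
        community = find(index)
        membership[u].add(community)  # a node joins every community of its edges
        membership[v].add(community)
    return dict(membership)
```

For two triangles that share a single node, the shared node ends up in both link communities, while every other node belongs to exactly one.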

Give color to nodes – tools

One of the most effective ways to convey network information is through color. Gephi lets you color individual nodes within the graph window, but this useful plugin lets you specify colors within your dataset and use them to color the entire graph, rather than making ad hoc changes with the Gephi toolbars.