Graph Machine Learning

By: Claudio Stamile, Aldo Marzullo, Enrico Deusebio


Overview of this book

Graph Machine Learning will introduce you to a set of tools for processing network data and leveraging the relationships between entities for predictive modeling and analytics tasks. The first chapters will introduce you to graph theory and graph machine learning, as well as the scope of their potential use. You'll then learn all you need to know about the main machine learning models for graph representation learning: their purpose, how they work, and how they can be implemented in a wide range of supervised and unsupervised learning applications. You'll build a complete machine learning pipeline, including data processing, model training, and prediction, in order to exploit the full potential of graph data. After covering the basics, you'll be taken through real-world scenarios such as extracting data from social networks, text analytics and natural language processing (NLP) using graphs, and financial transaction systems on graphs. You'll also learn how to build and scale out data-driven applications for graph analytics to store, query, and process network information, and explore the latest trends on graphs. By the end of this machine learning book, you will have learned the essential concepts of graph theory and the algorithms and techniques used to build successful machine learning applications.



Dealing with large graphs

When approaching a use case or an analysis, it is very important to understand how large the data we focus on is, or will be in the future, as the size of the dataset may strongly affect both the technologies we use and the analyses we can perform. As already mentioned, some of the approaches developed on small datasets scale poorly to real-world applications and larger datasets, making them useless in practice.

When dealing with (possibly) large graphs, it is crucial to understand the potential bottlenecks and limitations of the tools, technologies, and/or algorithms we use, assessing which parts of our application/analysis may not scale as the number of nodes or edges increases. Even more importantly, it is crucial to structure a data-driven application, however simple or early its proof of concept (POC) stage, so that it can be scaled out in the future as data and users grow, without rewriting the whole application.
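One practical way to spot such bottlenecks is to time each processing step on graphs of growing size while keeping the average degree fixed. The following is a minimal, standard-library-only sketch, assuming a toy random graph stored as adjacency lists; the helper names random_adjacency, bfs_reachable, and scaling_probe are hypothetical, and the breadth-first search merely stands in for whichever step of your own pipeline you want to check.

```python
import random
import time
from collections import deque


def random_adjacency(num_nodes, num_edges, seed=0):
    """Toy undirected graph as adjacency lists (duplicate edges and self-loops allowed for simplicity)."""
    rng = random.Random(seed)
    adj = [[] for _ in range(num_nodes)]
    for _ in range(num_edges):
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        adj[u].append(v)
        adj[v].append(u)
    return adj


def bfs_reachable(adj, source=0):
    """Count nodes reachable from source with a breadth-first search, O(nodes + edges)."""
    seen = [False] * len(adj)
    seen[source] = True
    queue = deque([source])
    count = 1
    while queue:
        for v in adj[queue.popleft()]:
            if not seen[v]:
                seen[v] = True
                count += 1
                queue.append(v)
    return count


def scaling_probe(step, sizes, avg_degree=10, seed=0):
    """Time step(adj) on random graphs of growing size; returns (nodes, edges, seconds) tuples."""
    results = []
    for n in sizes:
        m = n * avg_degree // 2  # keep the average degree fixed while n grows
        adj = random_adjacency(n, m, seed)
        start = time.perf_counter()  # only the step itself is timed, not graph generation
        step(adj)
        results.append((n, m, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    for n, m, seconds in scaling_probe(bfs_reachable, [1_000, 10_000, 100_000]):
        print(f"n={n:>8,} m={m:>9,} time={seconds:.4f}s per-edge={seconds / m * 1e9:.0f}ns")
```

If the time per edge stays roughly constant as n grows, the step scales linearly; a time per edge that keeps growing points to a step likely to become a bottleneck on larger graphs.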

Creating a data-driven application that relies on graph representation/modeling is a challenging task, requiring a design and implementation that are a lot more complicated than simply importing networkx. In particular, it is often useful to decouple the component that processes the graph (the graph processing engine) from the one that allows querying and traversing the graph (the graph storage layer). We will further discuss these concepts in Chapter 9, Building a Data-Driven Graph-Powered Application. Nevertheless, given the book's focus on ML and analytical techniques, it makes sense to concentrate more on graph processing engines than on graph storage layers. We therefore find it useful to introduce you, already at this stage, to some of the technologies used by graph processing engines to deal with large graphs, which is crucial when scaling out an application.
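The decoupling between a processing engine and a storage layer can be sketched as two components that talk only through a narrow interface. In this minimal example, GraphStorage, InMemoryStorage, and degree_counts are hypothetical names invented for illustration, not part of networkx or any other library: the storage side only answers queries (edges, neighbors), while the processing side only consumes what that interface exposes.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Protocol, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


class GraphStorage(Protocol):
    """Hypothetical storage-layer interface: answers queries and traversals."""

    def edges(self) -> Iterator[Edge]: ...

    def neighbors(self, node: Node) -> Iterable[Node]: ...


class InMemoryStorage:
    """Toy storage backend; a graph database client could implement the same two methods."""

    def __init__(self, edges: Iterable[Edge]) -> None:
        self._edges: List[Edge] = list(edges)
        self._adj: Dict[Node, List[Node]] = {}
        for u, v in self._edges:  # undirected: record each endpoint in the other's list
            self._adj.setdefault(u, []).append(v)
            self._adj.setdefault(v, []).append(u)

    def edges(self) -> Iterator[Edge]:
        return iter(self._edges)

    def neighbors(self, node: Node) -> Iterable[Node]:
        return self._adj.get(node, [])


def degree_counts(storage: GraphStorage) -> Dict[Node, int]:
    """Processing-engine step: streams edges from any backend satisfying GraphStorage."""
    degrees: Dict[Node, int] = {}
    for u, v in storage.edges():
        degrees[u] = degrees.get(u, 0) + 1
        degrees[v] = degrees.get(v, 0) + 1
    return degrees


if __name__ == "__main__":
    store = InMemoryStorage([("a", "b"), ("b", "c")])
    print(degree_counts(store))        # {'a': 1, 'b': 2, 'c': 1}
    print(list(store.neighbors("b")))  # ['a', 'c']
```

Replacing InMemoryStorage with a class backed by a graph database would leave degree_counts untouched, which is the benefit the separation is meant to provide.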

In this respect, it is important to classify graph processing engines into two categories (which affect the tools/libraries/algorithms to be used), depending on whether the graph fits in a single shared-memory machine or requires a distributed architecture to be processed and analyzed.

Note that there is no absolute definition of large and small graphs; the distinction also depends on the chosen architecture. Nowadays, thanks to the vertical scaling of infrastructures, most cloud providers offer servers with more than 1 terabyte (TB) of random-access memory (RAM) (usually called fat nodes) and hundreds of central processing unit (CPU) cores for multithreading, although these infrastructures might not be economically viable. Even without scaling up to such extreme architectures, graphs with millions of nodes and tens of millions of edges can easily be handled on a single server with ~100 gigabytes (GB) of RAM and ~50 CPUs.
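A back-of-the-envelope memory estimate helps decide which of the two categories a graph falls into. This sketch assumes a compressed sparse row (CSR) layout with 64-bit indices and no node or edge attributes, so it is a lower bound; the overhead_factor of 10 is a hypothetical safety margin for attributes, algorithm working memory, and library overhead, not a measured figure. Pure-Python structures such as networkx's dict-of-dicts adjacency typically need considerably more memory per edge than this.

```python
def csr_memory_bytes(num_nodes, num_edges, index_bytes=8, directed=False):
    """Bytes for a bare CSR adjacency (row offsets + neighbor ids), with no weights or attributes."""
    # An undirected edge is stored twice, once in each endpoint's neighbor list.
    stored_entries = num_edges if directed else 2 * num_edges
    offsets = (num_nodes + 1) * index_bytes    # one offset per node, plus a final sentinel
    neighbors = stored_entries * index_bytes   # one neighbor id per stored entry
    return offsets + neighbors


def fits_single_machine(num_nodes, num_edges, ram_gb, overhead_factor=10.0):
    """Crude shared-memory check; overhead_factor is a hypothetical allowance, ram_gb is treated as GiB."""
    needed = csr_memory_bytes(num_nodes, num_edges) * overhead_factor
    return needed <= ram_gb * 1024**3


if __name__ == "__main__":
    n, m = 10_000_000, 50_000_000  # "millions of nodes and tens of millions of edges"
    print(f"bare CSR: {csr_memory_bytes(n, m) / 1024**3:.2f} GiB")
    print("fits on a ~100 GB server:", fits_single_machine(n, m, ram_gb=100))
```

For 10 million nodes and 50 million undirected edges, the bare CSR structure takes 880,000,008 bytes (about 0.82 GiB), so even with a tenfold allowance such a graph fits comfortably on a ~100 GB server, whereas a graph with a billion nodes and ten billion edges would not.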

Although networkx is a very popular, user-friendly, and intuitive library, it may not be the best choice when scaling out to such reasonably large graphs. Being written in pure Python, an interpreted language, networkx can be substantially outperformed by graph engines written fully or partly in more performant programming languages (such as C++ and Julia) that also make use of multithreading, such as the following:

SNAP (http://snap.stanford.edu/), which we have already seen in the previous section, is a graph engine developed at Stanford and is written in C++ with available bindings in Python.
igraph (https://igraph.org/) is a C library with bindings in Python, R, and Mathematica.
graph-tool (https://graph-tool.skewed.de/), despite being a Python module, has its core algorithms and data structures written in C++ and uses OpenMP parallelization to scale on multi-core architectures.
NetworKit (https://networkit.github.io/) is also written in C++ and uses OpenMP to parallelize its core functionalities, which are exposed through a Python module.
LightGraphs (https://juliagraphs.org/LightGraphs.jl/latest/) is a library written in Julia that aims to mirror networkx functionalities with better performance and robustness.

All the preceding libraries are valid alternatives to networkx when performance becomes an issue. Improvements can be very substantial, with speed-ups ranging from 30 to 300 times, and the best performance generally achieved by LightGraphs.
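Since the actual gain depends on the algorithm, the graph, and the hardware, it is worth timing the same computation with more than one engine on your own workload. The sketch below assumes networkx is installed and treats python-igraph as optional; it counts the connected components of one random graph with both libraries, timing only the computation and not graph construction. The graph size and the helper names timed and compare_components are illustrative choices, not part of either library.

```python
import time

import networkx as nx

try:
    import igraph as ig  # optional dependency: python-igraph
except ImportError:  # without it, only networkx is measured
    ig = None


def timed(fn, graph):
    """Return (result, elapsed seconds) for fn(graph)."""
    start = time.perf_counter()
    result = fn(graph)
    return result, time.perf_counter() - start


def compare_components(num_nodes=100_000, num_edges=300_000, seed=42):
    """Count connected components of the same random graph with networkx and, if available, igraph."""
    G = nx.gnm_random_graph(num_nodes, num_edges, seed=seed)  # nodes are labeled 0..num_nodes-1
    report = {"networkx": timed(nx.number_connected_components, G)}
    if ig is not None:
        # igraph expects integer vertex ids 0..n-1, which gnm_random_graph already uses
        H = ig.Graph(n=num_nodes, edges=list(G.edges()))
        report["igraph"] = timed(lambda g: len(g.components()), H)
    return report


if __name__ == "__main__":
    for engine, (n_components, seconds) in compare_components().items():
        print(f"{engine:>8}: {n_components} components in {seconds:.4f} s")
```

Both engines should report the same number of components; only the timings are expected to differ.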

In the forthcoming chapters, we will mostly focus on networkx in order to provide a consistent presentation and introduce you to the basic concepts of network analysis. Still, we want you to be aware that other options are available, as this becomes extremely relevant when pushing the limits of performance.
