Graph Machine Learning

By: Claudio Stamile, Aldo Marzullo, Enrico Deusebio

Overview of this book

Graph Machine Learning will introduce you to a set of tools for processing network data and for leveraging the relationships between entities in predictive modeling and analytics tasks. The first chapters introduce graph theory and graph machine learning, as well as the scope of their potential use. You’ll then learn all you need to know about the main machine learning models for graph representation learning: their purpose, how they work, and how they can be implemented in a wide range of supervised and unsupervised learning applications. You’ll build a complete machine learning pipeline, including data processing, model training, and prediction, in order to exploit the full potential of graph data. After covering the basics, you’ll be taken through real-world scenarios such as extracting data from social networks, performing text analytics and natural language processing (NLP) with graphs, and analyzing financial transaction systems modeled as graphs. You’ll also learn how to build and scale out data-driven applications for graph analytics that store, query, and process network information, and explore the latest trends in graph research. By the end of this machine learning book, you will have learned the essential concepts of graph theory and the main algorithms and techniques used to build successful machine learning applications.
Table of Contents (15 chapters)
Section 1 – Introduction to Graph Machine Learning
Section 2 – Machine Learning on Graphs
Section 3 – Advanced Applications of Graph Machine Learning

Overview of the dataset

The dataset used in this chapter is the Credit Card Transactions Fraud Detection Dataset available on Kaggle at the following URL: https://www.kaggle.com/kartik2112/fraud-detection?select=fraudTrain.csv.

The dataset is made up of simulated credit card transactions, both legitimate and fraudulent, covering the period January 1, 2019 – December 31, 2020. It includes the credit cards of 1,000 customers transacting with a pool of 800 merchants. The dataset was generated with the Sparkov Data Generation tool; more information about the generation algorithm is available at the following URL: https://github.com/namebrandon/Sparkov_Data_Generation.
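
To check the characteristics described above on a local copy of the data, one option is to stream the CSV once and compute a few summary statistics. The following is a minimal sketch, not the book’s code: it assumes the file has been downloaded locally (for example as fraudTrain.csv), that the header uses the column names cc_num, merchant, trans_date_trans_time and is_fraud, and that timestamps are formatted as YYYY-MM-DD HH:MM:SS. Verify these assumptions against your copy; the figures you obtain also depend on which file of the dataset you load.

```python
import csv
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Assumed format of the trans_date_trans_time column (check your copy of the file).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def iter_rows(path: str) -> Iterator[dict]:
    """Stream a CSV file row by row as dicts keyed by the header row."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)


def summarize_transactions(rows: Iterable[dict]) -> dict:
    """Count transactions, distinct cards and merchants, time span and fraud rate.

    Assumes the keys 'cc_num', 'merchant', 'trans_date_trans_time' and
    'is_fraud' ('0' or '1'); these names are an assumption about the CSV header.
    """
    customers: set = set()
    merchants: set = set()
    first: Optional[datetime] = None
    last: Optional[datetime] = None
    n_total = 0
    n_fraud = 0
    for row in rows:
        customers.add(row["cc_num"])
        merchants.add(row["merchant"])
        ts = datetime.strptime(row["trans_date_trans_time"], TIMESTAMP_FORMAT)
        # Track the time span incrementally instead of storing every timestamp.
        first = ts if first is None or ts < first else first
        last = ts if last is None or ts > last else last
        n_fraud += int(row["is_fraud"])
        n_total += 1
    return {
        "n_transactions": n_total,
        "n_customers": len(customers),
        "n_merchants": len(merchants),
        "first": first,
        "last": last,
        "fraud_rate": n_fraud / n_total if n_total else 0.0,
    }
```

A call such as summarize_transactions(iter_rows("fraudTrain.csv")) reads the file in a single pass; because rows are streamed rather than loaded into a list, memory use is bounded by the sets of distinct cards and merchants.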

For each transaction, the dataset contains 23 different features. The following table lists only the features used in this chapter:

Table 8.1 – List of variables used in the dataset
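
Since only a subset of the 23 columns is needed, the remaining ones can be dropped as the file is read. The sketch below does this with Python’s standard csv module. The COLUMNS list is a hypothetical placeholder, not a copy of Table 8.1, and should be replaced with the variables the table actually lists; the file names in the usage note are likewise assumptions.

```python
import csv
from typing import Iterable, Iterator, List

# Hypothetical placeholder: replace with the variables listed in Table 8.1.
COLUMNS: List[str] = ["cc_num", "merchant", "amt", "is_fraud"]


def project(rows: Iterable[dict], columns: List[str]) -> Iterator[dict]:
    """Yield each row restricted to `columns`; raises KeyError if one is absent."""
    for row in rows:
        yield {c: row[c] for c in columns}


def write_subset(src: str, dst: str, columns: List[str]) -> None:
    """Copy only `columns` from the CSV at `src` into a new CSV at `dst`."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.DictWriter(fout, fieldnames=columns)
        writer.writeheader()
        # writerows consumes the generator lazily, so the input is streamed.
        writer.writerows(project(csv.DictReader(fin), columns))
```

For example, write_subset("fraudTrain.csv", "fraudTrain_subset.csv", COLUMNS) produces a smaller file containing only the selected columns. With pandas, pd.read_csv(path, usecols=COLUMNS) performs the same projection at read time.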

For the purposes of our analysis, we will use the fraudTrain...