Book Image

Machine Learning for Finance

By : Jannes Klaas
Book Image

Machine Learning for Finance

By: Jannes Klaas

Overview of this book

Machine Learning for Finance explores new advances in machine learning and shows how they can be applied across the financial sector, including insurance, transactions, and lending. This book explains the concepts and algorithms behind the main machine learning techniques and provides example Python code for implementing the models yourself. The book is based on Jannes Klaas’ experience of running machine learning training courses for financial professionals. Rather than providing ready-made financial algorithms, the book focuses on advanced machine learning concepts and ideas that can be applied in a wide variety of ways. The book systematically explains how machine learning works on structured data, text, images, and time series. You'll cover generative adversarial learning, reinforcement learning, debugging, and launching machine learning products. Later chapters will discuss how to fight bias in machine learning. The book ends with an exploration of Bayesian inference and probabilistic programming.
Table of Contents (15 chapters)
Machine Learning for Finance
Other Books You May Enjoy

The heuristic approach

Earlier in this chapter, we introduced the three models that we will be using to detect fraud, now it's time to explore each of them in more detail. We're going to start with the heuristic approach.

Let's start by defining a simple heuristic model and measuring how well it does at measuring fraud rates.

Making predictions using the heuristic model

We will be making our predictions using the heuristic approach over the entire training data set in order to get an idea of how well this heuristic model does at predicting fraudulent transactions.

The following code will create a new column, Fraud_Heuristic, and in turn assigns a value of 1 in rows where the type is TRANSFER, and the amount is more than $200,000:

df['Fraud_Heuristic '] = np.where(((df['type'] == 'TRANSFER') &(df['amount'] > 200000)),1,0)

With just two lines of code, it's easy to see how such a simple metric can be easy to write, and quick to deploy.

The F1 score

One important thing we must consider is the...