Machine Learning for Finance

By: Jannes Klaas

Overview of this book

Machine Learning for Finance explores new advances in machine learning and shows how they can be applied across the financial sector, including insurance, transactions, and lending. This book explains the concepts and algorithms behind the main machine learning techniques and provides example Python code for implementing the models yourself. The book is based on Jannes Klaas’ experience of running machine learning training courses for financial professionals. Rather than providing ready-made financial algorithms, the book focuses on advanced machine learning concepts and ideas that can be applied in a wide variety of ways. The book systematically explains how machine learning works on structured data, text, images, and time series. You'll cover generative adversarial learning, reinforcement learning, debugging, and launching machine learning products. Later chapters will discuss how to fight bias in machine learning. The book ends with an exploration of Bayesian inference and probabilistic programming.

Topic modeling


A final, very useful application of word counting is topic modeling. Given a set of texts, are we able to find clusters of topics? The method to do this is called Latent Dirichlet Allocation (LDA).

Note: The code and data for this section can be found on Kaggle at https://www.kaggle.com/jannesklaas/topic-modeling-with-lda.

While the name is quite a mouthful, the algorithm is a very useful one, so we will look at it step by step. LDA makes the following assumptions about how texts are written:

  1. First, a topic distribution is chosen, say 70% machine learning and 30% finance.

  2. Second, the distribution of words for each topic is chosen. For example, the topic "machine learning" might be made up of 20% the word "tensor," 10% the word "gradient," and so on. In LDA, the topic distribution is drawn from a Dirichlet distribution, which is a distribution over probability vectors and is therefore often described as a distribution of distributions.

  3. Once the text gets written, two probabilistic decisions are made for each word: first, a topic is chosen from the...
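
The generative story in the steps above can be sketched in a few lines of standard-library Python. This is only an illustration, not the book's code: the wording of the third step is cut off in the text, so the sketch assumes the standard LDA formulation, in which each word's topic is drawn from the document's topic distribution and the word itself is then drawn from that topic's word distribution. The function names, the six-word vocabulary, and the Dirichlet parameters are all hypothetical choices made for the example.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from a Dirichlet distribution.

    Uses the standard construction: independent gamma draws, normalized to sum to 1.
    """
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word_dists, vocab, doc_topic_alpha, n_words, rng):
    """Generate one synthetic document under the assumed LDA generative story."""
    # Step 1: choose the document's topic distribution (e.g. 70% / 30%).
    topic_mix = sample_dirichlet(doc_topic_alpha, rng)
    topic_ids = list(range(len(topic_word_dists)))
    words = []
    for _ in range(n_words):
        # Step 3 (assumed): first pick a topic from the document's topic mix ...
        topic = rng.choices(topic_ids, weights=topic_mix, k=1)[0]
        # ... then pick a word from that topic's word distribution.
        word = rng.choices(vocab, weights=topic_word_dists[topic], k=1)[0]
        words.append(word)
    return topic_mix, words


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy vocabulary; a real corpus would have thousands of words.
vocab = ["tensor", "gradient", "model", "loan", "bond", "yield"]

# Step 2: choose a word distribution for each of two topics.
# These are shared by every document generated from the same model.
topic_word_dists = [sample_dirichlet([0.5] * len(vocab), rng) for _ in range(2)]

topic_mix, words = generate_document(
    topic_word_dists, vocab, doc_topic_alpha=[1.0, 1.0], n_words=10, rng=rng
)
print("topic mix:", [round(p, 2) for p in topic_mix])
print("document: ", " ".join(words))
```

Fitting LDA to real texts runs this story in reverse: given only the observed words, the algorithm infers topic mixtures and word distributions that would plausibly have produced them.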