Machine Learning for Algorithmic Trading - Second Edition

By: Stefan Jansen

Overview of this book

The explosive growth of digital data has boosted the demand for expertise in trading strategies that use machine learning (ML). This revised and expanded second edition enables you to build and evaluate sophisticated supervised, unsupervised, and reinforcement learning models. The book introduces end-to-end machine learning for the trading workflow, from idea and feature engineering to model optimization, strategy design, and backtesting. It illustrates this workflow with examples ranging from linear models and tree-based ensembles to deep-learning techniques from cutting-edge research. This edition shows how to work with market, fundamental, and alternative data, such as tick data, minute and daily bars, SEC filings, earnings call transcripts, financial news, or satellite images, to generate tradeable signals. It illustrates how to engineer financial features or alpha factors that enable an ML model to predict returns from price data for US and international stocks and ETFs. It also shows how to assess the signal content of new features using Alphalens and SHAP values and includes a new appendix with over one hundred alpha factor examples. By the end, you will be proficient in translating ML model predictions into a trading strategy that operates at daily or intraday horizons, and in evaluating its performance.

Preface

What to expect

What's new in the second edition

Who should read this book

What this book covers

To get the most out of this book

Get in touch

Machine Learning for Trading – From Idea to Execution

The rise of ML in the investment industry

Designing and executing an ML-driven strategy

ML for trading – strategies and use cases

Summary

Market and Fundamental Data – Sources and Techniques

Market data reflects its environment

Working with high-frequency data

API access to market data

How to work with fundamental data

Efficient data storage with pandas

Summary

Alternative Data for Finance – Categories and Use Cases

The alternative data revolution

Sources of alternative data

Criteria for evaluating alternative data

The market for alternative data

Working with alternative data

Summary

Financial Feature Engineering – How to Research Alpha Factors

Alpha factors in practice – from data to signals

Building on decades of factor research

Engineering alpha factors that predict returns

From signals to trades – Zipline for backtests

Separating signal from noise with Alphalens

Alpha factor resources

Summary

Portfolio Optimization and Performance Evaluation

How to measure portfolio performance

How to manage portfolio risk and return

Trading and managing portfolios with Zipline

Measuring backtest performance with pyfolio

Summary

The Machine Learning Process

How machine learning from data works

The machine learning workflow

Summary

Linear Models – From Risk Factors to Return Forecasts

From inference to prediction

The baseline model – multiple linear regression

How to run linear regression in practice

How to build a linear factor model

Regularizing linear regression using shrinkage

How to predict returns with linear regression

Linear classification

Summary

The ML4T Workflow – From Model to Strategy Backtesting

How to backtest an ML-driven strategy

Backtesting pitfalls and how to avoid them

How a backtesting engine works

backtrader – a flexible tool for local backtests

Zipline – scalable backtesting by Quantopian

Summary

Time-Series Models for Volatility Forecasts and Statistical Arbitrage

Tools for diagnostics and feature extraction

How to diagnose and achieve stationarity

Univariate time-series models

Multivariate time-series models

Cointegration – time series with a shared trend

Statistical arbitrage with cointegration

Summary

Bayesian ML – Dynamic Sharpe Ratios and Pairs Trading

How Bayesian machine learning works

Probabilistic programming with PyMC3

Bayesian ML for trading

Summary

Random Forests – A Long-Short Strategy for Japanese Stocks

Decision trees – learning rules from data

Random forests – making trees more reliable

Long-short signals for Japanese stocks

Summary

Boosting Your Trading Strategy

Getting started – adaptive boosting

Gradient boosting – ensembles for most tasks

Using XGBoost, LightGBM, and CatBoost

A long-short trading strategy with boosting

Boosting for an intraday strategy

Summary

Data-Driven Risk Factors and Asset Allocation with Unsupervised Learning

Dimensionality reduction

PCA for trading

Clustering

Hierarchical clustering for optimal portfolios

Summary

Text Data for Trading – Sentiment Analysis

ML with text data – from language to features

From text to tokens – the NLP pipeline

Counting tokens – the document-term matrix

NLP for trading

Summary

Topic Modeling – Summarizing Financial News

Learning latent topics – Goals and approaches

Probabilistic latent semantic analysis

Latent Dirichlet allocation

Modeling topics discussed in earnings calls

Topic modeling for financial news

Summary

Word Embeddings for Earnings Calls and SEC Filings

How word embeddings encode semantics

How to use pretrained word vectors

Custom embeddings for financial news

word2vec for trading with SEC filings

Sentiment analysis using doc2vec embeddings

New frontiers – pretrained transformer models

Summary

Deep Learning for Trading

Deep learning – what's new and why it matters

Designing an NN

A neural network from scratch in Python

Popular deep learning libraries

Optimizing an NN for a long-short strategy

Summary

CNNs for Financial Time Series and Satellite Images

How CNNs learn to model grid-like data

CNNs for satellite images and object detection

CNNs for time-series data – predicting returns

Summary

RNNs for Multivariate Time Series and Sentiment Analysis

How recurrent neural nets work

RNNs for time series with TensorFlow 2

RNNs for text data

Summary

Autoencoders for Conditional Risk Factors and Asset Pricing

Autoencoders for nonlinear feature extraction

Implementing autoencoders with TensorFlow 2

A conditional autoencoder for trading

Summary

Generative Adversarial Networks for Synthetic Time-Series Data

Creating synthetic data with GANs

How to build a GAN using TensorFlow 2

TimeGAN for synthetic financial data

Summary

Deep Reinforcement Learning – Building a Trading Agent

Elements of a reinforcement learning system

How to solve reinforcement learning problems

Solving dynamic programming problems

Q-learning – finding an optimal policy on the go

Deep RL for trading with the OpenAI Gym

Summary

Conclusions and Next Steps

Key takeaways and lessons learned

ML for trading in practice

Conclusion

References

Index

Appendix: Alpha Factor Library

Common alpha factors implemented in TA-Lib

WorldQuant's quest for formulaic alphas

Bivariate and multivariate factor evaluation

To get the most out of this book

In addition to the content summarized in the previous section, the book's hands-on material consists of over 160 Jupyter notebooks hosted on GitHub that demonstrate the use of ML for trading in practice on a broad range of data sources. This section describes how to use the GitHub repository, obtain the data used in the numerous examples, and set up the environment to run the code.

The GitHub repository

The book revolves around the application of ML algorithms to trading. The hands-on aspects are covered in Jupyter notebooks, hosted on GitHub, that illustrate many of the concepts and models in more detail. While the chapters aim to be self-contained, the code examples and results often take up too much space to include in their complete forms. The notebooks therefore contain significant additional content, and it is important to view them while reading each chapter, even if you do not intend to run the code yourself.

The repository is organized so that each chapter has its own directory containing the relevant notebooks and a README file containing separate instructions where needed, as well as references and resources specific to the chapter's content. The relevant notebooks are identified throughout each chapter, as necessary. The repository also contains instructions on how to install the requisite libraries and obtain the data.
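The layout described above can be explored with a few lines of Python. This is a minimal sketch, assuming you have cloned the repository locally; the function name notebooks_by_chapter and the path in the usage note are hypothetical and not part of the book's code. It maps every top-level directory that contains notebooks to the notebook files found under it.

```python
from pathlib import Path


def notebooks_by_chapter(repo_root):
    """Map each top-level directory to the notebooks found anywhere below it.

    Assumption: `repo_root` is a local clone in which each chapter has
    its own top-level directory, as the text describes.
    """
    root = Path(repo_root)
    result = {}
    for chapter_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        notebooks = sorted(
            nb.relative_to(chapter_dir).as_posix()
            for nb in chapter_dir.rglob("*.ipynb")
            # Skip Jupyter's autosave copies
            if ".ipynb_checkpoints" not in nb.parts
        )
        if notebooks:
            result[chapter_dir.name] = notebooks
    return result
```

For example, notebooks_by_chapter("path/to/your/clone") returns a dictionary keyed by directory name, which makes it easy to see which notebooks accompany the chapter you are reading.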

You can find the code files at: https://github.com/PacktPublishing/Machine-Learning-for-Algorithmic-Trading-Second-Edition.

Data sources

We will use freely available historical data from market, fundamental, and alternative sources. Chapter 2 and Chapter 3 cover the characteristics of these data sources and how to access them, and introduce key providers that we will use throughout the book. The companion GitHub repository just described contains instructions on how to obtain or create some of the datasets that we will use throughout and includes some smaller datasets.

The data sources that we will work with include, but are not limited to:

Nasdaq ITCH order book data
Electronic Data Gathering, Analysis, and Retrieval (EDGAR) SEC filings
Earnings call transcripts from Seeking Alpha
Quandl daily prices and other data points for over 3,000 US stocks
International equity data from Stooq and via the yfinance library
Various macro fundamental and benchmark data from the Federal Reserve
Large Yelp business reviews and Twitter datasets
EUROSAT satellite image data

Some of the datasets are large (several gigabytes), such as the Nasdaq ITCH order book data and the SEC filings. The notebooks indicate when that is the case.

See the data directory in the root folder of the GitHub repository for instructions.

Anaconda and Docker images

The book requires Python 3.7 or higher and uses the Anaconda distribution. It uses separate conda environments for its four parts to cover a broad range of libraries while limiting dependencies and conflicts.

The installation directory in the GitHub repository contains detailed instructions. You can either use the provided Docker image to create a container with the necessary environments or use the .yml files to create them locally.
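Before creating the environments, it can help to confirm that the active interpreter satisfies the stated minimum version. The following is a minimal sketch using only the standard library; the helper name meets_minimum is hypothetical and not part of the book's code.

```python
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in the text


def meets_minimum(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} satisfies the 3.7+ requirement.")
    else:
        print(f"Python {current} is too old; 3.7 or higher is required.")
```

The comparison uses tuples rather than strings so that a version such as 3.10 is correctly treated as newer than 3.7.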

Download the example code files

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at http://www.packtpub.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the on-screen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of your preferred compression tool:

WinRAR or 7-Zip for Windows
Zipeg, iZip, or UnRarX for Mac
7-Zip or PeaZip for Linux
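If you prefer not to install a separate compression tool and the download is a ZIP archive, Python's standard zipfile module can extract it. This is a minimal sketch; the function name extract_code_bundle is hypothetical, and the archive and target paths are whatever you choose locally.

```python
import zipfile
from pathlib import Path


def extract_code_bundle(archive_path, target_dir):
    """Extract a ZIP archive into `target_dir` and return its member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        # testzip() returns the first corrupt member, or None if all are fine
        bad_member = archive.testzip()
        if bad_member is not None:
            raise zipfile.BadZipFile(f"Corrupt member in archive: {bad_member}")
        archive.extractall(target)
        return sorted(archive.namelist())
```

Checking the archive before extraction surfaces an incomplete download early, rather than leaving a partially extracted folder.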

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-for-Algorithmic-Trading-Second-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781839217715_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example, "The compute_factors() method creates a MeanReversion factor instance and creates long, short, and ranking pipeline columns."

A block of code is set as follows:

from pykalman import KalmanFilter

# One-dimensional filter: scalar state and scalar observation
kf = KalmanFilter(transition_matrices=[1],
                  observation_matrices=[1],
                  initial_state_mean=0,
                  initial_state_covariance=1,
                  observation_covariance=1,
                  transition_covariance=.01)

Bold: Indicates a new term, an important word, or words that you see on the screen. For example, words in menus or dialog boxes appear in the text like this: "The Python Algorithmic Trading Library (PyAlgoTrade) focuses on backtesting and offers support for paper trading and live trading."

Informational notes appear like this.
