Hands-On Machine Learning for Algorithmic Trading

By: Stefan Jansen

Overview of this book

The explosive growth of digital data has boosted the demand for expertise in trading strategies that use machine learning (ML). This book enables you to use a broad range of supervised and unsupervised algorithms to extract signals from a wide variety of data sources and create powerful investment strategies. This book shows how to access market, fundamental, and alternative data via API or web scraping and offers a framework to evaluate alternative data. You’ll practice the ML workflow from model design, loss metric definition, and parameter tuning to performance evaluation in a time series context. You will understand ML algorithms such as Bayesian and ensemble methods and manifold learning, and will know how to train and tune these models using pandas, statsmodels, sklearn, PyMC3, xgboost, lightgbm, and catboost. This book also teaches you how to extract features from text data using spaCy, classify news and assign sentiment scores, and to use gensim to model topics and learn word embeddings from financial reports. You will also build and evaluate neural networks, including RNNs and CNNs, using Keras and PyTorch to exploit unstructured data for sophisticated strategies. Finally, you will apply transfer learning to satellite images to predict economic activity and use reinforcement learning to build agents that learn to trade in the OpenAI Gym.

Design and execution of a trading strategy

ML can add value at multiple steps in the lifecycle of a trading strategy, and relies on key infrastructure and data resources. Hence, this book aims to address how ML techniques fit into the broader process of designing, executing, and evaluating strategies.

An algorithmic trading strategy is driven by a combination of alpha factors that transform one or several data sources into signals that in turn predict future asset returns and trigger buy or sell orders. Chapter 2, Market and Fundamental Data, and Chapter 3, Alternative Data for Finance, cover the sourcing and management of data, the raw material and the single most important driver of a successful trading strategy.

Chapter 4, Alpha Factor Research, outlines a methodologically sound process to manage the risk of false discoveries that increases with the amount of data. Chapter 5, Strategy Evaluation, provides the context for the execution and performance measurement of a trading strategy.

Let's take a brief look at these steps, which we will discuss in depth in the following chapters.

Sourcing and managing data

The dramatic evolution of data in terms of volume, variety, and velocity is both a necessary condition for and driving force of the application of ML to algorithmic trading. The proliferating supply of data requires active management to uncover potential value, including the following steps:

Identify and evaluate market, fundamental, and alternative data sources containing alpha signals that do not decay too quickly.
Deploy or access cloud-based scalable data infrastructure and analytical tools such as Hadoop or Spark to facilitate fast, flexible data access.
Carefully manage and curate data to avoid look-ahead bias by adjusting it to the desired frequency on a point-in-time (PIT) basis. This means that data may only reflect information available and known at the given time. ML algorithms trained on distorted historical data will almost certainly fail during live trading.
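The point-in-time requirement can be made concrete with a small lookup. The following is a minimal sketch using only the standard library, not the book's implementation; the earnings figures, the dates, and the function name latest_known_eps are hypothetical. The key idea is to index fundamentals by their public release date rather than by the fiscal period they describe. In pandas, an as-of join such as merge_asof serves the same purpose on larger datasets.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical quarterly earnings: (fiscal period end, public release date, EPS).
# All values are made up for illustration.
REPORTS = [
    (date(2018, 3, 31), date(2018, 5, 2), 1.10),
    (date(2018, 6, 30), date(2018, 8, 1), 1.25),
    (date(2018, 9, 30), date(2018, 11, 1), 0.95),
]


def latest_known_eps(as_of, reports=REPORTS):
    """Return the most recent EPS released on or before `as_of`, else None."""
    # Sort by release date, not by fiscal period end: only the release date
    # tells us when the market could have known the number.
    released = sorted(reports, key=lambda r: r[1])
    release_dates = [r[1] for r in released]
    count = bisect_right(release_dates, as_of)  # reports public by `as_of`
    return released[count - 1][2] if count else None


# On 2018-07-15 the second quarter has ended but is not yet reported,
# so a PIT-correct lookup still returns the first-quarter figure.
print(latest_known_eps(date(2018, 7, 15)))
```

Treating a figure as known on its release date is itself an assumption: if the release happens after the market close, a stricter setup would make it available only from the next trading day.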

Alpha factor research and evaluation

Alpha factors are designed to extract signals from data to predict asset returns for a given investment universe over the trading horizon. A factor takes on a single value for each asset when evaluated, but may combine one or several input variables. The process involves the steps outlined in the following figure:

The Research phase of the trading strategy workflow includes the design, evaluation, and combination of alpha factors. ML plays a large role in this process because the complexity of factors has increased as investors react to both the signal decay of simpler factors and the much richer data available today.

The development of predictive alpha factors requires the exploration of relationships between input data and the target returns, creative feature-engineering, and the testing and fine-tuning of data transformations to optimize the predictive power of the input.

The data transformations range from simple non-parametric rankings to complex ensemble models or deep neural networks, depending on the amount of signal in the inputs and the complexity of the relationship between the inputs and the target. Many of the simpler factors have emerged from academic research and have been increasingly widely used in the industry over the last several decades.
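At the simple end of that range, a non-parametric ranking might look like the sketch below. It is an illustration under stated assumptions rather than a factor from the book: the tickers and prices are invented, the function name momentum_rank_factor is hypothetical, and ties between assets are not handled. It shows a factor taking a single value per asset by ranking trailing returns across the investment universe.

```python
# Hypothetical price histories, oldest price first.
EXAMPLE_PRICES = {
    "AAA": [100.0, 110.0],
    "BBB": [50.0, 45.0],
    "CCC": [20.0, 21.0],
}


def momentum_rank_factor(prices):
    """Map each asset to a cross-sectional rank in [0, 1] by trailing return.

    0.0 marks the weakest trailing return in the universe, 1.0 the strongest.
    """
    if len(prices) < 2:
        raise ValueError("need at least two assets to rank")
    # Trailing return over the full window supplied for each asset.
    returns = {ticker: p[-1] / p[0] - 1 for ticker, p in prices.items()}
    ordered = sorted(returns, key=returns.get)  # worst to best
    n = len(ordered)
    return {ticker: i / (n - 1) for i, ticker in enumerate(ordered)}


print(momentum_rank_factor(EXAMPLE_PRICES))
```

Ranking discards the magnitude of the returns, which makes the factor robust to outliers at the cost of ignoring how far apart the assets are.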

Finance has been the subject of decades of research that has resulted in several Nobel prizes. For this reason, and to minimize the risk of false discoveries due to data mining, investors prefer to rely on factors that align with theories about financial markets and investor behavior. Laying out these theories is beyond the scope of this book, but the references will highlight avenues to dive deeper into this important framing aspect of algorithmic trading strategies.

To validate the signal content of an alpha factor candidate, it is necessary to obtain a robust estimate of its predictive power in environments representative of the market regime during which the factor would be used in a strategy. Reliable estimates require avoiding numerous methodological and practical pitfalls, including the use of data that induces survivorship or look-ahead biases by not reflecting realistic PIT information, or the failure to correct for bias due to multiple tests on the same data.
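One common way to correct for multiple tests on the same data is the Bonferroni adjustment, which divides the significance level by the number of candidates tested. The text does not name a specific correction, so this choice is an assumption, and the p-values and the function name bonferroni_significant below are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag factors whose p-value survives a Bonferroni correction.

    `p_values` maps a factor name to the p-value of its individual test.
    """
    if not p_values:
        raise ValueError("no tests supplied")
    # Each test must clear alpha / (number of tests), not alpha itself.
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}


# Hypothetical p-values for four factor candidates tested on the same data.
candidates = {"factor_1": 0.001, "factor_2": 0.03, "factor_3": 0.2, "factor_4": 0.012}
print(bonferroni_significant(candidates))
```

Here factor_2 would pass an uncorrected 5% test but fails the corrected threshold of 1.25%. Bonferroni is conservative, and less strict procedures exist, but it illustrates why a result that looks significant in isolation may not be once the number of attempts is taken into account.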

Signals derived from alpha factors are often individually weak, but sufficiently powerful when combined with other factors or data sources, for example, to modulate the signal as a function of the market or economic context.

Portfolio optimization and risk management

Alpha factors emit entry and exit signals that lead to buy or sell orders, and order execution results in portfolio holdings. The risk profiles of individual positions interact to create a specific portfolio risk profile. Portfolio management involves the optimization of position weights to achieve the desired portfolio risk and return profile that aligns with the overall investment objectives. This process is highly dynamic to incorporate continuously evolving market data.
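As a small worked case of weight optimization, consider the two-asset minimum-variance portfolio, which has a closed-form solution. Minimum variance is only one of many possible objectives and is chosen here as an assumption for illustration; the volatilities are invented and the function names are hypothetical.

```python
def min_variance_weights(var_a, var_b, cov_ab):
    """Weights (w_a, w_b) that minimize variance for two assets, summing to 1."""
    denom = var_a + var_b - 2 * cov_ab
    if denom <= 0:
        raise ValueError("assets are perfectly correlated with equal variance")
    w_a = (var_b - cov_ab) / denom
    return w_a, 1 - w_a


def portfolio_variance(w_a, w_b, var_a, var_b, cov_ab):
    """Variance of a two-asset portfolio with the given weights."""
    return w_a ** 2 * var_a + w_b ** 2 * var_b + 2 * w_a * w_b * cov_ab


# Hypothetical inputs: 20% and 10% volatility, zero correlation.
var_a, var_b, cov_ab = 0.20 ** 2, 0.10 ** 2, 0.0
w_a, w_b = min_variance_weights(var_a, var_b, cov_ab)
print(w_a, w_b, portfolio_variance(w_a, w_b, var_a, var_b, cov_ab))
```

With these inputs the portfolio holds roughly 20% of the riskier asset and 80% of the calmer one, and its variance falls below that of either asset alone, which is the interaction of individual risk profiles described above. The formula does not constrain the weights, so highly correlated inputs can produce a short position.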

The execution of trades during this process requires balancing the trader's dilemma: fast execution tends to drive up costs due to market impact, whereas slow execution may create implementation shortfall when the realized price deviates from the price that prevailed when the decision was taken. Risk management occurs throughout the portfolio-management process to adjust holdings or assume hedges, depending on observed or predicted changes in the market environment that impact the portfolio risk profile.
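Implementation shortfall can be illustrated with simple arithmetic. The sketch below uses a deliberately simplified definition, the gap between the average fill price and the decision price in basis points, and it ignores fees and the opportunity cost of any unfilled quantity. The prices, quantities, and function name are hypothetical.

```python
def implementation_shortfall_bps(decision_price, fill_prices, fill_quantities, side="buy"):
    """Shortfall of the average fill versus the decision price, in basis points.

    Positive values are a cost: paying more on a buy or receiving less on a sell.
    """
    total_quantity = sum(fill_quantities)
    if total_quantity <= 0:
        raise ValueError("no filled quantity")
    # Quantity-weighted average execution price.
    average_fill = sum(p * q for p, q in zip(fill_prices, fill_quantities)) / total_quantity
    sign = 1 if side == "buy" else -1
    return sign * (average_fill - decision_price) / decision_price * 1e4


# Hypothetical buy order decided at 100.00 and filled in two slices
# as the price drifts upward during a slow execution.
print(implementation_shortfall_bps(100.00, [100.05, 100.10], [600, 400]))
```

The two fills average 100.07, so the order costs about 7 basis points relative to the price at decision time.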

Strategy backtesting

The incorporation of an investment idea into an algorithmic strategy requires extensive testing with a scientific approach that attempts to reject the idea based on its performance in alternative out-of-sample market scenarios. Testing may involve simulated data to capture scenarios deemed possible but not reflected in historical data.

A strategy-backtesting engine needs to simulate the execution of a strategy realistically to achieve unbiased performance and risk estimates. In addition to the potential biases introduced by the data or a flawed use of statistics, the backtest engine needs to accurately represent the practical aspects of trade-signal evaluation, order placement, and execution in line with market conditions.
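Two of these practical aspects, signal timing and trading costs, can be shown in a toy simulation loop. This is a minimal sketch and not a realistic engine such as zipline: the function name backtest, the proportional cost model, and the sample returns are assumptions made for illustration. A signal decided at the end of one period only earns the return of the next period, and every change in position pays a cost.

```python
def backtest(returns, signals, cost_per_unit_turnover=0.001):
    """Simulate an equity curve starting at 1.0.

    returns[t] is the asset return over period t.
    signals[t] is the target position (-1, 0, or 1) decided at the END of
    period t, so it can only affect period t + 1 onward.
    """
    position = 0.0
    equity = 1.0
    curve = []
    for period_return, target in zip(returns, signals):
        # Earn the return on the position that was set BEFORE this period.
        equity *= 1 + position * period_return
        # Rebalance to the new target and pay a cost proportional to turnover.
        turnover = abs(target - position)
        equity *= 1 - cost_per_unit_turnover * turnover
        position = target
        curve.append(equity)
    return curve


# Hypothetical returns and signals for three periods.
sample_returns = [0.01, 0.02, -0.01]
sample_signals = [1, 1, 0]
print(backtest(sample_returns, sample_signals, cost_per_unit_turnover=0.0))
print(backtest(sample_returns, sample_signals))
```

Applying the signal to the same period in which it was computed would credit the strategy with the first 1% return it could not have captured, a small instance of look-ahead bias. A realistic engine would also model order types, partial fills, and market impact.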
