Hands-On Machine Learning for Algorithmic Trading

By: Stefan Jansen

Overview of this book

The explosive growth of digital data has boosted the demand for expertise in trading strategies that use machine learning (ML). This book enables you to use a broad range of supervised and unsupervised algorithms to extract signals from a wide variety of data sources and create powerful investment strategies. This book shows how to access market, fundamental, and alternative data via API or web scraping and offers a framework to evaluate alternative data. You’ll practice the ML workflow from model design, loss metric definition, and parameter tuning to performance evaluation in a time series context. You will understand ML algorithms such as Bayesian and ensemble methods and manifold learning, and will know how to train and tune these models using pandas, statsmodels, sklearn, PyMC3, xgboost, lightgbm, and catboost. This book also teaches you how to extract features from text data using spaCy, classify news and assign sentiment scores, and use gensim to model topics and learn word embeddings from financial reports. You will also build and evaluate neural networks, including RNNs and CNNs, using Keras and PyTorch to exploit unstructured data for sophisticated strategies. Finally, you will apply transfer learning to satellite images to predict economic activity and use reinforcement learning to build agents that learn to trade in the OpenAI Gym.
Table of Contents (23 chapters)

Design and execution of a trading strategy

ML can add value at multiple steps in the lifecycle of a trading strategy, and relies on key infrastructure and data resources. Hence, this book aims to address how ML techniques fit into the broader process of designing, executing, and evaluating strategies.

An algorithmic trading strategy is driven by a combination of alpha factors that transform one or several data sources into signals that in turn predict future asset returns and trigger buy or sell orders. Chapter 2, Market and Fundamental Data, and Chapter 3, Alternative Data for Finance, cover the sourcing and management of data, the raw material and the single most important driver of a successful trading strategy.

Chapter 4, Alpha Factor Research, outlines a methodologically sound process to manage the risk of false discoveries that increases with the amount of data. Chapter 5, Strategy Evaluation, provides the context for the execution and performance measurement of a trading strategy.

Let's take a brief look at these steps, which we will discuss in depth in the following chapters.

Sourcing and managing data

The dramatic evolution of data in terms of volume, variety, and velocity is both a necessary condition for and driving force of the application of ML to algorithmic trading. The proliferating supply of data requires active management to uncover potential value, including the following steps:

  1. Identify and evaluate market, fundamental, and alternative data sources containing alpha signals that do not decay too quickly.
  2. Deploy or access cloud-based scalable data infrastructure and analytical tools like Hadoop or Spark to facilitate fast, flexible data access.
  3. Carefully manage and curate data to avoid look-ahead bias by adjusting it to the desired frequency on a point-in-time (PIT) basis. This means that data may only reflect information available and known at the given time. ML algorithms trained on distorted historical data will almost certainly fail during live trading.
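
The PIT requirement in step 3 can be illustrated with a minimal sketch. It assumes pandas and uses made-up prices and earnings figures; the column names (release_date, period_end, eps) are hypothetical. The idea is to join fundamentals on the date they were published, not on the fiscal period they describe, so that each price date only sees values that were already public.

```python
import pandas as pd

# Hypothetical daily prices for one asset.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-30", "2020-02-14", "2020-02-18", "2020-05-04"]),
    "close": [100.0, 102.0, 101.0, 105.0],
})

# Hypothetical quarterly earnings: the fiscal period ends well before
# the figure is published, so only `release_date` is safe to join on.
fundamentals = pd.DataFrame({
    "period_end": pd.to_datetime(["2019-12-31", "2020-03-31"]),
    "release_date": pd.to_datetime(["2020-02-15", "2020-05-01"]),
    "eps": [1.10, 1.25],
})

# For each price date, take the latest value released on or before that date.
pit = pd.merge_asof(
    prices.sort_values("date"),
    fundamentals.sort_values("release_date"),
    left_on="date",
    right_on="release_date",
    direction="backward",
)
print(pit[["date", "close", "eps"]])
```

The two price dates before the first release carry no earnings value at all, which is the intended behavior: joining on period_end instead would leak the Q4 figure into January, when it was not yet known.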

Alpha factor research and evaluation

Alpha factors are designed to extract signals from data to predict asset returns for a given investment universe over the trading horizon. A factor takes on a single value for each asset when evaluated, but may combine one or several input variables. The process involves the steps outlined in the following figure:
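
To make the notion of "a single value for each asset" concrete, here is a small sketch under stated assumptions: pandas, invented prices for three hypothetical tickers, and a simple price-momentum factor chosen purely for illustration.

```python
import pandas as pd

# Hypothetical closing prices: rows are dates, columns are assets.
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0, 13.0],
     "BBB": [20.0, 19.0, 21.0, 20.0],
     "CCC": [5.0, 5.0, 5.5, 6.0]},
    index=pd.date_range("2021-01-04", periods=4, freq="B"),
)

# Momentum factor: simple return over the past `lookback` periods.
lookback = 3
momentum = prices / prices.shift(lookback) - 1

# Evaluated on the last date, the factor yields one value per asset.
factor_today = momentum.iloc[-1]
print(factor_today)
```

Here the factor uses a single input (price), but the same shape of result, one number per asset and date, would apply to a factor that combines several inputs.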

The Research phase of the trading strategy workflow includes the design, evaluation, and combination of alpha factors. ML plays a large role in this process because the complexity of factors has increased as investors react to both the signal decay of simpler factors and the much richer data available today.

The development of predictive alpha factors requires the exploration of relationships between input data and the target returns, creative feature-engineering, and the testing and fine-tuning of data transformations to optimize the predictive power of the input.

The data transformations range from simple non-parametric rankings to complex ensemble models or deep neural networks, depending on the amount of signal in the inputs and the complexity of the relationship between the inputs and the target. Many of the simpler factors have emerged from academic research and have been increasingly widely used in the industry over the last several decades.
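
At the simple end of this range, a non-parametric ranking might look like the following sketch. It assumes pandas and hypothetical raw factor values; the cross-sectional percentile rank replaces each raw value with its relative position among the assets on that date, which makes the factor robust to outliers and scale differences.

```python
import pandas as pd

# Hypothetical raw factor values: rows are dates, columns are assets.
raw = pd.DataFrame(
    {"AAA": [0.30, -0.10],
     "BBB": [0.00, 0.25],
     "CCC": [0.20, 0.05],
     "DDD": [-0.40, 0.60]},
    index=pd.to_datetime(["2021-01-07", "2021-01-08"]),
)

# Rank across assets on each date (axis=1) and scale to the interval (0, 1].
ranked = raw.rank(axis=1, pct=True)
print(ranked)
```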

To minimize the risks of false discoveries due to data mining and because finance has been subject to decades of research that has resulted in several Nobel prizes, investors prefer to rely on factors that align with theories about financial markets and investor behavior. Laying out these theories is beyond the scope of this book, but the references will highlight avenues to dive deeper into this important framing aspect of algorithmic trading strategies.

To validate the signal content of an alpha factor candidate, it is necessary to obtain a robust estimate of its predictive power in environments representative of the market regime during which the factor would be used in a strategy. Reliable estimates require avoiding numerous methodological and practical pitfalls, including the use of data that induces survivorship or look-ahead biases by not reflecting realistic PIT information, or the failure to correct for bias due to multiple tests on the same data.
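
One common way to quantify predictive power is the information coefficient (IC), the rank correlation between factor values and subsequent returns. The sketch below is an illustration and not necessarily the procedure used later in the book: it assumes pandas and NumPy, generates synthetic data with a weak built-in relationship, and computes the Spearman correlation per date as the Pearson correlation of cross-sectional ranks. The final lines show a Bonferroni adjustment as one simple, conservative correction for multiple tests; the number of candidates is an assumed figure.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-04", periods=60, freq="B")
assets = [f"A{i:02d}" for i in range(20)]

# Synthetic factor values and next-period returns with a weak built-in link.
factor = pd.DataFrame(rng.normal(size=(60, 20)), index=dates, columns=assets)
noise = pd.DataFrame(rng.normal(size=(60, 20)), index=dates, columns=assets)
forward_returns = 0.01 * (0.2 * factor + noise)

# Spearman rank correlation per date = Pearson correlation of cross-sectional ranks.
ic = factor.rank(axis=1).corrwith(forward_returns.rank(axis=1), axis=1)
print(f"mean IC: {ic.mean():.3f}, share of days with IC > 0: {(ic > 0).mean():.2f}")

# Bonferroni adjustment: if `n_candidates` factors were tried on the same
# data, each individual test must clear a stricter significance level.
alpha, n_candidates = 0.05, 20
adjusted_alpha = alpha / n_candidates
print(f"per-test significance level: {adjusted_alpha:.4f}")
```

With real data, forward_returns would have to be built from returns that occur strictly after the factor observation, and the IC would typically be examined across different market regimes instead of as a single average.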

Signals derived from alpha factors are often individually weak, but sufficiently powerful when combined with other factors or data sources, for example, to modulate the signal as a function of the market or economic context.
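
A minimal sketch of such a combination, assuming pandas and three hypothetical factors with invented values for a single date: each factor is standardized across assets and the z-scores are averaged with equal weights. The equal weighting is a placeholder assumption; an ML model could learn the weights, or make them depend on market context.

```python
import pandas as pd

# Hypothetical factor values for four assets on a single date.
factors = pd.DataFrame(
    {"momentum": [0.30, 0.00, 0.20, -0.10],
     "value": [0.8, 1.5, 1.1, 0.6],
     "quality": [0.12, 0.05, 0.20, 0.08]},
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Standardize each factor across assets so that scales are comparable.
zscores = (factors - factors.mean()) / factors.std()

# Equal-weighted composite signal, one value per asset.
composite = zscores.mean(axis=1)
print(composite.sort_values(ascending=False))
```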

Portfolio optimization and risk management

Alpha factors emit entry and exit signals that lead to buy or sell orders, and order execution results in portfolio holdings. The risk profiles of individual positions interact to create a specific portfolio risk profile. Portfolio management involves the optimization of position weights to achieve the desired portfolio risk and return profile that aligns with the overall investment objectives. This process is highly dynamic to incorporate continuously evolving market data.
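
As one simple instance of weight optimization, the sketch below computes closed-form minimum-variance weights with NumPy. This objective is chosen only for illustration; the covariance matrix is hypothetical, the portfolio is assumed to be fully invested, and short positions are not ruled out. Practical objectives usually add expected returns and constraints.

```python
import numpy as np

# Hypothetical annualized covariance matrix for three assets.
cov = np.array([
    [0.040, 0.006, 0.004],
    [0.006, 0.090, 0.010],
    [0.004, 0.010, 0.160],
])

# Minimum-variance weights: w = inv(cov) @ 1 / (1' @ inv(cov) @ 1).
ones = np.ones(len(cov))
inv_cov_ones = np.linalg.solve(cov, ones)
weights = inv_cov_ones / inv_cov_ones.sum()

# Resulting portfolio volatility, which reflects how position risks interact.
portfolio_vol = np.sqrt(weights @ cov @ weights)
print(weights.round(3), round(float(portfolio_vol), 4))
```

The portfolio volatility ends up below that of the least volatile single asset because the off-diagonal covariances are small, which is the interaction of individual risk profiles described above.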

The execution of trades during this process requires balancing the trader's dilemma: fast execution tends to drive up costs due to market impact, whereas slow execution may create implementation shortfall when the realized price deviates from the price that prevailed when the decision was taken. Risk management occurs throughout the portfolio-management process to adjust holdings or assume hedges, depending on observed or predicted changes in the market environment that impact the portfolio risk profile.
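
The shortfall side of this trade-off is simple arithmetic. The following sketch uses a hypothetical buy order and fill prices, and a simplified definition that ignores fees and the opportunity cost of any unfilled shares.

```python
# Hypothetical buy order: the decision is taken at 50.00 and filled in two slices.
decision_price = 50.00
fills = [(600, 50.05), (400, 50.12)]  # (shares, price)

shares = sum(q for q, _ in fills)
avg_fill = sum(q * p for q, p in fills) / shares

# Shortfall for a buy: the extra cost paid relative to the decision price.
shortfall_per_share = avg_fill - decision_price
shortfall_bps = shortfall_per_share / decision_price * 1e4
print(f"average fill {avg_fill:.3f}, shortfall {shortfall_bps:.1f} bps")
# prints: average fill 50.078, shortfall 15.6 bps
```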

Strategy backtesting

The incorporation of an investment idea into an algorithmic strategy requires extensive testing with a scientific approach that attempts to reject the idea based on its performance in alternative out-of-sample market scenarios. Testing may involve simulated data to capture scenarios deemed possible but not reflected in historic data.

A strategy-backtesting engine needs to simulate the execution of a strategy realistically to achieve unbiased performance and risk estimates. In addition to the potential biases introduced by the data or a flawed use of statistics, the backtest engine needs to accurately represent the practical aspects of trade-signal evaluation, order placement, and execution in line with market conditions.
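
Two of these practical aspects, the delay between signal and execution and the cost of trading, can be sketched in a few lines. This is a toy vectorized backtest under stated assumptions, not a realistic engine: it uses pandas and NumPy, synthetic returns, a hypothetical trend signal, and an assumed proportional cost of 5 basis points per position change.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range("2021-01-04", periods=250, freq="B")
returns = pd.Series(rng.normal(0.0003, 0.01, size=250), index=dates)

# Hypothetical signal: long when the trailing 20-day mean return is positive.
signal = (returns.rolling(20).mean() > 0).astype(float)

# Trade on the NEXT bar: today's position uses yesterday's signal,
# so the strategy never acts on information it could not have had.
position = signal.shift(1).fillna(0.0)

# Charge a proportional cost whenever the position changes.
cost_per_trade = 0.0005  # 5 bps, an assumed figure
turnover = position.diff().abs().fillna(0.0)
strategy_returns = position * returns - cost_per_trade * turnover

equity = (1 + strategy_returns).cumprod()
print(f"final equity: {equity.iloc[-1]:.3f}, trades: {int(turnover.sum())}")
```

Dropping the shift(1) would let the position react to the same day's return that feeds the signal, a look-ahead error that typically inflates backtest performance.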