Machine Learning for Algorithmic Trading - Second Edition

By: Stefan Jansen

Overview of this book

The explosive growth of digital data has boosted the demand for expertise in trading strategies that use machine learning (ML). This revised and expanded second edition enables you to build and evaluate sophisticated supervised, unsupervised, and reinforcement learning models. This book introduces end-to-end machine learning for the trading workflow, from the idea and feature engineering to model optimization, strategy design, and backtesting. It illustrates this by using examples ranging from linear models and tree-based ensembles to deep-learning techniques from cutting-edge research. This edition shows how to work with market, fundamental, and alternative data, such as tick data, minute and daily bars, SEC filings, earnings call transcripts, financial news, or satellite images, to generate tradeable signals. It illustrates how to engineer financial features or alpha factors that enable an ML model to predict returns from price data for US and international stocks and ETFs. It also shows how to assess the signal content of new features using Alphalens and SHAP values and includes a new appendix with over one hundred alpha factor examples. By the end, you will be proficient in translating ML model predictions into a trading strategy that operates at daily or intraday horizons, and in evaluating its performance.

Market data reflects its environment

Market data is the product of how traders place orders for a financial instrument directly or through intermediaries on one of the numerous marketplaces, how these orders are processed, and how prices are set by matching demand and supply. As a result, the data reflects the institutional environment of trading venues, including the rules and regulations that govern orders, trade execution, and price formation. See Harris (2003) for a global overview and Jones (2018) for details on the U.S. market.

Algorithmic traders use algorithms, including ML, to analyze the flow of buy and sell orders and the resulting volume and price statistics to extract trade signals that capture insights into, for example, demand-supply dynamics or the behavior of certain market participants.

We will first review institutional features that impact the simulation of a trading strategy during a backtest before we start working with actual tick data created by one such environment, namely Nasdaq.

Market microstructure – the nuts and bolts

Market microstructure studies how the institutional environment affects the trading process and shapes outcomes like price discovery, bid-ask spreads and quotes, intraday trading behavior, and transaction costs (Madhavan 2000; 2002). It is one of the fastest-growing fields of financial research, propelled by the rapid development of algorithmic and electronic trading.

Today, hedge funds sponsor in-house analysts to track the rapidly evolving, complex details of market microstructure, both to ensure execution at the best possible market prices and to design strategies that exploit market frictions. We will provide only a brief overview of these key concepts before we dive into the data generated by trading. The references contain several sources that treat this subject in great detail.

How to trade – different types of orders

Traders can place various types of buy or sell orders. Some orders guarantee immediate execution, while others may state a price threshold or other conditions that trigger execution. Orders are typically valid for the same trading day unless specified otherwise.

A market order is intended for immediate execution upon arrival at the trading venue, at the price that prevails at that moment. In contrast, a limit order only executes if the market price is at or above the limit for a sell limit order, or at or below the limit for a buy limit order. A stop order, in turn, only becomes active when the market price rises above a specified price for a buy stop order, or falls below a specified price for a sell stop order. A buy stop order can be used to limit the losses of short sales. Stop orders may also have limits.
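The trigger conditions for these three order types can be sketched in a few lines of Python. This is a minimal illustration, not the logic of any particular venue or backtesting library: the `Order` class and the `is_executable` function are hypothetical names, and treating a price that exactly touches the limit or stop as sufficient is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    side: str                      # "buy" or "sell"
    kind: str                      # "market", "limit", or "stop"
    price: Optional[float] = None  # limit or stop price; unused for market orders


def is_executable(order: Order, market_price: float) -> bool:
    """True if a market or limit order can fill, or a stop order activates."""
    if order.kind == "market":
        return True  # executes at the prevailing price on arrival
    if order.price is None:
        raise ValueError("limit and stop orders need a price")
    if order.kind == "limit":
        # Buy at the limit or lower; sell at the limit or higher.
        if order.side == "buy":
            return market_price <= order.price
        return market_price >= order.price
    if order.kind == "stop":
        # A buy stop activates when the price rises to the stop level,
        # a sell stop when the price falls to it (touch counts: assumption).
        if order.side == "buy":
            return market_price >= order.price
        return market_price <= order.price
    raise ValueError(f"unknown order kind: {order.kind}")
```

For example, `is_executable(Order("buy", "stop", 105.0), 106.0)` returns `True`, which is the case of a short seller whose protective buy stop activates after the price has risen through 105.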

Numerous other conditions can be attached to orders. For example, all-or-none orders prevent partial execution; they are filled only if a specified number of shares is available and can be valid for a day or longer. They require special handling and are not visible to market participants. Fill-or-kill orders also prevent partial execution but cancel if not executed immediately. Immediate-or-cancel orders immediately buy or sell the number of shares that are available and cancel the remainder. Not-held orders allow the broker to decide on the time and price of execution. Finally, market-on-open/close orders execute at or near the opening or closing of the market. Partial executions are allowed.
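The difference between the three partial-execution conditions comes down to how many shares fill right away and whether the unfilled order stays alive. The sketch below makes that explicit under simplifying assumptions: a single execution attempt against a known number of available shares, and hypothetical function and condition names (`filled_quantity`, `rests_after_attempt`, `"all_or_none"`, and so on) chosen only for illustration.

```python
def filled_quantity(condition: str, requested: int, available: int) -> int:
    """Shares filled immediately when `available` shares are on offer."""
    if requested <= 0 or available < 0:
        raise ValueError("requested must be positive and available non-negative")
    if condition in ("all_or_none", "fill_or_kill"):
        # Both forbid partial execution: the full size fills or nothing does.
        return requested if available >= requested else 0
    if condition == "immediate_or_cancel":
        # Take whatever is available now; the remainder is cancelled.
        return min(requested, available)
    raise ValueError(f"unknown condition: {condition}")


def rests_after_attempt(condition: str, requested: int, available: int) -> bool:
    """True if an unfilled order remains valid after the attempt."""
    filled = filled_quantity(condition, requested, available)
    # Only an all-or-none order keeps waiting; the other two cancel at once.
    return condition == "all_or_none" and filled == 0
```

With 300 shares available for a 500-share order, the all-or-none and fill-or-kill versions fill nothing and the immediate-or-cancel version fills 300; only the all-or-none order remains valid afterwards.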

Where to trade – from exchanges to dark pools

Securities trade on highly organized and regulated exchanges or, with varying degrees of formality, in over-the-counter (OTC) markets. An exchange is a central marketplace where buyers and sellers compete for the lowest ask and highest bid, respectively. Exchange regulations typically impose listing and reporting requirements to create transparency and attract more traders and liquidity. OTC markets, such as the Best Market (OTCQX) or the Venture Market (OTCQB), often have lower regulatory barriers. As a result, they are suitable for a far broader range of securities, including bonds or American Depositary Receipts (ADRs, which represent equity listed on a foreign exchange, such as that of Nestlé S.A.).

Exchanges may rely on bilateral trading or centralized order-driven systems that match all buy and sell orders according to certain rules. Many exchanges use intermediaries that provide liquidity by making markets in certain securities. These intermediaries include dealers that act as principals on their own behalf and brokers that trade as agents on behalf of others. Price formation may occur through auctions, such as in the New York Stock Exchange (NYSE), where the highest bid and lowest offer are matched, or through dealers who buy from sellers and sell to buyers.

Back in the day, companies either registered and traded mostly on the NYSE, or they traded on OTC markets like Nasdaq. On the NYSE, a sole specialist intermediated trades of a given security. The specialist received buy and sell orders via a broker and tracked limit orders in a central order book. Limit orders were executed with a priority based on price and time. Buy market orders routed to the specialist transacted with the lowest ask (and sell market orders routed to the specialist transacted with the highest bid) in the limit order book, prioritizing earlier limit orders in the case of ties. Access to all orders in the central order book allowed the specialist to publish the best bid and ask prices and to set market prices based on the overall buy-sell imbalance.
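Price-time priority is easy to express with two heaps. The following is a toy sketch rather than a model of the specialist system or of any real matching engine: the `LimitOrderBook` class and its methods are hypothetical, resting limit orders never cross, and any part of a market order that cannot be filled is simply dropped.

```python
import heapq
import itertools


class LimitOrderBook:
    """Toy limit order book with price-time priority."""

    def __init__(self):
        self._bids = []  # heap entries: (-price, arrival, [shares])
        self._asks = []  # heap entries: (price, arrival, [shares])
        self._arrival = itertools.count()  # earlier orders get smaller numbers

    def add_limit(self, side, price, shares):
        """Rest a limit order in the book."""
        seq = next(self._arrival)
        if side == "buy":
            # Negate the price so the highest bid sits at the top of the heap.
            heapq.heappush(self._bids, (-price, seq, [shares]))
        else:
            heapq.heappush(self._asks, (price, seq, [shares]))

    def market_order(self, side, shares):
        """Fill against the opposite side; return a list of (price, shares)."""
        book = self._asks if side == "buy" else self._bids
        fills = []
        while shares > 0 and book:
            key, _, remaining = book[0]  # best price, earliest arrival
            price = key if side == "buy" else -key
            traded = min(shares, remaining[0])
            fills.append((price, traded))
            remaining[0] -= traded
            shares -= traded
            if remaining[0] == 0:
                heapq.heappop(book)  # resting order fully consumed
        return fills
```

If two sell limit orders rest at 10.01 and one at 10.02, a buy market order first exhausts the earlier order at 10.01, then the later one at the same price, and only then moves up to 10.02.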

On Nasdaq, multiple market makers facilitated stock trades. Each dealer provided their best bid and ask price to a central quotation system and stood ready to transact the specified number of shares at the specified prices. Traders would route their orders to the market maker with the best quote via their broker. The competition for orders made execution at fair prices very likely. Market makers ensured a fair and orderly market, provided liquidity, and disseminated prices like specialists but only had access to the orders routed to them as opposed to market-wide supply and demand. This fragmentation could create difficulties in identifying fair value market prices.

Today, trading has fragmented; instead of two principal venues in the US, there are more than thirteen displayed trading venues, including exchanges and (less heavily regulated) alternative trading systems (ATSs) such as electronic communication networks (ECNs). Each reports trades to the consolidated tape, but at different latencies. To make matters more difficult, the rules of engagement differ from venue to venue, with several distinct pricing and queuing models.

The following table lists some of the larger global exchanges and the trading volumes for the 12 months ending March 2018 in various asset classes, including derivatives. Typically, a minority of financial instruments account for most trading:

| Exchange | Stocks: market cap (USD mn) | Stocks: # listed companies | Stocks: volume / day (USD mn) | Stocks: # shares / day ('000) | # Options / day ('000) |
|---|---|---|---|---|---|
| NYSE | 23,138,626 | 2,294 | 78,410 | 6,122 | 1,546 |
| Nasdaq - US | 10,375,718 | 2,968 | 65,026 | 7,131 | 2,609 |
| Japan Exchange Group Inc. | 6,287,739 | 3,618 | 28,397 | 3,361 | 1 |
| Shanghai Stock Exchange | 5,022,691 | 1,421 | 34,736 | 9,801 | |
| Euronext | 4,649,073 | 1,240 | 9,410 | 836 | 304 |
| Hong Kong Exchanges and Clearing | 4,443,082 | 2,186 | 12,031 | 1,174 | 516 |
| LSE Group | 3,986,413 | 2,622 | 10,398 | 1,011 | |
| Shenzhen Stock Exchange | 3,547,312 | 2,110 | 40,244 | 14,443 | |
| Deutsche Boerse AG | 2,339,092 | 506 | 7,825 | 475 | |
| BSE India Limited | 2,298,179 | 5,439 | 602 | 1,105 | |
| National Stock Exchange of India Limited | 2,273,286 | 1,952 | 5,092 | 10,355 | |
| BATS Global Markets - US | | | | | 1,243 |
| Chicago Board Options Exchange | | | | | 1,811 |
| International Securities Exchange | | | | | 1,204 |

The ATSs mentioned previously include dozens of dark pools that allow traders to execute anonymously. They were estimated to account for 40 percent of all U.S. stock trades in 2017, compared with an estimated 16 percent in 2010. Dark pools emerged in the 1980s when the SEC allowed brokers to match buyers and sellers of big blocks of shares. Two developments drove the growth of dark pools, as traders aimed to avoid the visibility of large trades (Mamudi 2017): the rise of high-frequency electronic trading and the 2007 SEC Order Protection Rule, which was intended to spur competition and cut transaction costs through transparency as part of Regulation National Market System (Reg NMS). Reg NMS also established the National Best Bid and Offer (NBBO) mandate for brokers to route orders to venues that offer the best price.
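Conceptually, the NBBO is the highest bid and the lowest ask across all displayed venues. The sketch below computes it from a snapshot of top-of-book quotes; it is an illustration only, ignoring quote sizes, latencies, and fees, and the `Quote`, `nbbo`, and `route` names as well as the venue labels in the usage note are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class Quote(NamedTuple):
    bid: float
    ask: float


def nbbo(quotes: Dict[str, Quote]) -> Tuple[Tuple[str, float], Tuple[str, float]]:
    """Return ((venue, best bid), (venue, best ask)) across all venues."""
    if not quotes:
        raise ValueError("need at least one venue quote")
    bid_venue = max(quotes, key=lambda venue: quotes[venue].bid)  # highest bid
    ask_venue = min(quotes, key=lambda venue: quotes[venue].ask)  # lowest ask
    return (bid_venue, quotes[bid_venue].bid), (ask_venue, quotes[ask_venue].ask)


def route(side: str, quotes: Dict[str, Quote]) -> str:
    """Venue offering the best price for an immediately executable order."""
    best_bid, best_ask = nbbo(quotes)
    # Buyers want the lowest ask; sellers want the highest bid.
    return best_ask[0] if side == "buy" else best_bid[0]
```

Given quotes of 10.00/10.03 on `venue_a`, 10.01/10.04 on `venue_b`, and 9.99/10.02 on `venue_c`, the best bid is 10.01 on `venue_b` and the best ask is 10.02 on `venue_c`, so a buy order would be routed to `venue_c` and a sell order to `venue_b`.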

Some ATSs are called dark pools because they do not broadcast pre-trade data, including the presence, price, and amount of buy and sell orders, as traditional exchanges are required to do. However, dark pools report information about trades to the Financial Industry Regulatory Authority (FINRA) after they occur. As a result, dark pools do not contribute to the process of price discovery until after trade execution, but they provide protection against various HFT strategies outlined in the first chapter.

In the next section, we will see how market data captures trading activity and reflects the institutional infrastructure in U.S. markets.