Learn Algorithmic Trading

By: Sebastien Donadio, Sourav Ghosh

Overview of this book

It’s now harder than ever to get a significant edge over competitors in terms of speed and efficiency when it comes to algorithmic trading. Relying on sophisticated trading signals, predictive models and strategies can make all the difference. This book will guide you through these aspects, giving you insights into how modern electronic trading markets and participants operate. You’ll start with an introduction to algorithmic trading, along with setting up the environment required to perform the tasks in the book. You’ll explore the key components of an algorithmic trading business and aspects you’ll need to take into account before starting an automated trading project. Next, you’ll focus on designing, building and operating the components required for developing a practical and profitable algorithmic trading business. Later, you’ll learn how quantitative trading signals and strategies are developed, and also implement and analyze sophisticated trading strategies such as volatility strategies, economic release strategies, and statistical arbitrage. Finally, you’ll create a trading bot from scratch using the algorithms built in the previous sections. By the end of this book, you’ll be well-versed with electronic trading markets and have learned to implement, evaluate and safely operate algorithmic trading strategies in live markets.

Why Python?

Python is one of the most widely used programming languages in the world (by some estimates, about one-third of new software development uses it).

Python is simple to learn. It is an interpreted, high-level, dynamically typed programming language. Unlike C or C++, where you need to manage memory yourself and pay attention to the hardware features of the machine you are coding for, Python takes care of internal details such as memory management. As a result, you can focus on coding trading algorithms.

Python is versatile; it can be used for application development in almost any domain. Since Python has been widely used for years, its community is large enough to provide many libraries that are critical for a trading strategy, covering data analytics, machine learning, data extraction, runtime, and communication; the list of open source libraries is gigantic. On the software engineering side, Python supports paradigms found in other languages, such as object-oriented and functional programming, along with dynamic typing. Online resources for Python are plentiful, and many books cover the domains where Python can be used.

Python is not the only language used in trading. We will preferably use Python (or possibly R) to do data analysis and to create trading models, and C, C++, or Java for production trading code. These languages compile source code into executables or bytecode, so the resulting software can be much faster than Python or R (by as much as one hundred times for some workloads). We will therefore use these three languages to create libraries, and then wrap those libraries so that they can be used from Python (or R).

When choosing Python, we also need to choose the version of the language. While Python 2 is still the most commonly used version, Python 3 should take over in a few years. The Python community now develops libraries for Python 3, and tech firms have started migrating to it. From January 1, 2020, Python 2.x will no longer be maintained. Therefore, if you are a new programmer, it is recommended to learn Python 3 rather than Python 2.

Both Python and R are among the most popular languages for assisting quantitative researchers (or quantitative developers) in creating trading algorithms. Both provide many support libraries for data analysis and machine learning. Choosing between these two languages will depend on which side of the community you are on. Python is usually seen as a general-purpose language with a simple, readable syntax, while R was developed with statisticians as its end users, with an emphasis on data visualization. Python can give you a similar visualization experience, but R was designed for this purpose.

R is not significantly more recent than Python. It was released in 1995 by its two creators, Ross Ihaka and Robert Gentleman, while Python was released in 1991 by Guido van Rossum. Today, R is mainly used by the academic and research world.

Unlike many other languages, Python and R allow us to write a statistical model in a few lines of code. Since each has its own advantages, it is hard to choose one over the other, and they can easily be used in a complementary manner. Developers have created many libraries that make it easy to use one language in conjunction with the other.

Choice of IDE – PyCharm or Notebook

While RStudio became the standard IDE (Integrated Development Environment) for R, choosing between JetBrains PyCharm and Jupyter Notebook is much more challenging. To begin with, we need to talk about the features of these two tools. PyCharm was developed by the Czech company JetBrains, and is an IDE providing code analysis, a graphical debugger, and an integrated unit test runner. Jupyter Notebook comes from Project Jupyter, a non-profit project that created a web-based interactive computational environment, originally for three languages: Julia, Python, and R. It gives you a web-based interface where you run Python code interactively, a few lines at a time.

The major difference between these two IDEs is that PyCharm became a reference IDE among programmers, since the version control system and the debugger are an important part of this product. Additionally, PyCharm can easily handle a large code base and has a ton of plugins.

Jupyter Notebook is a friendly choice when data analysis is the only motivation, while PyCharm doesn't have the same user-friendly interface to run code line by line for data analytics. The features that PyCharm provides are the most frequently used in the Python programming world.

Our first algorithmic trading (buy when the price is low, and sell when the price is high)

You may now be impatient to make money, and you may be wondering when you can start doing so.

We have talked about what we will address in this book. In this section, we will start building our first trading strategy, called buy low, sell high.

Building a trading strategy takes time and goes through numerous steps:

  1. You need an original idea. This part will use a well-known money-making strategy: we buy an asset with a price lower than the one we will use to sell it. For the purpose of illustrating this idea, we will be using Google stock.
  2. Once we have the idea, we need data to validate it. In Python, there are many packages that we can use to get trading data.
  3. You will then need to use a large amount of historical data to backtest your trading strategy assuming this rule: what worked in the past will work in the future.

Setting up your workspace

PyCharm 101

Once PyCharm is loaded, you will need to create a project and choose an interpreter. As we previously discussed, you will need to choose a version of Python 3. At the time of writing this book, the most up-to-date version is Python 3.7.0, but feel free to start with a more recent version than this one. Once the project is open, you need to create a Python file that you will call buylowsellhigh.py. This file will contain the code of your first Python implementation.

Getting the data

Many libraries can help download financial data; our choice is the pandas-datareader package, a companion to the pandas library, which is well known for data manipulation and analysis. We will use its DataReader function, which can connect to a financial data source such as Yahoo, Google, and many others, and download the data that you will need for the examples in this book. DataReader takes four arguments in this example:

  1. The first is the symbol you would like to analyze (our example uses GOOG for Google).
  2. The second specifies the source from which to retrieve the data.
  3. The third specifies the start date of the historical data.
  4. The fourth and final argument specifies the end date of the historical data series:
# load the data module from the pandas_datareader package
from pandas_datareader import data
# First day
start_date = '2014-01-01'
# Last day
end_date = '2018-01-01'
# Call the DataReader function from the data module
goog_data = data.DataReader('GOOG', 'yahoo', start_date, end_date)

The goog_data variable is the data frame containing the Google data from January 1, 2014 to January 1, 2018. If you print the goog_data variable, you will see the following:

print(goog_data)
                  High         Low  ...      Volume   Adj Close
Date                                ...
2010-01-04  312.721039  310.103088  ...   3937800.0  311.349976
2010-01-05  311.891449  308.761810  ...   6048500.0  309.978882
2010-01-06  310.907837  301.220856  ...   8009000.0  302.164703
2010-01-07  303.029083  294.410156  ...  12912000.0  295.130463

If you would like to see all the columns, you should change the option of the pandas library by allowing more than four displayed columns:

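import pandas as pd
# show every column instead of eliding the middle ones with "..."
pd.set_option('display.max_columns', None)
# widen the output so that all the columns fit on one line
pd.set_option('display.width', 1000)
print(goog_data)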
                  High         Low        Open       Close      Volume   Adj Close
Date
2010-01-04  312.721039  310.103088  311.449310  311.349976   3937800.0  311.349976
2010-01-05  311.891449  308.761810  311.563568  309.978882   6048500.0  309.978882
2010-01-06  310.907837  301.220856  310.907837  302.164703   8009000.0  302.164703
2010-01-07  303.029083  294.410156  302.731018  295.130463  12912000.0  295.130463
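If you would rather not change the display options for the whole session, one possible alternative is pandas' option_context, which applies the settings only inside a with block. The sketch below assumes a one-row frame built from the first row printed above; the variable names are illustrative.

```python
import pandas as pd

# One-row frame with the same six columns as goog_data (values from the first row above).
sample = pd.DataFrame(
    {
        "High": [312.721039],
        "Low": [310.103088],
        "Open": [311.449310],
        "Close": [311.349976],
        "Volume": [3937800.0],
        "Adj Close": [311.349976],
    },
    index=pd.Index(["2010-01-04"], name="Date"),
)

# The wider settings apply only inside the with block and are restored afterwards.
with pd.option_context("display.max_columns", None, "display.width", 1000):
    wide_text = repr(sample)

print(wide_text)
```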

As per the previous output, there are six columns:

  • High: The highest price of the stock on that trading day.
  • Low: The lowest price of the stock on that trading day.
  • Close: The price of the stock at closing time.
  • Open: The price of the stock at the beginning of the trading day (this can differ from the closing price of the previous trading day).
  • Volume: How many shares were traded.
  • Adj Close: The closing price of the stock that adjusts the price of the stock for corporate actions. This price takes into account the stock splits and dividends.

The adjusted close is the price we will use for this example. Indeed, since it takes into account splits and dividends, we will not need to adjust the price manually.
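To see why an adjusted price is convenient, consider a stock split. The following sketch uses made-up prices and a hypothetical helper, back_adjust_for_split, to show the kind of manual correction the Adj Close column saves us from; it is a simplified illustration that ignores dividends.

```python
def back_adjust_for_split(closes, split_index, ratio):
    """Divide every close before split_index by ratio so the series is comparable."""
    return [c / ratio if i < split_index else c for i, c in enumerate(closes)]


# Hypothetical raw closes: a 2-for-1 split takes effect on the third day,
# so the raw price appears to halve even though nothing was lost.
raw_closes = [100.0, 102.0, 51.5, 52.0]

adjusted = back_adjust_for_split(raw_closes, split_index=2, ratio=2.0)
print(adjusted)  # [50.0, 51.0, 51.5, 52.0]
```

Without the adjustment, a day-over-day difference would report a large loss on the split day that no investor actually experienced.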

Preparing the data – signal

The main part of a trading strategy (or a trading algorithm) is to decide when to trade (either to buy or sell a security or other asset). The event triggering the sending of an order is called a signal. A signal can use a large variety of inputs. These inputs may be market information, news, or a social networking website. Any combination of data can be a signal.

As described in the section entitled Our first algorithmic trading (buy when the price is low, and sell when the price is high), for the buy low sell high example, we will calculate the difference in the adjusted close between two consecutive days. If this difference is negative, the price on the previous day was higher than the price on the following day, so we can buy since the price is lower now. If the difference is positive, we can sell because the price is higher.

In Python, we build a pandas data frame with the same index as the data frame containing the data. This data frame will be called goog_data_signal:

goog_data_signal = pd.DataFrame(index=goog_data.index)

Following the creation of this data frame, we will copy the data we will use to build our signal to trade. In this case, we will copy the values of the Adj Close column from the goog_data data frame:

goog_data_signal['price'] = goog_data['Adj Close']

Based on our trading strategy, we need to have a column, daily_difference, to store the difference between two consecutive days. In order to create this column, we will use the diff function from the data frame object:

goog_data_signal['daily_difference'] = goog_data_signal['price'].diff()

As a sanity check, we can use the print function to display what goog_data_signal contains:

print(goog_data_signal.head())
                 price  daily_difference
Date
2014-01-02  552.963501               NaN
2014-01-03  548.929749         -4.033752
2014-01-06  555.049927          6.120178
2014-01-07  565.750366         10.700439
2014-01-08  566.927673          1.177307

We can observe that the daily_difference column contains NaN (not a number) for January 2, since it is the first row in this data frame and has no previous day to compare with.
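The behavior of diff can be checked on a small series. This sketch rebuilds the first three prices from the output above and shows that diff is equivalent to subtracting the previous row; the variable names are illustrative.

```python
import math

import pandas as pd

# First three adjusted closes from the output above.
price = pd.Series(
    [552.963501, 548.929749, 555.049927],
    index=pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-06"]),
)

via_diff = price.diff()
# Same computation written out: each price minus the price on the previous row.
via_shift = price - price.shift(1)

print(via_diff)
```

The first entry is NaN in both cases because there is no earlier row to subtract.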

We will create the signal based on the values of the column, daily_difference. If the value is positive, we will give the value 1, otherwise, the value will remain 0:

import numpy as np
goog_data_signal['signal'] = 0.0
goog_data_signal['signal'] = np.where(goog_data_signal['daily_difference'] > 0, 1.0, 0.0)
print(goog_data_signal.head())
                 price  daily_difference  signal
Date
2014-01-02  552.963501               NaN     0.0
2014-01-03  548.929749         -4.033752     0.0
2014-01-06  555.049927          6.120178     1.0
2014-01-07  565.750366         10.700439     1.0
2014-01-08  566.927673          1.177307     1.0

Reading the column signal, we have 0 when we need to buy, and we have 1 when we need to sell.
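Note that the first row gets a signal of 0.0 even though its daily difference is undefined. The short sketch below, using the differences from the output above, shows why: a comparison with NaN is false, so np.where takes its second branch.

```python
import numpy as np

# Daily differences from the output above; the first one is NaN.
daily_difference = np.array([np.nan, -4.033752, 6.120178, 10.700439, 1.177307])

# NaN > 0 evaluates to False, so the first row falls into the 0.0 branch.
signal = np.where(daily_difference > 0, 1.0, 0.0)
print(signal)  # [0. 0. 1. 1. 1.]
```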

Since we don't want to constantly buy if the market keeps moving down, or constantly sell when the market is moving up, we will limit the number of orders by restricting ourselves to the number of positions on the market. The position is your inventory of stocks or assets that you have on the market. For instance, if you buy one Google share, this means you have a position of one share on the market. If you sell this share, you will not have any positions on the market.

To simplify our example and limit the position on the market, it will be impossible to buy or sell more than one time consecutively. Therefore, we will apply diff() to the column signal:

goog_data_signal['positions'] = goog_data_signal['signal'].diff()
                 price  daily_difference  signal  positions
Date
2014-01-02  552.963501               NaN     0.0        NaN
2014-01-03  548.929749         -4.033752     0.0        0.0
2014-01-06  555.049927          6.120178     1.0        1.0
2014-01-07  565.750366         10.700439     1.0        0.0
2014-01-08  566.927673          1.177307     1.0        0.0
2014-01-09  561.468201         -5.459473     0.0       -1.0
2014-01-10  561.438354         -0.029846     0.0        0.0
2014-01-13  557.861633         -3.576721     0.0        0.0

In the positions column, 1.0 marks a buy and -1.0 marks a sell. We will buy a share of Google on January 6 for a price of 555.049927, and then sell this share on January 9 for a price of 561.468201. The profit of this trade is 561.468201 - 555.049927 = 6.418274.
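The same arithmetic can be done programmatically. The following sketch rebuilds the eight rows shown above and pairs each buy with the next sell; it assumes buys and sells strictly alternate, starting with a buy, and the variable names are illustrative.

```python
import pandas as pd

# Prices and positions copied from the output above.
frame = pd.DataFrame(
    {
        "price": [552.963501, 548.929749, 555.049927, 565.750366,
                  566.927673, 561.468201, 561.438354, 557.861633],
        "positions": [float("nan"), 0.0, 1.0, 0.0, 0.0, -1.0, 0.0, 0.0],
    },
    index=pd.to_datetime([
        "2014-01-02", "2014-01-03", "2014-01-06", "2014-01-07",
        "2014-01-08", "2014-01-09", "2014-01-10", "2014-01-13",
    ]),
)

buys = frame.loc[frame["positions"] == 1.0, "price"]
sells = frame.loc[frame["positions"] == -1.0, "price"]

# Assumption: buys and sells alternate, so the n-th sell closes the n-th buy.
trade_profits = [sell - buy for buy, sell in zip(buys, sells)]
print(trade_profits)
```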

Signal visualization

While creating signals is just the beginning of the process of building a trading strategy, we need to visualize how the strategy performs in the long term. We will plot the graph of the historical data we used by using the matplotlib library. This library is well known in the Python world for making it easy to plot charts:

  1. We will start by importing this library:
import matplotlib.pyplot as plt
  2. Next, we will define a figure that will contain our chart:
fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Google price in $')
  3. Now, we will plot the price within the range of days we initially chose:
goog_data_signal['price'].plot(ax=ax1, color='r', lw=2.)
  4. Next, we will draw an up arrow when we buy one Google share:
ax1.plot(goog_data_signal.loc[goog_data_signal.positions == 1.0].index,
         goog_data_signal.price[goog_data_signal.positions == 1.0],
         '^', markersize=5, color='m')
  5. Next, we will draw a down arrow when we sell one Google share:
ax1.plot(goog_data_signal.loc[goog_data_signal.positions == -1.0].index,
         goog_data_signal.price[goog_data_signal.positions == -1.0],
         'v', markersize=5, color='k')
plt.show()

This code plots the Google price in red, with magenta up arrows marking the buys and black down arrows marking the sells.

Up to this point, we introduced the trading idea, we implemented the signal triggering buy and sell orders, and we talked about the way of restricting the strategy by limiting the position to one share on the market. Once these steps are satisfactory, the following step is backtesting.

Backtesting

Backtesting is a key phase to get statistics showing how effective the trading strategy is. As we previously learned, the backtesting relies on the assumption that the past predicts the future. This phase will provide the statistics that you or your company consider important, such as the following:

  • Profit and loss (P and L): The money made by the strategy without transaction fees.
  • Net profit and loss (net P and L): The money made by the strategy with transaction fees.
  • Exposure: The capital invested.
  • Number of trades: The number of trades placed during a trading session.
  • Annualized return: This is the return for a year of trading.
  • Sharpe ratio: The risk-adjusted return. This metric is important because it compares the return of the strategy with a risk-free strategy.
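To make the last two statistics concrete, here is a minimal sketch of one common way to compute them from daily returns. It assumes 252 trading days per year and a risk-free rate of zero by default, and the returns are made up; the functions annualized_return and sharpe_ratio are illustrative, not the book's implementation.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 252  # assumption: a common convention for equities


def annualized_return(daily_returns):
    """Compound the daily returns, then scale the result to one trading year."""
    growth = math.prod(1.0 + r for r in daily_returns)
    return growth ** (TRADING_DAYS_PER_YEAR / len(daily_returns)) - 1.0


def sharpe_ratio(daily_returns, daily_risk_free_rate=0.0):
    """Mean excess return divided by its standard deviation, annualized."""
    excess = [r - daily_risk_free_rate for r in daily_returns]
    daily_sharpe = statistics.mean(excess) / statistics.stdev(excess)
    return daily_sharpe * math.sqrt(TRADING_DAYS_PER_YEAR)


# Hypothetical daily returns of a strategy (1% gain, 0.5% loss, and so on).
hypothetical_returns = [0.010, -0.005, 0.002, 0.007, -0.003]
print(annualized_return(hypothetical_returns))
print(sharpe_ratio(hypothetical_returns))
```

Five observations are far too few for a meaningful estimate; the sample only shows the shape of the calculation.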

While this part will be described in detail later, for this section, we will be interested in testing our strategy with an initial capital over a given period of time.

For the purpose of backtesting, we will have a portfolio (grouping of financial assets such as bonds and stocks) composed of only one type of stock: Google (GOOG). We will start this portfolio with $1,000:

initial_capital = float(1000.0)

Now, we will create a data frame for the positions and the portfolio:

positions = pd.DataFrame(index=goog_data_signal.index).fillna(0.0)
portfolio = pd.DataFrame(index=goog_data_signal.index).fillna(0.0)

Next, we will store the GOOG positions in the following data frame:

positions['GOOG'] = goog_data_signal['signal']

Then, we will store the amount of the GOOG positions for the portfolio in this one:

portfolio['positions'] = (positions.multiply(goog_data_signal['price'], axis=0))

Next, we will calculate the non-invested money (cash):

portfolio['cash'] = initial_capital - (positions.diff().multiply(goog_data_signal['price'], axis=0)).cumsum()

The total investment will be calculated by summing the positions and the cash:

portfolio['total'] = portfolio['positions'] + portfolio['cash']

When we draw the following plot, we can easily establish that our strategy is profitable:

When we create a trading strategy, we have an initial amount of money (cash). We will invest this money (holdings). This holding value is based on the market value of the investment. If we own a stock and the price of this stock increases, the value of the holding will increase. When we decide to sell, we move the value of the holding corresponding to this sale to the cash amount. The sum total of the assets is the sum of the cash and the holdings. The preceding chart shows that the strategy is profitable since the amount of cash increases toward the end. The graph allows you to check whether your trading idea can generate money.
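The relationship between cash, holdings, and total can be traced on the eight sample rows shown earlier. The sketch below follows the same formulas as the code above but works on plain series; as an assumption, it treats the undefined first difference as "no trade" so that cash starts at the initial capital.

```python
import pandas as pd

dates = pd.to_datetime([
    "2014-01-02", "2014-01-03", "2014-01-06", "2014-01-07",
    "2014-01-08", "2014-01-09", "2014-01-10", "2014-01-13",
])
# Prices and signal values copied from the sample output shown earlier.
price = pd.Series(
    [552.963501, 548.929749, 555.049927, 565.750366,
     566.927673, 561.468201, 561.438354, 557.861633],
    index=dates,
)
signal = pd.Series([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0], index=dates)

initial_capital = 1000.0

# Market value of the shares held on each day (one share while signal is 1).
holdings = signal * price
# Assumption: fill the first (undefined) difference with 0, meaning no trade.
trades = signal.diff().fillna(0.0)
# Buying reduces cash by the purchase price; selling adds the sale price back.
cash = initial_capital - (trades * price).cumsum()
total = holdings + cash

print(pd.DataFrame({"holdings": holdings, "cash": cash, "total": total}).round(6))
```

On these rows, the total ends above the initial capital by exactly the profit of the single completed trade.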