Book Image

Python for Finance - Second Edition

By : Yuxing Yan
5 (1)
Book Image

Python for Finance - Second Edition

5 (1)
By: Yuxing Yan

Overview of this book

This book uses Python as its computational tool. Since Python is free, any school or organization can download and use it. This book is organized according to various finance subjects. In other words, the first edition focuses more on Python, while the second edition is truly trying to apply Python to finance. The book starts by explaining topics exclusively related to Python. Then we deal with critical parts of Python, explaining concepts such as time value of money stock and bond evaluations, capital asset pricing model, multi-factor models, time series analysis, portfolio theory, options and futures. This book will help us to learn or review the basics of quantitative finance and apply Python to solve various problems, such as estimating IBM’s market risk, running a Fama-French 3-factor, 5-factor, or Fama-French-Carhart 4 factor model, estimating the VaR of a 5-stock portfolio, estimating the optimal portfolio, and constructing the efficient frontier for a 20-stock portfolio with real-world stock, and with Monte Carlo Simulation. Later, we will also learn how to replicate the famous Black-Scholes-Merton option model and how to price exotic options such as the average price call option.
Table of Contents (23 chapters)
Python for Finance Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
Index

Data manipulation


There are many different types of data, such as integer, real number, or string. The following table offers a list of those data types:

Data types

Description

Bool

Boolean (TRUE or FALSE) stored as a byte

Int

Platform integer (normally either int32 or int64)

int8

Byte (-128 to 127)

int16

Integer (-32768 to 32767)

int32

Integer (-2147483648 to 2147483647)

int64

Integer (9223372036854775808 to 9223372036854775807)

unit8

Unsigned integer (0 to 255)

unit16

Unsigned integer (0 to 65535)

unit32

Unsigned integer (0 to 4294967295)

unit64

Unsigned integer (0 to 18446744073709551615)

float

Short and for float6

float32

Single precision float: sign bit23 bits mantissa; 8 bits exponent

float64

52 bits mantissa

complex

Shorthand for complex128

complex64

Complex number; represented by two 32-bit floats (real and imaginary components)

complex128

Complex number; represented by two 64-bit floats (real and imaginary components)

Table 1.1 List of different data types

In the following examples, we assign a value to r, which is a scalar, and several values to pv, which is an array (vector).The type() function is used to show their types:

>>> import numpy as np
>>> r=0.023
>>>pv=np.array([100,300,500])
>>>type(r)
<class'float'>
>>>type(pv)
<class'numpy.ndarray'>

To choose the appropriate decision, we use the round()function; see the following example:

>>> 7/3
2.3333333333333335
>>>round(7/3,5)
2.33333
>>>

For data manipulation, let's look at some simple operations:

>>>import numpy as np
>>>a=np.zeros(10)                      # array with 10 zeros 
>>>b=np.zeros((3,2),dtype=float)       # 3 by 2 with zeros 
>>>c=np.ones((4,3),float)              # 4 by 3 with all ones 
>>>d=np.array(range(10),float)         # 0,1, 2,3 .. up to 9 
>>>e1=np.identity(4)                   # identity 4 by 4 matrix 
>>>e2=np.eye(4)                        # same as above 
>>>e3=np.eye(4,k=1)                    # 1 start from k 
>>>f=np.arange(1,20,3,float)           # from 1 to 19 interval 3 
>>>g=np.array([[2,2,2],[3,3,3]])       # 2 by 3 
>>>h=np.zeros_like(g)                  # all zeros 
>>>i=np.ones_like(g)                   # all ones

Some so-called dot functions are quite handy and useful:

>>> import numpy as np
>>> x=np.array([10,20,30])
>>>x.sum()
60

Anything after the number sign of # will be a comment. Arrays are another important data type:

>>>import numpy as np
>>>x=np.array([[1,2],[5,6],[7,9]])      # a 3 by 2 array
>>>y=x.flatten()
>>>x2=np.reshape(y,[2,3]              ) # a 2 by 3 array

We could assign a string to a variable:

>>> t="This is great"
>>>t.upper()
'THIS IS GREAT'
>>>

To find out all string-related functions, we use dir(''); see the following code:

>>>dir('')
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>>

For example, from the preceding list we see a function called split. After typinghelp(''.split), we will have related help information:

>>>help(''.split)
Help on built-in function split:

split(...) method of builtins.str instance
S.split(sep=None, maxsplit=-1) -> list of strings

    Return a list of the words in S, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator and empty strings are
removed from the result.
>>>

We could try the following example:

>>> x="this is great"
>>>x.split()
['this', 'is', 'great']
>>>

Matrix manipulation is important when we deal with various matrices:

The condition for equation (3) is that matrices A and B should have the same dimensions. For the product of two matrices, we have the following equation:

Here,A is an n by k matrix (n rows and k columns), while B is a k by m matrix. Remember that the second dimension of the first matrix should be the same as the first dimension of the second matrix. In this case, it is k. If we assume that the individual data items in C, A, and B are Ci,j (the ith row and the jth column), Ai,j, and Bi,j, we have the following relationship between them:

The dot() function from the NumPy module could be used to carry the preceding matrix multiplication:

>>>a=np.array([[1,2,3],[4,5,6]],float)    # 2 by 3
>>>b=np.array([[1,2],[3,3],[4,5]],float)  # 3 by 2
>>>np.dot(a,b)                            # 2 by 2
>>>print(np.dot(a,b))
array([[ 19.,  23.],
[ 43.,  53.]])
>>>

We could manually calculate c(1,1): 1*1 + 2*3 + 3*4=19.

After retrieving data or downloading data from the internet, we need to process it. Such a skill to process various types of raw data is vital to finance students and to professionals working in the finance industry. Here we will see how to download price data and then estimate returns.

Assume that we have n values of x1, x2, … and xn. There exist two types of means: arithmetic mean and geometric mean; see their genetic definitions here:

Assume that there exist three values of 2,3, and 4. Their arithmetic and geometric means are calculated here:

>>>(2+3+4)/3.
>>>3.0
>>>geo_mean=(2*3*4)**(1./3)
>>>round(geo_mean,4) 
2.8845

For returns, the arithmetic mean's definition remains the same, while the geometric mean of returns is defined differently; see the following equations:

In Chapter 3, Time Value of Money, we will discuss both means again.

We could say that NumPy is a basic module while SciPy is a more advanced one. NumPy tries to retain all features supported by either of its predecessors, while most new features belong in SciPy rather than NumPy. On the other hand, NumPy and SciPy have many overlapping features in terms of functions for finance. For those two types of definitions, see the following example:

>>> import scipy as sp
>>> ret=sp.array([0.1,0.05,-0.02])
>>>sp.mean(ret)
0.043333333333333342
>>>pow(sp.prod(ret+1),1./len(ret))-1 
0.042163887067679262

Our second example is related to processing theFama-French 3 factor time series. Since this example is more complex than the previous one, if a user feels it is difficult to understand, he/she could simply skip this example. First, a ZIP file called F-F_Research_Data_Factor_TXT.zip could be downloaded from Prof. French's Data Library. After unzipping and removing the first few lines and annual datasets, we will have a monthly Fama-French factor time series. The first few lines and last few lines are shown here:

DATE    MKT_RFSMBHMLRF
192607    2.96   -2.30   -2.87    0.22
192608    2.64   -1.40    4.19    0.25
192609    0.36   -1.32    0.01    0.23

201607    3.95    2.90   -0.98    0.02
201608    0.49    0.94    3.18    0.02
201609    0.25    2.00   -1.34    0.02

Assume that the final file is called ffMonthly.txt under c:/temp/. The following program is used to retrieve and process the data:

import numpy as np
import pandas as pd
file=open("c:/temp/ffMonthly.txt","r")
data=file.readlines()
f=[]
index=[]
for i in range(1,np.size(data)):
    t=data[i].split()
    index.append(int(t[0]))
    for j in range(1,5):
        k=float(t[j])
        f.append(k/100)
n=len(f) 
f1=np.reshape(f,[n/4,4])
ff=pd.DataFrame(f1,index=index,columns=['Mkt_Rf','SMB','HML','Rf'])

To view the first and last few observations for the dataset called ff, the functions of .head() and .tail()can be used: