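Dimson (1979) suggests the following method for estimating beta when a stock trades infrequently. Regress the stock's return on k lagged, the contemporaneous, and k leading market returns, then sum the slopes:

$$R_t = \alpha + \sum_{i=-k}^{k} \beta_i R_{m,t+i} + \varepsilon_t, \qquad \beta = \sum_{i=-k}^{k} \beta_i$$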
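The most frequently used value of k (the number of lags and leads) is 1, which gives the following equation:

$$R_t = \alpha + \beta_{-1} R_{m,t-1} + \beta_0 R_{m,t} + \beta_1 R_{m,t+1} + \varepsilon_t$$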
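The estimator can be sketched in code. What follows is a minimal sketch under a few assumptions: it uses plain (not excess) returns and fits ordinary least squares with `numpy.linalg.lstsq`. The function name `dimson_beta` and the column names are hypothetical, and the data are simulated rather than taken from the text. The function builds the lagged and leading market returns with `.shift()` and reports the sum of the slopes as the Dimson beta.

```python
import numpy as np
import pandas as pd


def dimson_beta(stock_ret, mkt_ret, k=1):
    """Hypothetical helper: regress stock returns on k lagged, the current
    and k leading market returns, and return (sum of slopes, slopes)."""
    df = pd.DataFrame({"y": stock_ret, "m0": mkt_ret})
    cols = ["m0"]
    for i in range(1, k + 1):
        df[f"mlag{i}"] = df["m0"].shift(i)    # R_m,t-i (lagged market return)
        df[f"mlead{i}"] = df["m0"].shift(-i)  # R_m,t+i (leading market return)
        cols += [f"mlag{i}", f"mlead{i}"]
    # Drop rows at either end that lack a lag, a lead, or a stock return
    df = df.dropna()
    X = np.column_stack([np.ones(len(df)), df[cols].to_numpy()])  # intercept + regressors
    coef, *_ = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)
    slopes = dict(zip(cols, coef[1:]))
    return coef[1:].sum(), slopes


# Simulated, hypothetical data: the stock reacts 60% to today's market return
# and 40% to yesterday's, mimicking delayed price adjustment.
rng = np.random.default_rng(0)
mkt = pd.Series(rng.normal(0.0, 0.01, 500))
stock = 0.6 * mkt + 0.4 * mkt.shift(1)

beta, slopes = dimson_beta(stock, mkt, k=1)
print(round(beta, 4))  # 1.0
```

Because the simulated stock responds to the market partly with a one-period delay, the contemporaneous slope alone (0.6) understates its sensitivity. Adding the lag and lead slopes recovers the total of 1.0.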
Before we run the regression based on the preceding equation, we explain two pandas methods, .diff() and .shift(). Here, we arbitrarily choose five prices. We then compute their price differences and returns, and add the lagged and leading (forward) returns:
```python
import pandas as pd

price = [10, 11, 12.2, 14.0, 12]
x = pd.DataFrame({'Price': price})
x['diff'] = x['Price'].diff()                       # p(i) - p(i-1)
x['Ret'] = x['Price'].diff() / x['Price'].shift(1)  # (p(i) - p(i-1)) / p(i-1)
x['RetLag'] = x['Ret'].shift(1)                     # previous period's return
x['RetLead'] = x['Ret'].shift(-1)                   # next period's return
print(x)
```
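The short sketch below isolates the direction of each shift on a plain Series; it is an illustration and not part of the original example. It also compares the hand-built return with pandas' built-in `pct_change()`, which computes p(i)/p(i-1) - 1. The two agree only up to floating-point rounding, so the comparison uses `np.allclose` rather than exact equality.

```python
import numpy as np
import pandas as pd

price = pd.Series([10, 11, 12.2, 14.0, 12])

# shift(1) moves every value one row down, so each row sees the previous price;
# shift(-1) moves every value one row up, so each row sees the next price.
prev_price = price.shift(1)
next_price = price.shift(-1)

# Simple return computed two ways: by hand and with pct_change()
ret_manual = price.diff() / price.shift(1)
ret_builtin = price.pct_change()

print(prev_price.tolist())  # [nan, 10.0, 11.0, 12.2, 14.0]
print(next_price.tolist())  # [11.0, 12.2, 14.0, 12.0, nan]
print(np.allclose(ret_manual, ret_builtin, equal_nan=True))  # True
```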
The output is shown here:
The price series is assumed to be ordered from the oldest to the newest observation. The difference is defined as p(i) - p(i-1), so the first difference is NaN, that is, a missing value. Let's look at period 4, that is, index=3. The difference is 1.8 (14 - 12.2), and the return is (14 - 12.2)/12.2 = 0.147541. The lagged return will be the return of the period before this one...
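The arithmetic for index=3 can be checked with plain Python list indexing, which mirrors what the diff, Ret, RetLag and RetLead columns hold for that row. This is a minimal sketch, and the variable names are illustrative only.

```python
price = [10, 11, 12.2, 14.0, 12]
i = 3  # period 4

diff = price[i] - price[i - 1]                          # p(i) - p(i-1) = 14 - 12.2
ret = diff / price[i - 1]                               # return in period 4
ret_lag = (price[i - 1] - price[i - 2]) / price[i - 2]  # return in period 3
ret_lead = (price[i + 1] - price[i]) / price[i]         # return in period 5

print(round(diff, 6), round(ret, 6), round(ret_lag, 6), round(ret_lead, 6))
# 1.8 0.147541 0.109091 -0.142857
```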