A Pandas DataFrame is a data structure that behaves like both a matrix and a dictionary, similar to the data frame in R. In fact, it is the central data structure in Pandas, and you can apply all kinds of operations to it. It is quite common, for instance, to look at the correlation matrix of a portfolio. So let's do that.
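To make the "matrix and dictionary-like" description concrete, here is a minimal sketch. The symbols `AAA` and `BBB` and all the numbers are made up for illustration and are not part of the recipe's data.

```python
import pandas as pd

# Hypothetical symbols and made-up return values, for illustration only
df = pd.DataFrame(
    {"AAA": [0.01, -0.02, 0.03], "BBB": [0.00, 0.01, -0.01]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
)

# Dictionary-like: look up a whole column by its label
aaa = df["AAA"]

# Matrix-like: shape, positional access, and element-wise arithmetic
n_rows, n_cols = df.shape
first_cell = df.iloc[0, 0]
scaled = df * 100
```

Column lookup by label is what makes the dictionary-based construction used later in this recipe a natural fit.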
First, we will create the DataFrame with Pandas for each symbol's daily log returns. Then we will join these on the date. At the end, the correlation will be printed, and a plot will be shown.
Creating the data frame.
To build the data frame, we first create a dictionary with stock symbols as keys and the corresponding log returns as values. The data frame itself has the dates as its index and the stock symbols as column labels:
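import numpy
import pandas

# Map each stock symbol to its array of daily log returns
data = {}
for i, symbol in enumerate(symbols):
    data[symbol] = numpy.diff(numpy.log(close[i]))

# numpy.diff drops one observation, so the last date is dropped from the index
df = pandas.DataFrame(data, index=dates[0][:-1], columns=symbols)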
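The snippet above relies on `symbols`, `close`, and `dates` having been loaded earlier in the recipe. As a self-contained sketch of the same construction, the version below swaps in hypothetical symbols, synthetic prices, and an assumed calendar-day date range, so the values are purely illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the recipe's symbols, dates, and close prices
symbols = ["AAA", "BBB"]
dates = [pd.date_range("2024-01-01", periods=5, freq="D")] * len(symbols)
close = [
    np.array([100.0, 101.0, 99.0, 102.0, 103.0]),
    np.array([50.0, 50.5, 51.0, 50.0, 52.0]),
]

# One column of daily log returns per symbol: log(p[t+1]) - log(p[t])
data = {sym: np.diff(np.log(prices)) for sym, prices in zip(symbols, close)}

# Five prices give four returns, so the index uses all dates but the last
df = pd.DataFrame(data, index=dates[0][:-1], columns=symbols)
```

With this indexing, each row is labeled with the earlier of the two dates that define its return. Labeling with the later date (`dates[0][1:]`) would be an equally valid convention, but it is not the one the recipe uses.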
Operating on the data frame...