Book Image

Learning Quantitative Finance with R

By : Dr. Param Jeet, PRASHANT VATS
Book Image

Learning Quantitative Finance with R

By: Dr. Param Jeet, PRASHANT VATS

Overview of this book

The role of a quantitative analyst is very challenging, yet lucrative, so there is a lot of competition for the role in top-tier organizations and investment banks. This book is your go-to resource if you want to equip yourself with the skills required to tackle any real-world problem in quantitative finance using the popular R programming language. You'll start by getting an understanding of the basics of R and its relevance in the field of quantitative finance. Once you've built this foundation, we'll dive into the practicalities of building financial models in R. This will help you have a fair understanding of the topics as well as their implementation, as the authors have presented some use cases along with examples that are easy to understand and correlate. We'll also look at risk management and optimization techniques for algorithmic trading. Finally, the book will explain some advanced concepts, such as trading using machine learning, optimizations, exotic options, and hedging. By the end of this book, you will have a firm grasp of the techniques required to implement basic quantitative finance models in R.
Table of Contents (16 chapters)
Learning Quantitative Finance with R
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface

Importing and exporting different data types


In R, we can read the files stored from outside the R environment. We can also write the data into files which can be stored and accessed by the operating system. In R, we can read and write different formats of files, such as CSV, Excel, TXT, and so on. In this section, we are going to discuss how to read and write different formats of files.

The required files should be present in the current directory to read them. Otherwise, the directory should be changed to the required destination.

The first step for reading/writing files is to know the working directory. You can find the path of the working directory by running the following code:

>print (getwd()) 

This will give the paths for the current working directory. If it is not your desired directory, then please set your own desired directory by using the following code:

>setwd("") 

For instance, the following code makes the folder C:/Users the working directory:

  >setwd("C:/Users") 

How to read and write a CSV format file

A CSV format file is a text file in which values are comma separated. Let us consider a CSV file with the following content from stock-market data:

Date

Open

High

Low

Close

Volume

Adj Close

14-10-2016

2139.68

2149.19

2132.98

2132.98

3.23E+09

2132.98

13-10-2016

2130.26

2138.19

2114.72

2132.55

3.58E+09

2132.55

12-10-2016

2137.67

2145.36

2132.77

2139.18

2.98E+09

2139.18

11-10-2016

2161.35

2161.56

2128.84

2136.73

3.44E+09

2136.73

10-10-2016

2160.39

2169.6

2160.39

2163.66

2.92E+09

2163.66

To read the preceding file in R, first save this file in the working directory, and then read it (the name of the file is Sample.csv) using the following code:

>data<-read.csv("Sample.csv") 
>print(data) 

When the preceding code gets executed, it will give the following output:

       Date    Open    High     Low   Close     Volume     Adj.Close 
1  14-10-2016 2139.68 2149.19 2132.98 2132.98 3228150000   2132.98 
2  13-10-2016 2130.26 2138.19 2114.72 2132.55 3580450000   2132.55 
3  12-10-2016 2137.67 2145.36 2132.77 2139.18 2977100000   2139.18 
4  11-10-2016 2161.35 2161.56 2128.84 2136.73 3438270000   2136.73 
5  10-10-2016 2160.39 2169.60 2160.39 2163.66 2916550000   2163.66 

Read.csv by default produces the file in DataFrame format; this can be checked by running the following code:

>print(is.data.frame(data)) 

Now, whatever analysis you want to do, you can perform it by applying various functions on the DataFrame in R, and once you have done the analysis, you can write your desired output file using the following code:

>write.csv(data,"result.csv") 
>output <- read.csv("result.csv") 
>print(output) 

When the preceding code gets executed, it writes the output file in the working directory folder in CSV format.

XLSX

Excel is the most common format of file for storing data, and it ends with extension .xls or .xlsx.

The xlsx package will be used to read or write .xlsx files in the R environment.

Installing the xlsx package has dependency on Java, so Java needs to be installed on the system. The xlsx package can be installed using the following command:

>install.packages("xlsx")

When the previous command gets executed, it will ask for the nearest CRAN mirror, which the user has to select to install the package. We can verify that the package has been installed or not by executing the following command:

>any(grepl("xlsx",installed.packages()))

If it has been installed successfully, it will show the following output:

[1] TRUE
Loading required package: rJava
Loading required package: methods
Loading required package: xlsxjars

We can load the xlsx library by running the following script:

>library("xlsx") 

Now let us save the previous sample file in .xlsx format and read it in the R environment, which can be done by executing the following code:

>data <- read.xlsx("Sample.xlsx", sheetIndex = 1) 
>print(data) 

This gives a DataFrame output with the following content:

       Date    Open    High     Low   Close     Volume    Adj.Close 
1 2016-10-14 2139.68 2149.19 2132.98 2132.98 3228150000   2132.98 
2 2016-10-13 2130.26 2138.19 2114.72 2132.55 3580450000   2132.55 
3 2016-10-12 2137.67 2145.36 2132.77 2139.18 2977100000   2139.18 
4 2016-10-11 2161.35 2161.56 2128.84 2136.73 3438270000   2136.73 
5 2016-10-10 2160.39 2169.60 2160.39 2163.66 2916550000   2163.66 

Similarly, you can write R files in .xlsx format by executing the following code:

>output<-write.xlsx(data,"result.xlsx") 
>output<- read.csv("result.csv") 
>print(output) 

Web data or online sources of data

The Web is one main source of data these days, and we want to directly bring the data from web form to the R environment. R supports this:

URL <- "http://ichart.finance.yahoo.com/table.csv?s=^GSPC" 
snp <- as.data.frame(read.csv(URL)) 
head(snp) 

When the preceding code is executed, it directly brings the data for the S&P500 index into R in DataFrame format. A portion of the data has been displayed by using the head() function here:

        Date    Open    High     Low   Close     Volume   Adj.Close 
1 2016-10-14 2139.68 2149.19 2132.98 2132.98 3228150000   2132.98 
2 2016-10-13 2130.26 2138.19 2114.72 2132.55 3580450000   2132.55 
3 2016-10-12 2137.67 2145.36 2132.77 2139.18 2977100000   2139.18 
4 2016-10-11 2161.35 2161.56 2128.84 2136.73 3438270000   2136.73 
5 2016-10-10 2160.39 2169.60 2160.39 2163.66 2916550000   2163.66 
6 2016-10-07 2164.19 2165.86 2144.85 2153.74 3619890000   2153.74 

Similarly, if we execute the following code, it brings the DJI index data into the R environment: its sample is displayed here:

>URL <- "http://ichart.finance.yahoo.com/table.csv?s=^DJI" 
>dji <- as.data.frame(read.csv(URL)) 
>head(dji) 

This gives the following output:

        Date     Open     High      Low    Close   Volume  Adj.Close 
1 2016-10-14 18177.35 18261.11 18138.38 18138.38 87050000  18138.38 
2 2016-10-13 18088.32 18137.70 17959.95 18098.94 83160000  18098.94 
3 2016-10-12 18132.63 18193.96 18082.09 18144.20 72230000  18144.20 
4 2016-10-11 18308.43 18312.33 18061.96 18128.66 88610000  18128.66 
5 2016-10-10 18282.95 18399.96 18282.95 18329.04 72110000  18329.04 
6 2016-10-07 18295.35 18319.73 18149.35 18240.49 82680000  18240.49 

Please note that we will be mostly using the snp and dji indexes for example illustrations in the rest of the book and these will be referred to as snp and dji.

Databases

A relational database stores data in normalized format, and to perform statistical analysis, we need to write complex and advance queries. But R can connect to various relational databases such as MySQL Oracle, and SQL Server, easily and convert the data tables into DataFrames. Once the data is in DataFrame format, doing statistical analysis is easy to perform using all the available functions and packages.

In this section, we will take the example of MySQL as reference.

R has a built-in package, RMySQL , which provides connectivity with the database; it can be installed using the following command:

>install.packages("RMySQL")

Once the package is installed, we can create a connection object to create a connection with the database. It takes username, password, database name, and localhost name as input. We can give our inputs and use the following command to connect with the required database:

>mysqlconnection = dbConnect(MySQL(), user = '...', password = '...', dbname = '..',host = '.....')

When the database is connected, we can list the table that is present in the database by executing the following command:

>dbListTables(mysqlconnection)

We can query the database using the function dbSendQuery(), and the result is returned to R by using function fetch(). Then the output is stored in DataFrame format:

>result = dbSendQuery(mysqlconnection, "select * from <table name>") 
>data.frame = fetch(result) 
>print(data.fame) 

When the previous code gets executed, it returns the required output.

We can query with a filter clause, update rows in database tables, insert data into a database table, create tables, drop tables, and so on by sending queries through dbSendQuery().