Mastering R for Quantitative Finance

K-means clustering on big data


Data frames and matrices are easy-to-use objects in R, and typical manipulations on them execute quickly on datasets of reasonable size. However, problems can arise when the user needs to handle larger datasets. In this section, we will illustrate how the bigmemory and biganalytics packages solve the problem of datasets that are too large to be handled with data frames or data tables.
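The following R snippet is a minimal sketch of the workflow, not the code used later in this chapter. It copies a matrix into a file-backed big.matrix with bigmemory and then clusters it with bigkmeans() from biganalytics. The data are simulated (not the airline data), and the file names example.bin and example.desc and the parameter values are illustrative assumptions. The snippet also assumes that both packages are installed.

```r
library(bigmemory)     # big.matrix objects stored outside R's own heap
library(biganalytics)  # bigkmeans() and other routines for big.matrix

# Simulated data for illustration only: two well-separated groups of
# 25,000 observations, each described by two numeric features
set.seed(42)
n <- 50000
m <- rbind(matrix(rnorm(n, mean = 0), ncol = 2),
           matrix(rnorm(n, mean = 5), ncol = 2))

# Copy the matrix into a file-backed big.matrix: the values are kept in a
# memory-mapped backing file (hypothetical names) in the session's temp dir
x <- as.big.matrix(m, type = "double",
                   backingfile = "example.bin",
                   descriptorfile = "example.desc",
                   backingpath = tempdir())

# Matrix-style indexing still works and returns an ordinary R vector
x[1:3, 1]

# K-means run directly on the big.matrix; nstart = 5 keeps the best of
# five random starting configurations
km <- bigkmeans(x, centers = 2, iter.max = 20, nstart = 5)
km$size     # number of observations assigned to each cluster
km$centers  # cluster centres, one row per cluster
```

The design point is that bigkmeans() reads the big.matrix in place, so the data never have to be converted to a data frame. A file-backed big.matrix can also be larger than the available RAM, because only the parts being accessed are mapped into memory.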

Note

The latest updates of the bigmemory, biganalytics, and biglm packages were not available on Windows at the time of writing this chapter. The examples shown here assume that R version 2.15.3 is the most recent version of R available for Windows.

In the following example, we will perform K-means clustering on large datasets. For illustrative purposes, we will use the Airline Origin and Destination Survey data of the U.S. Bureau of Transportation Statistics. The datasets contain the summary characteristics of more than 3 million domestic flights, including the itinerary...