Book Image

R High Performance Programming

Book Image

R High Performance Programming

Overview of this book

Table of Contents (17 chapters)
R High Performance Programming
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

Use of hash tables for frequent lookups on large data


One common task in data analysis is data lookup, which is often implemented via a list in R. For example, to look up customers' ages, we can define a list, say, cust_age, with values set to customer ages and names set to the corresponding customer names (or IDs), that is names(cust_age) <- cust_name. In this case, to look up John Doe's age, the following can be called: cust_age[["John_Doe"]]. However, the implementation of lists in R is not optimized for lookup; it incurs O(N) time complexity to perform a lookup on a list of N elements. This means that the values indexed later in the list require more time to look up. As N grows, this effect gets stronger. When a program requires frequent lookups, the cumulative effect can be significant. An alternative to lists that offers a more optimized data lookup is a hash table. In R, this is available from the CRAN package hash. A hash table's lookup incurs O(1) time complexity.

The next code...