One common task in data analysis is data lookup, which is often implemented via a list in R. For example, to look up customers' ages, we can define a list, say, cust_age
, with values set to customer ages and names set to the corresponding customer names (or IDs), that is names(cust_age) <- cust_name
. In this case, to look up John Doe's age, the following can be called: cust_age[["John_Doe"]]
. However, the implementation of lists in R is not optimized for lookup; it incurs O(N) time complexity to perform a lookup on a list of N elements. This means that the values indexed later in the list require more time to look up. As N grows, this effect gets stronger. When a program requires frequent lookups, the cumulative effect can be significant. An alternative to lists that offers a more optimized data lookup is a hash table. In R, this is available from the CRAN package hash. A hash table's lookup incurs O(1) time complexity.
The next code...