Selecting with unique and sorted indexes
Index selection performance drastically improves when the index is unique or sorted. The prior recipe used an unsorted index that contained duplicates, which makes for relatively slow selections.
In this recipe, we use the college dataset to form unique or sorted indexes to increase the performance of index selection. We will continue to compare the performance to Boolean indexing as well.
If selecting on a single column is a bottleneck for you, this recipe can make those selections roughly ten times faster.
How to do it…
- Read in the college dataset, create a separate DataFrame with STABBR as the index, and check whether the index is sorted:
>>> import pandas as pd
>>> college = pd.read_csv("data/college.csv")
>>> college2 = college.set_index("STABBR")
>>> college2.index.is_monotonic_increasing
False
- Sort the index from college2 and store it as another object: ...
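The steps so far can be tried without the college file. The following is a minimal sketch, not the recipe's own code: it assumes a small synthetic DataFrame in place of the college dataset, and the names demo, demo2, demo3, and VALUE are hypothetical. It builds an unsorted index with duplicates, sorts it into a new object, and then times Boolean indexing against index selection on each. Timings vary by machine and pandas version, so none are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the college dataset: a state column with many
# duplicates, in random (unsorted) order.
rng = np.random.default_rng(0)
states = rng.choice(["TX", "CA", "NY", "AL", "WA"], size=10_000)
demo = pd.DataFrame({"STABBR": states, "VALUE": np.arange(10_000)})

# Unsorted index containing duplicates.
demo2 = demo.set_index("STABBR")

# Sorted copy of the same index, stored as a separate object.
demo3 = demo2.sort_index()

print(demo2.index.is_monotonic_increasing)
print(demo3.index.is_monotonic_increasing)
print(demo3.index.is_unique)

# All three approaches select the same number of rows.
n_bool = len(demo[demo["STABBR"] == "TX"])
n_unsorted = len(demo2.loc["TX"])
n_sorted = len(demo3.loc["TX"])
print(n_bool, n_unsorted, n_sorted)

# Rough timing comparison; absolute numbers depend on the environment.
candidates = {
    "boolean indexing": lambda: demo[demo["STABBR"] == "TX"],
    "unsorted index": lambda: demo2.loc["TX"],
    "sorted index": lambda: demo3.loc["TX"],
}
for label, func in candidates.items():
    seconds = timeit.timeit(func, number=200)
    print(f"{label}: {seconds:.4f} s for 200 selections")
```

Sorting does not remove duplicates, so the sorted index is monotonic but still not unique. Each property can be checked on its own with is_monotonic_increasing and is_unique.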