Reducing memory by changing data types
pandas has precise technical definitions for many data types. However, when you load data from formats that carry no type information, such as CSV, pandas has to infer the type of each column.
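To see this inference at work without the college dataset, we can parse a tiny CSV string held in memory. The column names and values here are made up for illustration. The sketch shows that pandas picks `int64` for whole numbers, falls back to `float64` when a numeric column contains a missing value, and recognizes `True`/`False` as booleans.

```python
import io

import pandas as pd

# Hypothetical CSV text: CSV stores only characters, so every dtype
# below is inferred by pandas while parsing.
csv_text = """name,count,score,flag
a,1,1.5,True
b,2,,False
c,3,2.5,True
"""

df = pd.read_csv(io.StringIO(csv_text))

# count -> int64, score -> float64 (the empty field becomes NaN,
# which an integer column cannot hold), flag -> bool
print(df.dtypes)
```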
In this recipe, we convert one of the object columns in the college dataset to the special pandas categorical data type, which can drastically reduce its memory usage.
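The reason a categorical saves memory is that it stores each distinct value once and replaces every row with a small integer code. The following is a minimal sketch on a hypothetical state-abbreviation column with many rows but only five distinct values. It is not data from the college dataset. The column is cast explicitly to `object` so the comparison holds regardless of the default string dtype of the installed pandas version.

```python
import pandas as pd

# Hypothetical column: 10,000 rows, only 5 distinct strings.
states = pd.Series(["AL", "AK", "AZ", "AR", "CA"] * 2_000)

as_object = states.astype(object)       # one pointer per row to a Python str
as_category = states.astype("category")  # int codes plus 5 stored categories

# deep=True also counts the Python string objects the pointers refer to.
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
print(f"ratio:    {object_bytes / category_bytes:.1f}x")
```

With only five categories, pandas can store the codes as `int8`, one byte per row, instead of an 8-byte pointer plus a string object per row.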
How to do it…
- After reading in our college dataset, we select a few columns of different data types that will clearly show how much memory may be saved:
>>> import pandas as pd
>>> college = pd.read_csv("data/college.csv")
>>> different_cols = [
...     "RELAFFIL",
...     "SATMTMID",
...     "CURROPER",
...     "INSTNM",
...     "STABBR",
... ]
>>> col2 = college.loc[:, different_cols]
>>> col2.head()
   RELAFFIL  SATMTMID  ...       INSTNM STABBR
0         0     420.0  ...  Alabama ...     AL
1         0     565.0...
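Because `data/college.csv` is not reproduced here, the sketch below builds a hypothetical stand-in that has the same five column names as `col2` but uses made-up values. It compares `memory_usage()` with `memory_usage(deep=True)`. For numeric columns the two agree. For object columns, the shallow figure counts only the 8-byte pointers, while the deep figure also includes the string objects.

```python
import numpy as np
import pandas as pd

n = 1_000
# Hypothetical stand-in for col2: same column names, invented values.
col2_like = pd.DataFrame(
    {
        "RELAFFIL": np.tile([0, 1], n // 2),
        "SATMTMID": np.tile([420.0, 565.0, np.nan, 500.0], n // 4),
        "CURROPER": np.ones(n),
        "INSTNM": [f"Institution {i}" for i in range(n)],
        "STABBR": np.tile(["AL", "AK", "AZ", "AR"], n // 4),
    }
).astype(
    # Pin dtypes so the byte counts do not depend on platform or pandas version.
    {"RELAFFIL": "int64", "CURROPER": "int64", "INSTNM": "object", "STABBR": "object"}
)
print(col2_like.dtypes)

shallow = col2_like.memory_usage()        # object columns: pointers only
deep = col2_like.memory_usage(deep=True)  # object columns: pointers + str objects
print(pd.DataFrame({"shallow": shallow, "deep": deep}))
```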