In very broad terms, data may be classified as either continuous or categorical. Continuous data is always numeric and represents some kind of measurement, such as height, wage, or salary. It can take on an infinite number of possible values. Categorical data, on the other hand, represents a discrete, finite set of values, such as car color, type of poker hand, or brand of cereal.
Pandas does not broadly classify data as either continuous or categorical. Instead, it has precise technical definitions for many distinct data types. The following table lists the main pandas data types, with their string equivalents and some notes on each type:
Common data type name | NumPy/pandas object | Pandas string name | Notes |
Boolean | np.bool_ | bool | Stored as a single byte. |
Integer | np.int64 | int | Defaults to 64 bits. Unsigned integers are also available (np.uint64). |
Float | np.float64 | float | Defaults to 64 bits. |
Complex | np.complex128 | complex | Rarely seen in data analysis. |
Object | np.object_ | O, object | Typically strings, but also a catch-all for columns with multiple different types or other Python objects (tuples, lists, dicts, and so on). |
Datetime | np.datetime64, pd.Timestamp | datetime64 | A specific moment in time with nanosecond precision. |
Timedelta | np.timedelta64, pd.Timedelta | timedelta64 | An amount of time, from days to nanoseconds. |
Categorical | pd.Categorical | category | Specific to pandas. Useful for object columns with relatively few unique values. |
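To make the table concrete, the following minimal sketch builds a small DataFrame that contains most of these types and prints the data type of each column. The column names and values are hypothetical, invented only for illustration, and the exact names printed (for strings and datetimes in particular) may differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
df = pd.DataFrame({
    "is_member": [True, False, True],          # boolean
    "visits": [3, 10, 7],                      # integer
    "height_m": [1.62, 1.80, 1.75],            # float
    "name": ["Ana", "Ben", "Cy"],              # strings
    "joined": pd.to_datetime(
        ["2020-01-15", "2021-06-01", "2022-03-30"]
    ),                                         # datetime
    "car_color": ["red", "blue", "red"],       # strings with few unique values
})

# Subtracting datetimes from a fixed timestamp yields a timedelta column.
df["tenure"] = pd.Timestamp("2023-01-01") - df["joined"]

# Convert the low-cardinality string column to the pandas-only categorical
# type by passing its string name from the table to astype.
df["car_color"] = df["car_color"].astype("category")

# One data type per column.
print(df.dtypes)
```

Passing the string name from the third column of the table (here, "category") to astype is usually the most convenient way to request a type, which is why the table lists those names alongside the NumPy and pandas objects.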