Knowing how to handle categorical variables and mixed data types
Categorical variables usually have simpler structures and descriptive statistics than continuous variables. Here, we introduce the two main descriptive statistics, frequencies and proportions, and discuss some interesting cases that arise when converting continuous variables to categorical ones.
Frequencies and proportions
When we discussed the mode for categorical variables, we introduced Counter, which outputs a dictionary-like structure in which each key-value pair is an element and its count. The following is an example of a counter:
Counter({2.0: 394, 3.0: 369, 6.0: 597, 1.0: 472, 9.0: 425, 7.0: 434, 8.0: 220, 4.0: 217, 5.0: 92})
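As a minimal sketch of how such a counter yields both frequencies and proportions, consider the snippet below. The list `codes` is a small hypothetical sample made up for illustration, not the dataset column, and the name `proportions` is likewise an assumption of this sketch.

```python
from collections import Counter

# Hypothetical category codes, made up for illustration (not the real dataset)
codes = [2.0, 3.0, 2.0, 6.0, 1.0, 2.0, 6.0]

# Frequencies: how many times each category occurs
counter = Counter(codes)

# Proportions: each frequency divided by the total number of observations
total = sum(counter.values())
proportions = {key: count / total for key, count in counter.items()}

# The most frequent category (the mode) together with its count
mode_pair = counter.most_common(1)[0]

print(counter)
print(proportions)
print(mode_pair)
```

Dividing by the total turns absolute counts into shares that sum to 1, which makes categories comparable across samples of different sizes.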
The following code snippet displays the frequencies as a bar plot, which makes the absolute counts easy to compare at a glance:
counter = Counter(df["Rural-urban_Continuum Code_2013"].dropna())
labels = []
x = []
for key, val in counter.items():
    labels.append(str(key))
    x.append...