Categorical variable analysis helps us understand categorical data, that is, data whose values are labels for groups rather than measurements. In this recipe, the categorical variable is the day of the week. The creators of the dataset have already converted the category (the name of the day of the week) to a number, but it remains a category rather than purely numeric data: the number identifies a day, it does not measure a quantity. If they had not done this, we could use Pandas to do the conversion for us, and then perform our analysis.
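As a minimal sketch of how Pandas could do that conversion, the snippet below maps day names to numbers with an ordered categorical. The sample values are invented for illustration, and the coding of 1 for Sunday through 7 for Saturday is an assumption of this sketch rather than something stated in the recipe.

```python
import pandas as pd

# Hypothetical sample: day names as they might look before being encoded.
days = pd.Series(["Monday", "Sunday", "Friday", "Monday", "Wednesday"])

# Assumed coding for this sketch: 1 = Sunday through 7 = Saturday.
day_order = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

# An ordered categorical stores each name as a 0-based integer code.
day_cat = pd.Categorical(days, categories=day_order, ordered=True)

# Shift the 0-based codes to 1-based day numbers.
day_numbers = pd.Series(day_cat.codes, index=days.index) + 1
print(day_numbers.tolist())
```

Listing the categories explicitly keeps the numbering stable even when some days are missing from the data.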
In this recipe, we are going to plot the distribution of casualties by the day of the week.
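Before working with the real file, the aggregation behind such a plot can be sketched on a toy table. The rows below are made up for illustration and are not taken from the accidents data, and the column names `Day_of_Week` and `Number_of_Casualties` are assumptions of this sketch.

```python
import pandas as pd

# Toy rows invented for illustration; not taken from the accidents file.
# The column names are assumptions made for this sketch.
toy = pd.DataFrame({
    "Day_of_Week": [1, 2, 2, 6, 6, 6, 7],
    "Number_of_Casualties": [1, 2, 1, 3, 1, 1, 2],
})

# Total casualties recorded for each day-of-week code.
casualties_by_day = toy.groupby("Day_of_Week")["Number_of_Casualties"].sum()
print(casualties_by_day.to_dict())
```

With matplotlib installed, calling `casualties_by_day.plot(kind="bar")` on the result would draw these totals as a bar chart.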
First, import the Python libraries that you need:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Next, define a variable for the accidents data file, import the data, and view the top five rows:
accidents_data_file = '/Users/robertdempsey/Dropbox/private/Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'
accidents = pd.read_csv...