A distribution analysis shows how the values of each attribute in our data are spread out. Once the values are plotted, you can see at a glance how the data is broken up. In this recipe, we'll create three plots: a distribution of weather conditions, a boxplot of light conditions, and a boxplot of light conditions grouped by weather conditions.
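As a preview of the three plots, here is a minimal sketch on a tiny made-up DataFrame. The column names `Weather_Conditions` and `Light_Conditions`, the integer codes, and the use of a histogram for the distribution plot are assumptions made for illustration, not taken from the accidents file itself.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook using %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample: column names and integer codes are assumptions.
sample = pd.DataFrame({
    "Weather_Conditions": [1, 1, 2, 1, 3, 2, 1, 1],
    "Light_Conditions":   [1, 4, 1, 1, 4, 6, 1, 4],
})

# One row of three axes, one per plot described in the recipe.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 1. Distribution of weather conditions, drawn here as a histogram
sample["Weather_Conditions"].plot(kind="hist", ax=axes[0], title="Weather conditions")

# 2. Boxplot of light conditions
sample.boxplot(column="Light_Conditions", ax=axes[1])

# 3. Boxplot of light conditions grouped by weather conditions
sample.boxplot(column="Light_Conditions", by="Weather_Conditions", ax=axes[2])
```

The same three calls should carry over to the full dataset once it is loaded, provided the real columns use these names.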
First, import the Python libraries that you need:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Next, define a variable for the accidents data file, import the data, and view the top five rows:
accidents_data_file = '/Users/robertdempsey/Dropbox/private/Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'
accidents = pd.read_csv(accidents_data_file, sep=',', header=0, index_col=False, parse_dates=True, tupleize_cols...
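The load-and-inspect step can be tried without the original file. The sketch below reads a small in-memory CSV with a subset of the same `read_csv` arguments and then views the top five rows. The three column names and all values are hypothetical, and `tupleize_cols` is left out because recent pandas releases no longer accept it.

```python
import io

import pandas as pd

# Hypothetical three-column extract; the real file has many more columns.
csv_text = """Accident_Index,Weather_Conditions,Light_Conditions
A1,1,1
A2,2,4
A3,1,1
A4,3,6
A5,1,4
A6,2,1
"""

# Same core arguments as the recipe: comma separator, header on the first row,
# and no column used as the index.
accidents_sample = pd.read_csv(io.StringIO(csv_text), sep=",", header=0, index_col=False)

# head() returns the first five rows by default.
top_five = accidents_sample.head()
print(top_five)
```

Swapping `io.StringIO(csv_text)` for the path held in `accidents_data_file` gives the file-based version of the same call.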