Python Data Cleaning Cookbook
Many years ago, a very seasoned researcher said to me, "90% of what we're going to find, we'll see in the frequency distributions." That message has stayed with me. The more one-way and two-way frequency distributions (crosstabs) I do on a DataFrame, the better I understand it. We will do one-way distributions in this recipe, and crosstabs in subsequent recipes.
We continue our work with the NLS. We will also be doing a fair bit of column selection using filter methods. It is not necessary to review the recipe in this chapter on column selection, but it might be helpful.
We use pandas tools to generate frequencies, particularly the very handy value_counts:
Load the pandas library and the nls97 file. Also, convert the columns with object data type to category data type:
>>> import pandas as pd
>>> nls97 = pd.read_csv("data/nls97.csv...
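Since the listing above is cut off, here is a minimal sketch of the two ideas in this step: converting object columns to the category data type, and then running one-way frequencies with value_counts. The small DataFrame, its values, and the names `df`, `obj_cols`, `counts`, and `shares` are made up for illustration and are not taken from the NLS file. Using `select_dtypes` and `astype` for the conversion is one reasonable approach, not necessarily the one the recipe itself uses.

```python
import pandas as pd

# Hypothetical stand-in data (NOT actual NLS values), built with an explicit
# object dtype so the example behaves the same across pandas versions.
df = pd.DataFrame({
    "maritalstatus": pd.Series(
        ["Married", "Never-married", "Married", None, "Divorced"],
        dtype="object",
    ),
    "gender": pd.Series(
        ["Female", "Male", "Female", "Male", "Female"], dtype="object"
    ),
    "birthyear": [1980, 1981, 1980, 1984, 1982],
})

# Find the columns stored as object and convert each one to category.
obj_cols = df.select_dtypes(include=["object"]).columns
df = df.astype({col: "category" for col in obj_cols})

# One-way frequency distribution; dropna=False keeps missing values
# in the tally instead of silently dropping them.
counts = df["maritalstatus"].value_counts(dropna=False)

# Proportions instead of counts (missing values are excluded by default).
shares = df["maritalstatus"].value_counts(normalize=True)

print(df.dtypes)
print(counts)
print(shares)
```

Passing `dropna=False` is worth the extra keystrokes when cleaning data, because the number of missing values in a column is often as informative as the counts for the valid categories.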