Using groupby to organize data by groups
At a certain point in most data analysis projects, we have to generate summary statistics by group. While this can be done using the approaches in the previous recipe, in most cases the pandas DataFrame groupby method is a better choice. If groupby can handle an aggregation task, and it usually can, it is likely the most efficient way to accomplish that task. We make good use of groupby in the remaining recipes in this chapter. We go over the basics in this recipe.
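To make the idea concrete before we turn to the real data, here is a minimal sketch of generating summary statistics by group. It assumes a small, made-up DataFrame; the names toy, region, and cases are hypothetical and are not taken from the COVID-19 file used in this recipe.

```python
import pandas as pd

# Hypothetical toy data; these column names are illustrative only
# and do not come from the COVID-19 daily data file.
toy = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "cases": [10, 30, 5, 15, 40],
})

# Split the rows by region, then compute several statistics per group.
# The result has one row per region and one column per statistic.
summary = toy.groupby("region")["cases"].agg(["count", "min", "max", "mean"])
print(summary)
```

A single groupby call replaces what would otherwise be a loop over the unique values of region, with a separate filter and calculation for each one.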
We will work with the COVID-19 daily data in this recipe.
How to do it…
We will create a pandas groupby DataFrame and use it to generate summary statistics by group:
- Import pandas and numpy, and load the COVID-19 case daily data:
>>> import pandas as pd
>>> import numpy as np
>>> coviddaily = pd.read_csv("data/coviddaily720.csv", parse_dates=["casedate"])
- Create a pandas