Book Image

Pandas 1.x Cookbook - Second Edition

By : Matt Harrison, Theodore Petrou
Book Image

Pandas 1.x Cookbook - Second Edition

By: Matt Harrison, Theodore Petrou

Overview of this book

The pandas library is massive, and it's common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands as one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through situations that you are highly likely to encounter. This new updated and revised edition provides you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas. Some recipes focus on achieving a deeper understanding of basic principles, or comparing and contrasting two similar operations. Other recipes will dive deep into a particular dataset, uncovering new and unexpected insights along the way. Many advanced recipes combine several different features across the pandas library to generate results.
Table of Contents (17 chapters)
15
Other Books You May Enjoy
16
Index

Introduction

One of the most fundamental tasks during data analysis involves splitting data into independent groups before performing a calculation on each group. This methodology has been around for quite some time but has more recently been referred to as split-apply-combine. This chapter covers the powerful .groupby method, which allows you to group your data in any way imaginable and apply any type of function independently to each group before returning a single dataset.

Before we get started with the recipes, we will need to know just a little terminology. All basic groupby operations have grouping columns, and each unique combination of values in these columns represents an independent grouping of the data. The syntax looks as follows:

df.groupby(['list', 'of', 'grouping', 'columns'])
df.groupby('single_column')  # when grouping by a single column 

The result of calling the .groupby method is a groupby object...