Learning pandas - Second Edition

By: Michael Heydt

Overview of this book

You will learn how to use pandas to perform data analysis in Python. You will start with an overview of data analysis and iteratively progress from modeling data, to accessing data from remote sources, performing numeric and statistical analysis, through indexing and performing aggregate analysis, and finally to visualizing statistical data and applying pandas to finance. With the knowledge you gain from this book, you will quickly learn pandas and how it can empower you in the exciting world of data manipulation, analysis and science.

The split, apply, and combine (SAC) pattern

Many data analysis problems utilize a pattern of processing data referred to as split-apply-combine. In this pattern, three steps are taken to analyze data:

  • A dataset is split into smaller pieces based on certain criteria
  • Each of these pieces is operated upon independently
  • All the results are then combined back and presented as a single unit
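The three steps above can be sketched by hand before reaching for any dedicated grouping API. This is only an illustration: the DataFrame, its column names (key, value), and the choice of a sum as the per-piece operation are assumptions made for the example, not taken from the book.

```python
import pandas as pd

# Hypothetical sample data: a key column and a numeric value column.
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [1, 2, 3, 4, 5],
})

# Split: build one sub-DataFrame per distinct key.
pieces = {k: df[df["key"] == k] for k in df["key"].unique()}

# Apply: operate on each piece independently (here, sum its values).
results = {k: piece["value"].sum() for k, piece in pieces.items()}

# Combine: gather the per-piece results into a single Series.
combined = pd.Series(results, name="value_sum")
print(combined)
```

Writing the steps out this way makes the pattern explicit, but it rescans the data once per key; it is meant to show the idea rather than to be an efficient approach.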

The following diagram demonstrates a simple split-apply-combine process to calculate the mean of values grouped by a character-based key (a or b):

The data is first split by the index label into two groups (one each for a and b). The mean of the values in each group is then calculated. Finally, the resulting values from the groups are combined into a single pandas object, which is indexed by the label representing each group.
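A minimal sketch of this process, assuming a small Series whose index labels are a and b (the numeric values are made up for illustration), might look like the following:

```python
import pandas as pd

# Hypothetical values; the index labels a and b act as the grouping key.
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "a", "b"])

# Split by index label (level 0), compute the mean of each group,
# and combine the results into one Series indexed by group label.
means = s.groupby(level=0).mean()
print(means)
```

Here the group labelled a contains 1.0 and 3.0 and the group labelled b contains 2.0 and 4.0, so the combined result holds one mean per label.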

Splitting in pandas is performed using the .groupby() method of a Series or DataFrame...
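As a rough illustration of the split step on its own, the sketch below groups a DataFrame by a column and inspects the resulting pieces without applying any aggregation. The column names (key, value) and the data are assumptions made for this example.

```python
import pandas as pd

# Hypothetical sample data with a character-based key column.
df = pd.DataFrame({
    "key": ["a", "b", "a", "b"],
    "value": [10, 20, 30, 40],
})

# .groupby() returns a GroupBy object describing how the rows are split.
grouped = df.groupby("key")

# Iterating yields (group name, sub-DataFrame) pairs.
group_sizes = {}
for name, group in grouped:
    group_sizes[name] = len(group)

print(grouped.ngroups)   # number of groups
print(group_sizes)       # rows per group

# A single group can also be retrieved by its name.
group_a = grouped.get_group("a")
print(group_a)
```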