Book Image

Pandas Cookbook

By : Theodore Petrou
Book Image

Pandas Cookbook

By: Theodore Petrou

Overview of this book

This book will provide you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas 0.20. Some recipes focus on achieving a deeper understanding of basic principles, or comparing and contrasting two similar operations. Other recipes will dive deep into a particular dataset, uncovering new and unexpected insights along the way. The pandas library is massive, and it's common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands like one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through practical situations that you are highly likely to encounter. Many advanced recipes combine several different features across the pandas 0.20 library to generate results.
Table of Contents (12 chapters)

Conventions

In this book, you will find a few text styles that distinguish between different kinds of information. Most commonly you will see blocks of code during each recipe that will look like this:

>>> employee = pd.read_csv('data/employee')
>>> max_dept_salary = employee.groupby('DEPARTMENT')['BASE_SALARY'].max()

The pandas Series and DataFrames are stylized differently when output in the notebook. The pandas Series have no special formatting and are just raw text. They will appear directly preceding the line of code that creates them in the code block itself, like this:

>>> max_dept_salary.head()
DEPARTMENT Admn. & Regulatory Affairs 140416.0 City Controller's Office 64251.0 City Council 100000.0 Convention and Entertainment 38397.0 Dept of Neighborhoods (DON) 89221.0 Name: BASE_SALARY, dtype: float64

DataFrames, on the other hand, are nicely stylized in the notebooks and appear as images outside of the code box, like this:

>>> employee.pivot_table(index='DEPARTMENT', 
columns='GENDER',
values='BASE_SALARY').round(0).head()

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: In order to find the average BASE_SALARY by GENDER, you can use the pivot_table method.

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "In a Jupyter notebook, when holding down Shift + Tab + Tab with the cursor placed somewhere in the object, a window of the docs strings will pop out making the method far easier to use."

Tips and tricks appear like this.
Warnings or important notes appear in a box like this.