Pandas Cookbook

By: Theodore Petrou

Overview of this book

This book provides unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas 0.20. Some recipes focus on achieving a deeper understanding of basic principles, or on comparing and contrasting two similar operations. Other recipes dive deep into a particular dataset, uncovering new and unexpected insights along the way. The pandas library is massive, and it's common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands as one would during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through practical situations that you are highly likely to encounter. Many advanced recipes combine several different features across the pandas 0.20 library to generate results.

Renaming row and column names

One of the most basic and common operations on a DataFrame is to rename its rows or columns. Good column names are descriptive, brief, and follow a common convention with respect to capitalization, spaces, underscores, and other features.

Getting ready

In this recipe, both the row and column names are renamed.

How to do it...

  1. Read in the movie dataset, and make the index meaningful by setting it as the movie title:
>>> import pandas as pd
>>> movie = pd.read_csv('data/movie.csv', index_col='movie_title')
  2. The rename DataFrame method accepts dictionaries that map the old value to the new value. Let's create one for the rows and another for the columns:
>>> idx_rename = {'Avatar': 'Ratava', 'Spectre': 'Ertceps'}
>>> col_rename = {'director_name': 'Director Name',
...               'num_critic_for_reviews': 'Critical Reviews'}
  3. Pass the dictionaries to the rename method, and assign the result to a new variable:
>>> movie_renamed = movie.rename(index=idx_rename,
...                              columns=col_rename)
>>> movie_renamed.head()

How it works...

The rename DataFrame method allows for both row and column labels to be renamed at the same time with the index and columns parameters. Each of these parameters may be set to a dictionary that maps old labels to their new values.
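To make this behavior concrete without needing the movie CSV, here is a minimal sketch on a tiny hand-built DataFrame. The frame `toy`, its placeholder values, and the extra 'Titanic' mapping are assumptions made up for illustration; they are not part of the recipe's dataset. The sketch shows that rename returns a new DataFrame by default and that mapping keys with no matching label are skipped.

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset; the values are placeholders
toy = pd.DataFrame(
    {"director_name": ["Director A", "Director B"],
     "duration": [100, 120]},
    index=["Avatar", "Spectre"],
)

# 'Titanic' is not a row label here, so that entry is skipped
# (rename ignores missing keys by default)
renamed = toy.rename(
    index={"Avatar": "Ratava", "Titanic": "Cinatit"},
    columns={"director_name": "Director Name"},
)

print(renamed.index.tolist())    # ['Ratava', 'Spectre']
print(renamed.columns.tolist())  # ['Director Name', 'duration']
print(toy.index.tolist())        # ['Avatar', 'Spectre'] (original untouched)
```

Because the original is left unchanged, the result has to be assigned to a variable, as the recipe does with movie_renamed.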

There's more...

There are multiple ways to rename row and column labels. It is possible to assign a Python list directly to the index and columns attributes. This assignment works only when the list has exactly as many elements as the axis it replaces. The following code uses the tolist method on each Index object to create a Python list of labels. It then modifies a couple of values in each list and assigns the lists back to the index and columns attributes:

>>> movie = pd.read_csv('data/movie.csv', index_col='movie_title')
>>> index = movie.index
>>> columns = movie.columns

>>> index_list = index.tolist()
>>> column_list = columns.tolist()

# rename the row and column labels with list assignments
>>> index_list[0] = 'Ratava'
>>> index_list[2] = 'Ertceps'
>>> column_list[1] = 'Director Name'
>>> column_list[2] = 'Critical Reviews'

>>> print(index_list)
['Ratava', "Pirates of the Caribbean: At World's End", 'Ertceps', 'The Dark Knight Rises', ... ]

>>> print(column_list)
['color', 'Director Name', 'Critical Reviews', 'duration', ...]

# finally reassign the index and columns
>>> movie.index = index_list
>>> movie.columns = column_list
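
The length requirement on list assignment can be checked with a small sketch. The two-column frame `df` below is a made-up example, not the movie data. Assigning a list of the wrong length raises a ValueError, while a list of the right length replaces the labels in place.

```python
import pandas as pd

# Hypothetical two-row, two-column frame used only for illustration
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# A list with one element cannot replace two column labels
try:
    df.columns = ["only_one"]
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(type(mismatch_error).__name__)  # ValueError

# Lists of matching length are accepted and modify df in place
df.columns = ["A", "B"]
df.index = ["X", "Y"]
print(df.columns.tolist())  # ['A', 'B']
print(df.index.tolist())    # ['X', 'Y']
```

Unlike rename, this approach changes the existing DataFrame rather than returning a new one.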