Book Image

Pandas Cookbook

By : Theodore Petrou
Book Image

Pandas Cookbook

By: Theodore Petrou

Overview of this book

This book will provide you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas 0.20. Some recipes focus on achieving a deeper understanding of basic principles, or comparing and contrasting two similar operations. Other recipes will dive deep into a particular dataset, uncovering new and unexpected insights along the way. The pandas library is massive, and it's common for frequent users to be unaware of many of its more impressive features. The official pandas documentation, while thorough, does not contain many useful examples of how to piece together multiple commands like one would do during an actual analysis. This book guides you, as if you were looking over the shoulder of an expert, through practical situations that you are highly likely to encounter. Many advanced recipes combine several different features across the pandas 0.20 library to generate results.
Table of Contents (12 chapters)

What this book covers

Chapter 1, Pandas Foundations, covers the anatomy and vocabulary used to identify the components of the two main pandas data structures, the Series and the DataFrame. Each column must have exactly one type of data, and each of these data types is covered. You will learn how to unleash the power of the Series and the DataFrame by calling and chaining together their methods.

Chapter 2, Essential DataFrame Operations, focuses on the most crucial and common operations that you will perform during data analysis.

Chapter 3, Beginning Data Analysis, helps you develop a routine to get started after reading in your data. Other interesting discoveries will be made.

Chapter 4, Selecting Subsets of Data, covers the many varied and potentially confusing ways of selecting different subsets of data.

Chapter 5, Boolean Indexing, covers the process of querying your data to select subsets of it based on Boolean conditions.

Chapter 6, Index Alignment, targets the very important and often misunderstood index object. Misuse of the Index is responsible for lots of erroneous results, and these recipes show you how to use it correctly to deliver powerful results.

Chapter 7, Grouping for Aggregation, Filtration, and Transformation, covers the powerful grouping capabilities that are almost always necessary during a data analysis. You will build customized functions to apply to your groups.

Chapter 8, Restructuring Data into Tidy Form, explains what tidy data is and why it’s so important, and then it shows you how to transform many different forms of messy datasets into tidy ones.

Chapter 9, Combining Pandas Objects, covers the many available methods to combine DataFrames and Series vertically or horizontally. We will also do some web-scraping to compare President Trump's and Obama's approval rating and connect to an SQL relational database.

Chapter 10, Time Series Analysis, covers advanced and powerful time series capabilities to dissect by any dimension of time possible.

Chapter 11, Visualization with Matplotlib, Pandas, and Seaborn, introduces the matplotlib library, which is responsible for all of the plotting in pandas. We will then shift focus to the pandas plot method and, finally, to the seaborn library, which is capable of producing aesthetically pleasing visualizations not directly available in pandas.