Learning pandas - Second Edition

By: Michael Heydt

Overview of this book

You will learn how to use pandas to perform data analysis in Python. You will start with an overview of data analysis and then progress through modeling data, accessing data from remote sources, performing numeric and statistical analysis, indexing, and aggregate analysis, finishing with visualizing statistical data and applying pandas to finance. With the knowledge you gain from this book, you will see how pandas can empower you in the world of data manipulation, analysis, and data science.
Table of Contents (16 chapters)

What is tidying your data?

Tidy data is a term coined by Hadley Wickham in his paper "Tidy Data". I highly recommend that you read this paper, which can be downloaded from http://vita.had.co.nz/papers/tidy-data.pdf.

The paper covers the process of creating tidy data in detail. The end result of that process is data that is free of surprises and ready for analysis.

We will examine many of the tools in pandas for tidying your data. These exist because we need to handle the following situations:

  • The names of the variables are different from what you require
  • There is missing data
  • Values are not in the units that you require
  • The period of sampling of records is not what you need
  • Variables are categorical and you need quantitative values
  • There is noise in the data
  • Information is of an incorrect type
  • Data is organized around incorrect axes
  • Data is...
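
Several of the situations listed above can be sketched in a few lines of pandas. The example below is only an illustration: the station and rainfall data, the column names, and the choice to fill a missing value with the column mean are all assumptions made up for this sketch, not data or methods taken from the book.

```python
import pandas as pd

# Hypothetical messy readings, invented purely for illustration.
raw = pd.DataFrame({
    "stn": ["A", "B", "C"],
    "tmp_f": ["68.0", "71.6", None],  # numbers stored as strings, one missing
})

# The names of the variables are different from what we require.
tidy = raw.rename(columns={"stn": "station", "tmp_f": "temp_f"})

# Information is of an incorrect type: convert the strings to floats.
# The missing entry becomes NaN.
tidy["temp_f"] = pd.to_numeric(tidy["temp_f"])

# There is missing data: here we fill it with the column mean
# (one possible policy among many; dropping the row is another).
tidy["temp_f"] = tidy["temp_f"].fillna(tidy["temp_f"].mean())

# Values are not in the units we require: Fahrenheit to Celsius.
tidy["temp_c"] = (tidy["temp_f"] - 32) * 5 / 9

# Data is organized around incorrect axes: the years are spread across
# columns, so melt them into one row per (station, year) observation.
wide = pd.DataFrame({
    "station": ["A", "B"],
    "2016": [1.0, 2.0],
    "2017": [3.0, 4.0],
})
long = wide.melt(id_vars="station", var_name="year", value_name="rainfall")

print(tidy)
print(long)
```

Each step addresses one item from the list in isolation. In real data, the right way to handle a missing value or a unit conversion depends on what the values mean, so treat these calls as starting points rather than a fixed recipe.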