Learning pandas - Second Edition

By: Michael Heydt

Overview of this book

You will learn how to use pandas to perform data analysis in Python. You will start with an overview of data analysis and progress step by step from modeling data, to accessing data from remote sources, to performing numeric and statistical analysis, through indexing and aggregate analysis, and finally to visualizing statistical data and applying pandas to finance. With the knowledge you gain from this book, you will quickly learn pandas and how it can empower you in the exciting world of data manipulation, analysis, and data science.
Table of Contents (16 chapters)

Tidying Up Your Data

We are at the point in the data processing pipeline where we need to examine the data that we have retrieved and address any anomalies that could cause problems during analysis. These anomalies can exist for many reasons. Sometimes, parts of the data are not recorded or get lost. Sometimes the units in the data don't match your system's units. Often, data points are duplicated.

This process of dealing with anomalous data is often referred to as tidying your data, and you will see this term used many times in data analysis. This is a very important step in the pipeline, and it can consume much of your time before you even get to working on simple analyses.
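The three kinds of anomaly mentioned above (missing values, mismatched units, and duplicated data points) can be illustrated with a minimal sketch. The sample data, the column names `sensor`, `temp_f`, and `temp_c`, and the choice to drop incomplete rows rather than fill them are all assumptions made for illustration; they are not taken from the book.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: sensor "b" has a missing reading,
# and the row for sensor "c" was recorded twice.
df = pd.DataFrame({
    "sensor": ["a", "b", "c", "c"],
    "temp_f": [68.0, np.nan, 212.0, 212.0],
})

# Count missing readings and fully duplicated rows.
missing_count = int(df["temp_f"].isna().sum())
duplicate_count = int(df.duplicated().sum())

# Drop duplicated rows, then drop rows with no reading.
# (Dropping is only one option; filling in values is another.)
tidy = df.drop_duplicates().dropna(subset=["temp_f"]).copy()

# Convert Fahrenheit to Celsius so the units match the
# (assumed) downstream system.
tidy["temp_c"] = (tidy["temp_f"] - 32) * 5 / 9

print(missing_count, duplicate_count)
print(tidy)
```

Whether to drop or fill missing values depends on the analysis, so treat the `dropna` step here as a placeholder for that decision.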

Tidying of data can be a tedious problem, particularly when using programming tools that are not designed for the specific task of data cleanup. Fortunately...