Book Image

Learning Pandas

By : Michael Heydt
Book Image

Learning Pandas

By: Michael Heydt

Overview of this book

Table of Contents (19 chapters)
Learning pandas
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Chapter 7. Tidying Up Your Data

Data analysis typically flows in a processing pipeline that starts with retrieving data from one or more sources. Upon receipt of this data, it is often the case that it can be in a raw form and can be difficult to use for data analysis. This can be for a multitude of reasons such as data is not recorded, it is lost, or it is just in a different format than what you require.

Therefore, one of the most common things you will do with pandas involves tidying your data, which is the process of preparing raw data for analysis. Showing you how to use various features of pandas to get raw data into a tidy form is the focus of this chapter.

In this chapter, you will learn:

  • The concept of tidy data

  • How pandas represents unknown values

  • How to find NaN values in data

  • How to filter (drop) data

  • What pandas does with unknown values in calculations

  • How to find, filter and fix unknown values

  • How to identify and remove duplicate data

  • How to transform values using replace, map, and apply...