Functional Python Programming

By: Steven F. Lott

Cleaning raw data with generator functions


One of the tasks that arise in exploratory data analysis is cleaning up raw source data. This is often done as a composite operation applying several scalar functions to each piece of input data to create a usable data set.
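As a minimal sketch of that composite approach, assuming each raw row arrives as a list of strings, a generator function can lazily apply a scalar cleaning function to every field. The names strip_then_float() and clean_rows() are hypothetical and chosen only for illustration. Here the scalar function is itself a composite of two steps: str.strip() followed by float().

```python
from typing import Callable, Iterable, Iterator, List


def strip_then_float(text: str) -> float:
    """Hypothetical scalar cleaner: trim whitespace, then convert to float."""
    return float(text.strip())


def clean_rows(
    rows: Iterable[List[str]],
    scalar: Callable[[str], float] = strip_then_float,
) -> Iterator[List[float]]:
    """Generator function: apply one scalar function to every field of every row."""
    for row in rows:
        yield [scalar(field) for field in row]


# Illustrative raw rows with stray whitespace around the values.
raw = [[" 10.0", "8.04 "], ["8.0", "6.95"]]
cleaned = list(clean_rows(raw))
print(cleaned)  # [[10.0, 8.04], [8.0, 6.95]]
```

Because clean_rows() is a generator, no row is converted until a consumer such as list() requests it, so the same function can process a file far larger than memory.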

Let's look at a simplified set of data. This data is commonly used to show techniques in exploratory data analysis. It's called Anscombe's Quartet, and it comes from the article "Graphs in Statistical Analysis" by F. J. Anscombe, which appeared in The American Statistician in 1973. The following are the first few rows of a downloaded file containing this dataset:

Anscombe's quartet
I           II          III          IV
x     y     x     y     x     y      x     y
10.0  8.04  10.0  9.14  10.0  7.46   8.0   6.58
8.0   6.95  8.0   8.14  8.0   6.77   8.0   5.76
13.0  7.58  13.0  8.74  13.0  12.74  8.0   7.71

Sadly, we can't trivially process this with the csv module. We have to do a little bit of parsing to extract the useful information from this file. Since the data is properly tab-delimited, we can...
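Here is a minimal sketch of the parsing this calls for. It assumes the file has exactly the three heading lines shown above, followed by tab-delimited rows of eight numbers (an x, y pair for each of series I through IV). The sample text is embedded as a string, with tabs assumed between fields, so the sketch runs without the downloaded file. The function names skip_headings(), float_rows(), and series() are hypothetical and not taken from the text.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Assumed file layout: three heading lines, then eight tab-separated numbers per row.
SAMPLE = (
    "Anscombe's quartet\n"
    "I\t\tII\t\tIII\t\tIV\n"
    "x\ty\tx\ty\tx\ty\tx\ty\n"
    "10.0\t8.04\t10.0\t9.14\t10.0\t7.46\t8.0\t6.58\n"
    "8.0\t6.95\t8.0\t8.14\t8.0\t6.77\t8.0\t5.76\n"
    "13.0\t7.58\t13.0\t8.74\t13.0\t12.74\t8.0\t7.71\n"
)


def skip_headings(rows: Iterator[List[str]], count: int = 3) -> Iterator[List[str]]:
    """Lazily drop the first `count` heading rows and pass the rest through."""
    return islice(rows, count, None)


def float_rows(rows: Iterable[List[str]]) -> Iterator[List[float]]:
    """Convert every field to float, ignoring blank lines (csv yields [] for them)."""
    for row in rows:
        if row:
            yield [float(field) for field in row]


def series(rows: Iterable[List[float]], index: int) -> Iterator[Tuple[float, float]]:
    """Yield the (x, y) pair for series I..IV (index 0..3) from each row."""
    for row in rows:
        yield row[2 * index], row[2 * index + 1]


reader = csv.reader(io.StringIO(SAMPLE), delimiter="\t")
data = list(float_rows(skip_headings(reader)))
series_i = list(series(data, 0))
print(series_i)  # [(10.0, 8.04), (8.0, 6.95), (13.0, 7.58)]
```

The converted rows are materialized with list() because a generator can be consumed only once. Extracting all four series from the same data therefore needs either a list like this or a fresh pass over the file.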