Clojure Data Analysis Cookbook - Second Edition

By: Eric Richard Rochester


Generating online summary statistics for data streams with reducers


We can use reducers in a lot of different situations, but sometimes we need to change how we process data to do so.

For this example, we'll show you how to compute summary statistics with reducers. We'll use algorithms and formulas first proposed by Tony F. Chan, Gene H. Golub, and Randall J. LeVeque in 1979 and later extended by Timothy B. Terriberry in 2007. These allow you to approximate the mean, standard deviation, and skew for online data (that is, streaming data that we might see only once). So, we will need to compute all of the statistics in one pass, without holding the full collection in memory.
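To make the one-pass idea concrete, here is a minimal sketch of how such statistics might be folded with clojure.core.reducers. It is not the book's implementation: the names empty-stats, merge-stats, add-sample, and summary-stats are hypothetical, and the update rules are the pairwise-merge formulas for the count, mean, and second and third central moment sums as given on the Wikipedia page on algorithms for calculating variance. Each partial summary keeps only four numbers, so nothing beyond the current accumulator has to stay in memory.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical names for illustration; this is a sketch, not the book's code.
;; A partial summary: count, running mean, and the sums of squared (m2)
;; and cubed (m3) deviations from the mean.
(def empty-stats {:n 0 :mean 0.0 :m2 0.0 :m3 0.0})

(defn merge-stats
  "Combines two partial summaries. With no arguments, returns the
  identity value, which r/fold uses as the seed for each partition."
  ([] empty-stats)
  ([a b]
   (cond
     (zero? (:n a)) b
     (zero? (:n b)) a
     :else
     (let [{na :n ma :mean m2a :m2 m3a :m3} a
           {nb :n mb :mean m2b :m2 m3b :m3} b
           n     (+ na nb)
           delta (- mb ma)]
       {:n    n
        :mean (+ ma (/ (* delta nb) n))
        :m2   (+ m2a m2b (/ (* delta delta na nb) n))
        :m3   (+ m3a m3b
                 (/ (* delta delta delta na nb (- na nb)) (* n n))
                 (/ (* 3.0 delta (- (* na m2b) (* nb m2a))) n))}))))

(defn add-sample
  "Folds one observation into a summary by treating it as a
  one-element summary and merging it in."
  [acc x]
  (merge-stats acc {:n 1 :mean (double x) :m2 0.0 :m3 0.0}))

(defn summary-stats
  "Computes the count, mean, sample variance, standard deviation, and
  skew of coll in one pass. Values that are undefined for the input
  (for example, the variance of a single item) are returned as nil."
  [coll]
  (let [{:keys [n mean m2 m3]} (r/fold merge-stats add-sample coll)]
    {:n        n
     :mean     mean
     :variance (when (> n 1) (/ m2 (dec n)))
     :std-dev  (when (> n 1) (Math/sqrt (/ m2 (dec n))))
     :skew     (when (pos? m2)
                 (/ (* (Math/sqrt n) m3) (Math/pow m2 1.5)))}))

(summary-stats [1 2 3 4 5])
```

For the input [1 2 3 4 5], this should report a mean of 3.0 and a sample variance of 2.5. Because merge-stats can combine any two partial summaries, r/fold is free to split a vector into chunks, summarize them in parallel, and merge the results. For a lazy sequence, it falls back to a sequential reduce, which still needs only one pass.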

The following formulas are a little complicated and difficult to read in Lisp notation. However, there's a good overview of this process, with formulas, on the Wikipedia page for Algorithms for calculating variance (http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance). In order to somewhat simplify this example...