Book Image

Clojure Data Analysis Cookbook - Second Edition

By : Eric Richard Rochester
Book Image

Clojure Data Analysis Cookbook - Second Edition

By: Eric Richard Rochester

Overview of this book

Table of Contents (19 chapters)
Clojure Data Analysis Cookbook Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Finding data errors with Benford's law


Benford's law is a curious observation about the distribution of the first digits of numbers in many naturally occurring datasets. In sequences that conform to Benford's law, the first digit will be 1 about a third of the time, and higher digits will occur progressively less often. However, manually constructed data rarely looks like this. Because of that, lack of a Benford's Law distribution is evidence that a dataset is not manually constructed.

For example, this has been shown to hold true in financial data, and investigators leverage this for fraud detection. The US Internal Revenue Service reportedly uses it for identifying potential tax fraud, and financial auditors also use it.

Getting ready

We'll need these dependencies:

(defproject statim "0.1.0"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [incanter "1.5.5"]])

We'll also use these requirements:

(require
  '[incanter.core :as i]
  'incanter.io
  '[incanter.stats :as s])

For data...