Haskell Data Analysis Cookbook

By: Nishant Shukla

Computing the Manhattan distance


Defining a distance between two items gives us a concrete way to interpret clusters and patterns. The Manhattan distance is among the simplest metrics to implement, which is the main reason it is so widely used.

The Manhattan distance (or Taxicab distance) between two items is the sum of the absolute differences of their coordinates. So if we are given two points (1, 1) and (5, 4), then the Manhattan distance will be |1-5| + |1-4| = 4 + 3 = 7.
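
The definition above can be sketched directly in Haskell. This is a minimal illustration, not the recipe's own code: the name manhattanDist and the choice of representing a point as a list of Int coordinates are assumptions made here for the example.

```haskell
-- Manhattan (taxicab) distance between two points, each given as a list
-- of coordinates. The name and the list representation are assumptions
-- made for this sketch.
-- Both lists are expected to have the same length; zipWith silently
-- ignores any extra coordinates in the longer list.
manhattanDist :: [Int] -> [Int] -> Int
manhattanDist p q = sum (zipWith (\a b -> abs (a - b)) p q)
```

Loaded into GHCi, manhattanDist [1, 1] [5, 4] evaluates to 7, matching the worked example.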

We can use this distance metric to detect whether an item is unusually far away from everything else. In this recipe, we will detect outliers using the Manhattan distance. Each distance calculation involves only subtraction, absolute values, and addition, so the metric stays cheap to compute even on very large datasets.

Getting ready

Create a file named input.csv containing a list of comma-separated points, one point per line. We will compute the smallest distance between these points and a test point:

$ cat input.csv

0,0
10,0
0,10
10,10
5,5
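
As a rough sketch of what "the smallest distance between these points and a test point" means for this input, the snippet below parses the same five lines from a string and finds the nearest point. It uses only the base library and a hand-written comma splitter, which is an assumption of this sketch and not necessarily how the recipe reads the file. All names (splitOnComma, parsePoints, nearestDist, sampleInput) and the test point (2, 1) are hypothetical.

```haskell
-- The contents of input.csv, inlined as a string so the sketch needs no file I/O.
sampleInput :: String
sampleInput = "0,0\n10,0\n0,10\n10,10\n5,5\n"

-- Split one line on commas (hand-written helper, assumed for this sketch).
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnComma rest

-- Parse every non-empty line into a point. 'read' will throw an error
-- on a field that is not a number, so this assumes clean input.
parsePoints :: String -> [[Double]]
parsePoints = map (map read . splitOnComma) . filter (not . null) . lines

-- Sum of the absolute coordinate differences.
manhattanDist :: [Double] -> [Double] -> Double
manhattanDist p q = sum (zipWith (\a b -> abs (a - b)) p q)

-- Smallest Manhattan distance from a test point to any point in the list.
-- 'minimum' fails on an empty list, so at least one point is required.
nearestDist :: [Double] -> [[Double]] -> Double
nearestDist testPoint = minimum . map (manhattanDist testPoint)
```

For the hypothetical test point (2, 1), nearestDist [2, 1] (parsePoints sampleInput) evaluates to 3.0, because (0, 0) is the closest point. A test point whose nearest distance is much larger than that of the others would be a candidate outlier.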

How to do it...

Create a new file, which we will call Main.hs...