Book Image

Haskell Data Analysis Cookbook

By : Nishant Shukla
Book Image

Haskell Data Analysis Cookbook

By: Nishant Shukla

Overview of this book

Table of Contents (19 chapters)
Haskell Data Analysis Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Using a bloom filter to remove unique items


A bloom filter is an abstract data type that tests whether an item exists in a set. Unlike a typical hash map data structure, a bloom filter only takes up a constant amount of space. The advantage comes in handy when dealing with billions of data, such as representations of DNA strands as strings: "GATA", "CTGCTA", and so on.

In this recipe, we will use a bloom filter to try to remove unique DNA strands from a list. This is often desired because a typical DNA sample may contain thousands of strands that only appear once. The major disadvantage of a bloom filter is that false positive results for membership are possible. The bloom filter may accidentally claim that an element exists. Though false negatives are not possible: a bloom filter will never claim that an element does not exist when it actually does.

Getting ready

Import the bloom filter package from Cabal as follows:

$ cabal install bloomfilter

How to do it…

  1. Import the bloom filter package...