Book Image

Haskell Data Analysis cookbook

By : Nishant Shukla
Book Image

Haskell Data Analysis cookbook

By: Nishant Shukla

Overview of this book

Step-by-step recipes filled with practical code samples and engaging examples demonstrate Haskell in practice, and then the concepts behind the code. This book shows functional developers and analysts how to leverage their existing knowledge of Haskell specifically for high-quality data analysis. A good understanding of data sets and functional programming is assumed.
Table of Contents (14 chapters)
13
Index

Deduplication of conflicting data items


Unfortunately, information about an item may be inconsistent throughout the corpus. Collision strategies are often domain-dependent, but one common way to manage this conflict is by simply storing all variations of the data. In this recipe, we will read a CSV file that contains information about musical artists and store all of the information about their songs and genres in a set.

Getting ready

Create a CSV input file with the following musical artists. The first column is for the name of the artist or band. The second column is the song name, and the third is the genre. Notice how some musicians have multiple songs or genres.

How to do it...

Create a new file, which we will call Main.hs, and perform the following steps:

  1. We will be using the CSV, Map, and Set packages:

    import Text.CSV (parseCSV, Record)
    import Data.Map (fromListWith)
    import qualified Data.Set as S
  2. Define the Artist data type corresponding to the CSV input. For fields that may contain conflicting...