Haskell Data Analysis Cookbook

By: Nishant Shukla
Overview of this book

Step-by-step recipes filled with practical code samples and engaging examples demonstrate Haskell in practice, and then the concepts behind the code. This book shows functional developers and analysts how to leverage their existing knowledge of Haskell specifically for high-quality data analysis. A good understanding of data sets and functional programming is assumed.

Accumulating text data from a file path

One of the easiest ways to get started with processing input is by reading raw text from a local file. In this recipe, we will be extracting all the text from a specific file path. Furthermore, to do something interesting with the data, we will count the number of words per line.

Tip

Haskell is a purely functional programming language, right? Sure, but obtaining input from outside the code introduces impurity. For elegance and reusability, we must carefully separate pure from impure code.

Getting ready

We will first create an input.txt text file with a few lines of text for the program to read. We keep this file in an easy-to-access directory because it will be referenced later. In this example, the text file contains a six-line quote by Plato. Here's what our terminal prints when we issue the following command:

$ cat input.txt

And how will you inquire, Socrates,
into that which you know not? 
What will you put forth as the subject of inquiry? 
And if you find what you want, 
how will you ever know that 
this is what you did not know?

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. The code will also be hosted on GitHub at https://github.com/BinRoot/Haskell-Data-Analysis-Cookbook.

How to do it...

Create a new file to start coding. We call our file Main.hs.

  1. As with all executable Haskell programs, start by defining and implementing the main function, as follows:
    main :: IO ()
    main = do
    
  2. Use Haskell's readFile :: FilePath -> IO String function to read the contents of input.txt. Note that FilePath is just a type synonym for String. With the string in memory, pass it into a countWords function to count the number of words in each line. These two lines form the body of the do block, so they must be indented further than main:
      input <- readFile "input.txt"
      print $ countWords input
    
  3. Lastly, define our pure function, countWords, as follows:
    countWords :: String -> [Int]
    countWords input = map (length . words) (lines input)
    
  4. The program will print out the number of words per line represented as a list of numbers as follows:
    $ runhaskell Main.hs
    
    [6,6,10,7,6,7]
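
Putting the steps above together, the complete Main.hs might look like the following sketch. The comments describing lines and words are added here for illustration; the code itself is the same as in the steps.

```haskell
module Main where

-- Pure function: no I/O happens here.
-- `lines` splits the input at newline characters,
-- `words` splits each line at whitespace,
-- and `length` counts the resulting words.
countWords :: String -> [Int]
countWords input = map (length . words) (lines input)

-- Impure entry point: reads the file, then hands the
-- contents to the pure function and prints the result.
main :: IO ()
main = do
  input <- readFile "input.txt"
  print $ countWords input
```

Because countWords takes a plain String, it can be tried out in GHCi on any text without touching the filesystem.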
    

How it works...

Haskell provides useful input and output (I/O) capabilities for reading input and writing output in different ways. In our case, we use readFile to specify the path of a file to be read. The do keyword in main indicates that we are sequencing several IO actions together. The result of readFile has the type IO String, which means it is an I/O action that, when run, produces a String.

Now we're about to get a bit technical. Pay close attention. Alternatively, smile and nod. In Haskell, the IO type is an instance of something called a Monad. This allows us to use the <- notation inside a do block to draw the string out of this I/O action. We then make use of the string by feeding it into our countWords function, which counts the number of words in each line. Notice how we kept the pure countWords function separate from the impure main function.
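
As a rough illustration of what the <- notation stands for, the same main can be written without do, using the Monad bind operator >>= directly. This is only a sketch of the equivalence; the recipe itself uses the do form.

```haskell
module Main where

countWords :: String -> [Int]
countWords input = map (length . words) (lines input)

-- `readFile "input.txt"` is an IO String action.
-- `>>=` runs it and passes the resulting String to the
-- function on its right, which returns the next IO action.
main :: IO ()
main = readFile "input.txt" >>= \input -> print (countWords input)
```

Both versions behave the same way; do notation is simply a more readable way to chain actions as the number of steps grows.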

Finally, we print the output of countWords. The $ operator is function application with very low precedence, which lets us avoid excessive parentheses in our code. Without it, the last line of main would look like print (countWords input).
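
The following small sketch shows three equivalent ways of writing that last line. The sample string is made up for illustration so that the example does not depend on input.txt.

```haskell
module Main where

countWords :: String -> [Int]
countWords input = map (length . words) (lines input)

main :: IO ()
main = do
  let input = "one two\nthree"   -- hypothetical sample text
  print $ countWords input       -- using $
  print (countWords input)       -- using explicit parentheses
  (print . countWords) input     -- using function composition
  -- each of the three lines prints [2,1]
```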

See also

This code is kept simple so that it is easy to read, but that also makes it very fragile. If input.txt does not exist, the program crashes as soon as it runs. For example, running it in a directory without the file generates the following error message:

$ runhaskell Main.hs

Main.hs: input.txt: openFile: does not exist…

To make this code fault tolerant, refer to the Catching I/O code faults recipe.
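
As a minimal sketch of one possible approach (not necessarily the one used in that recipe), the try function from Control.Exception in the base library can turn a failed read into an Either value instead of a crash. The wording of the fallback message is an assumption made for this example.

```haskell
module Main where

import Control.Exception (IOException, try)

countWords :: String -> [Int]
countWords input = map (length . words) (lines input)

main :: IO ()
main = do
  -- `try` returns Left with the exception if readFile fails,
  -- or Right with the file contents if it succeeds.
  result <- try (readFile "input.txt") :: IO (Either IOException String)
  case result of
    Left err    -> putStrLn ("Could not read input.txt: " ++ show err)
    Right input -> print (countWords input)
```

The pure countWords function is unchanged; only the impure part of the program needs to know that reading can fail.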
