Haskell Data Analysis Cookbook

By: Nishant Shukla

Ignoring punctuation and specific characters

In natural language processing, uninformative tokens are often filtered out before analysis to make the text easier to handle. Stop words are the best-known example, and punctuation or special characters can be treated the same way: when computing word frequencies or extracting sentiment data from a corpus, they usually need to be ignored. This recipe demonstrates how to remove these specific characters from a body of text.
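As a rough illustration of the idea, separate from the recipe's own steps, the following sketch strips characters with standard list functions. The names dropPunctuation and blankOut are hypothetical helpers introduced only for this example, and relying on isPunctuation from Data.Char is an assumption about what counts as punctuation, not the book's definition.

```haskell
import Data.Char (isPunctuation)

-- Hypothetical helper: drop every character that Data.Char
-- classifies as punctuation (for example ',', '.', '?', '-', '(').
dropPunctuation :: String -> String
dropPunctuation = filter (not . isPunctuation)

-- Hypothetical helper: replace each character from a caller-supplied
-- list with a space, so words joined by a symbol stay separated.
blankOut :: [Char] -> String -> String
blankOut symbols = map (\c -> if c `elem` symbols then ' ' else c)
```

Loaded into GHCi, dropPunctuation (blankOut "-" "chess-so what?") evaluates to "chess so what". Blanking the symbol first matters here: removing the hyphen outright would fuse the two words into "chessso".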

How to do it...

There are no imports necessary. Create a new file, which we will call Main.hs, and perform the following steps:

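  1. Implement main and define a string called quote. The backslashes (\) at the end of one line and the start of the next form a string gap, which lets a single string literal continue across multiple lines: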

    main :: IO ()
    main = do
      -- Everything between a closing "\" and the next opening "\" is dropped,
      -- so a space is kept before each gap to separate the sentences.
      let quote = "Deep Blue plays very good chess-so what? \
        \Does that tell you something about how we play chess? \
        \No. Does it tell you about how Kasparov envisions, \
        \understands a chessboard? (Douglas Hofstadter)"
      putStrLn $ (removePunctuation . replaceSpecialSymbols) quote
  2. Replace all punctuation marks with an empty string, and...