Haskell Data Analysis Cookbook

By: Nishant Shukla

Creating n-grams from a list


An n-gram is a sequence of n items that occur adjacently. For example, in the sequence of numbers [1, 2, 5, 3, 2], one possible 3-gram is [5, 3, 2]; the other two are [1, 2, 5] and [2, 5, 3].

n-grams are useful for computing probability tables that predict the next item from the items preceding it. In this recipe, we will create all possible n-grams from a list of items. A Markov chain can be trained from the n-grams this recipe produces.

How to do it…

  1. Define the n-gram function as follows to produce all possible n-grams from a list:

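    -- Slide a window of width n across the list, one item at a time.
    ngram :: Int -> [a] -> [[a]]
    ngram n xs
      | n <= 0         = []  -- without this guard, n <= 0 would never terminate
      | n <= length xs = take n xs : ngram n (drop 1 xs)
      | otherwise      = []

  2. Test it out on a sample list as follows: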

    main :: IO ()
    main = print $ ngram 3 "hello world"

  3. The printed 3-grams are as follows:

    ["hel","ell","llo","lo ","o w"," wo","wor","orl","rld"]
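
The introduction mentions probability tables and Markov chains. One way this might look is sketched below, under some assumptions: the helper names nextItemCounts and nextItemProbs are hypothetical and not from the book, the table is stored with Data.Map.Strict from the containers package, and the copy of ngram carries an extra guard for non-positive n as a precaution. Each (k + 1)-gram is split into a k-item context and the item that follows it, and the followers are counted per context.

```haskell
import qualified Data.Map.Strict as Map

-- A copy of the recipe's ngram, with a guard for non-positive n
-- added here as a precaution.
ngram :: Int -> [a] -> [[a]]
ngram n xs
  | n <= 0         = []
  | n <= length xs = take n xs : ngram n (drop 1 xs)
  | otherwise      = []

-- Hypothetical helper (not from the book): for every context of k
-- adjacent items, count how often each item follows that context.
nextItemCounts :: Ord a => Int -> [a] -> Map.Map [a] (Map.Map a Int)
nextItemCounts k xs =
  Map.fromListWith (Map.unionWith (+))
    [ (init g, Map.singleton (last g) 1)  -- context, then its follower
    | g <- ngram (k + 1) xs               -- every g has k + 1 >= 1 items
    ]

-- Hypothetical helper: turn the counts for each context into
-- relative frequencies that sum to 1.
nextItemProbs :: Ord a => Int -> [a] -> Map.Map [a] (Map.Map a Double)
nextItemProbs k xs = Map.map normalize (nextItemCounts k xs)
  where
    normalize m =
      let total = fromIntegral (sum (Map.elems m))
      in  Map.map (\c -> fromIntegral c / total) m
```

For instance, loading this in GHCi and evaluating nextItemProbs 1 "abac" should give a table in which the context "a" is followed by 'b' and 'c' with frequency 0.5 each, and the context "b" is always followed by 'a'. A nested map is only one possible representation; it was chosen here because looking up a context returns its whole distribution at once.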