Haskell Data Analysis Cookbook

By: Nishant Shukla

Preface

Data analysis is something that many of us have done before, maybe even without knowing it. It is the essential art of gathering and examining pieces of information to suit a variety of purposes—from visual inspection to machine learning techniques. Through data analysis, we can harness the meaning from information littered all around the digital realm. It enables us to resolve the most peculiar inquiries, perhaps even summoning new ones in the process.

Haskell acts as our conduit for robust data analysis. For some, Haskell is a programming language reserved for the most elite researchers in academia and industry. Yet, we see it charming one of the fastest growing cultures of open source developers around the world. The growth of Haskell is a sign that people are uncovering its magnificent functional purity, resilient type safety, and remarkable expressiveness. Flip the pages of this book to see it all in action.

Haskell Data Analysis Cookbook is more than just a fusion of two entrancing topics in computing. It is also a learning tool for the Haskell programming language and an introduction to simple data analysis practices. Use it as a Swiss Army Knife of algorithms and code snippets. Try a recipe a day, like a kata for your mind. Breeze through the book for creative inspiration from catalytic examples. Also, most importantly, dive deep into the province of data analysis in Haskell.

Of course, none of this would have been possible without thorough feedback from the technical editors, brilliant chapter illustrations by Lonku (http://lonku.tumblr.com), and helpful layout and editing support by Packt Publishing.

What this book covers

Chapter 1, The Hunt for Data, identifies core approaches in reading data from various external sources such as CSV, JSON, XML, HTML, MongoDB, and SQLite.

Chapter 2, Integrity and Inspection, explains the importance of cleaning data through recipes about trimming whitespace, lexing, and regular expression matching.

Chapter 3, The Science of Words, introduces common string manipulation algorithms, including base conversions, substring matching, and computing the edit distance.

Chapter 4, Data Hashing, covers essential hashing functions such as MD5, SHA256, GeoHashing, and perceptual hashing.

Chapter 5, The Dance with Trees, establishes an understanding of the tree data structure through examples that include tree traversals, balancing trees, and Huffman coding.

Chapter 6, Graph Fundamentals, presents rudimentary algorithms for graphical networks such as graph traversals, visualization, and maximal clique detection.

Chapter 7, Statistics and Analysis, begins the investigation of important data analysis techniques that encompass regression algorithms, Bayesian networks, and neural networks.

Chapter 8, Clustering and Classification, covers quintessential analysis methods such as k-means clustering, hierarchical clustering, constructing decision trees, and implementing the k-Nearest Neighbors classifier.

Chapter 9, Parallel and Concurrent Design, introduces advanced topics in Haskell such as forking I/O actions, mapping over lists in parallel, and benchmarking performance.

Chapter 10, Real-time Data, incorporates streamed data interactions from Twitter, Internet Relay Chat (IRC), and sockets.

Chapter 11, Visualizing Data, deals with sundry approaches to plotting graphs, including line charts, bar graphs, scatter plots, and D3.js visualizations.

Chapter 12, Exporting and Presenting, concludes the book with an enumeration of algorithms for exporting data to CSV, JSON, HTML, MongoDB, and SQLite.

What you need for this book

Who this book is for

  • Those who have begun tinkering with Haskell but desire stimulating examples to kick-start a new project will find this book indispensable.

  • Data analysts new to Haskell should use this as a reference for functional approaches to data-modeling problems.

  • Dedicated beginners to both the Haskell language and data analysis stand to learn the most from the topics covered in this book.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Apply the readString function to the input, and get all date documents."

A block of code is set as follows:

main :: IO () 
main = do 
  input <- readFile "input.txt"
  print input

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

main :: IO () 
main = do 
  input <- readFile "input.txt"
  print input

Any command-line input or output is written as follows:

$ runhaskell Main.hs

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes, for example, appear in the text like this: "Under the Downloads section, download the cabal source package."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to , and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. We also highly recommend obtaining all source code from GitHub, available at https://github.com/BinRoot/Haskell-Data-Analysis-Cookbook.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support. Code revisions can also be made on the accompanying GitHub repository located at https://github.com/BinRoot/Haskell-Data-Analysis-Cookbook.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at if you are having a problem with any aspect of the book, and we will do our best to address it.