Learning Haskell Data Analysis

By: James Church

Overview of this book

Haskell is trending in the field of data science by providing a powerful platform for robust data science practices. This book provides you with the skills to handle large amounts of data, even if that data is in a less than perfect state. Each chapter in the book helps to build a small library of code that will be used to solve a problem for that chapter. The book starts with creating databases out of existing datasets, cleaning that data, and interacting with databases within Haskell in order to produce charts for publications. It then moves towards more theoretical concepts that are fundamental to introductory data analysis, but in a context of a real-world problem with real-world data. As you progress in the book, you will be relying on code from previous chapters in order to help create new solutions quickly. By the end of the book, you will be able to manipulate, find, and analyze large and small sets of data using your own Haskell libraries.

Summary


Cleaning is not only the most important phase of data analysis but also the least glamorous. With Haskell and the power of regular expressions, we can quickly identify the parts of a large dataset that need our attention. We left our cleaning problem incomplete in this chapter, and there is still plenty of data left to clean. The Gender and State columns in particular need serious work. They are left as an exercise for you in crafting regular expressions that quickly identify the fields requiring your attention.
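As a starting point for that exercise, here is a minimal sketch of how a regular expression can flag the values in a column that do not fit an expected shape. It assumes the `regex-posix` package (the `Text.Regex.Posix` module) is installed. The helper name `flagMismatches`, the sample values, and the two patterns are hypothetical and chosen only for illustration; they are not taken from the chapter's dataset, and the patterns you need will depend on what your own data looks like.

```haskell
import Text.Regex.Posix ((=~))

-- | Return (row number, value) pairs for every value that does NOT
--   match the given POSIX extended regular expression.
--   Rows are numbered from 1. (Hypothetical helper for illustration.)
flagMismatches :: String -> [String] -> [(Int, String)]
flagMismatches regex values =
  [ (n, v) | (n, v) <- zip [1 ..] values, not (v =~ regex) ]

main :: IO ()
main = do
  -- Made-up sample columns, not the chapter's actual data.
  let genders = ["M", "F", "male", "", "f", "F"]
      states  = ["AL", "Alabama", "tx", "NY", "N/A"]
  -- Assumed convention: a single uppercase M or F.
  print (flagMismatches "^(M|F)$" genders)
  -- Assumed convention: exactly two uppercase letters.
  print (flagMismatches "^[A-Z]{2}$" states)
```

Reporting the row number alongside each offending value makes it easy to go back to the source file and decide, case by case, whether to correct the field or discard the row.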

We also discussed the unclear border between the terms structured data and unstructured data. We applied two criteria for structured data: the data is in a machine-readable format, and the data adheres to a metadata document standard. Our example dataset is still a long way from being structured. We assume that the person who aggregated this data had a metadata document in mind, but that didn't spare us from performing a lot of cleaning.

Our next...