Learning Haskell Data Analysis

By: James Church

Overview of this book

Haskell is gaining traction in data science as a powerful platform for robust data analysis practices. This book gives you the skills to handle large amounts of data, even when that data is in a less than perfect state. Each chapter builds a small library of code that is used to solve that chapter's problem. The book starts with creating databases out of existing datasets, cleaning that data, and interacting with databases from within Haskell in order to produce charts for publications. It then moves toward more theoretical concepts that are fundamental to introductory data analysis, presented in the context of a real-world problem with real-world data. As you progress through the book, you will rely on code from previous chapters to create new solutions quickly. By the end of the book, you will be able to manipulate, search, and analyze large and small sets of data using your own Haskell libraries.

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Tools of the Trade

Welcome to Haskell and data analysis!

Why Haskell?

Getting ready

Nearly essential tools of the trade

Our first Haskell program

Interactive Haskell

Summary

Getting Our Feet Wet

Type is king – the implications of strict types in Haskell

Working with csv files

Converting csv files to the SQLite3 format

Summary

Cleaning Our Datasets

Structured versus unstructured datasets

Creating your own structured data

Counting the number of fields in each record

Filtering data using regular expressions

Searching fields based on a regular expression

Summary

Plotting

Plotting data with EasyPlot

Simplifying access to data in SQLite3

Plotting data from a SQLite3 database

Plotting multiple datasets

Plotting a moving average

Summary

Hypothesis Testing

Data in a coin

Does a home-field advantage really exist?

Summary

Correlation and Regression Analysis

The terminology of correlation and regression

Study – is there a connection between scoring and winning?

Regression analysis

The pitfalls of regression analysis

Summary

Naive Bayes Classification of Twitter Data

An introduction to Naive Bayes classification

Creating a Twitter application

Summary

Building a Recommendation Engine

Analyzing the frequency of words in tweets

Working with multivariate data

Preparing our environment

Performing linear algebra in Haskell

Principal Component Analysis in Haskell

Building a recommendation engine

Summary

Regular Expressions in Haskell

A crash course in regular expressions

Index

Summary

Cleaning is not only the most important but also the least glamorous phase of data analysis. With Haskell and the power of regular expressions, we can quickly identify areas with large quantities of data that need our attention. We left our cleaning problem incomplete in this chapter. There is still plenty of data left to clean. The Gender and State columns need some serious work. They are left as an exercise for you to learn how to craft regular expressions to quickly identify the fields that require your attention.
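As a rough illustration of that exercise, the sketch below flags the values in a column that fail to match an expected pattern. It assumes the `regex-posix` package (`Text.Regex.Posix`) is installed. The helper name `suspectFields` and the two-uppercase-letter convention for the State column are hypothetical choices made for this example, not taken from the chapter's code.

```haskell
import Text.Regex.Posix ((=~))

-- Hypothetical helper: pair each value with its 1-based row number and
-- keep only the values that do NOT match the expected regular expression.
suspectFields :: String -> [String] -> [(Int, String)]
suspectFields regex values =
  [ (n, v) | (n, v) <- zip [1 ..] values, not (v =~ regex) ]

-- Assumed convention for a State column: exactly two uppercase letters.
-- The real dataset may follow a different standard.
statePattern :: String
statePattern = "^[A-Z][A-Z]$"
```

In GHCi, `suspectFields statePattern ["AL", "Alabama", "ga", "TX"]` should report the second and third entries as needing attention. Returning row numbers alongside the values makes it easier to locate the offending records in the original file.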

We also discussed the unclear border between the terms structured data and unstructured data. I applied two criteria for structured data: the data is in a machine-readable format, and the data adheres to a metadata document standard. Our example dataset is still a long way from being structured. We assume that the person who aggregated this data had a metadata document in mind, but that didn't spare us from performing a lot of cleaning.
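One simple, partial check against the second criterion is whether every record has the same number of fields. The sketch below uses only `Data.List` from `base` and assumes the records have already been parsed into lists of fields. The names `fieldCountTally` and `hasUniformFieldCount` are hypothetical, and a uniform field count is a necessary condition for structured data, not a sufficient one.

```haskell
import Data.List (group, sort)

-- Hypothetical helper: for each distinct field count, report how many
-- records have that many fields, as (fieldCount, numberOfRecords) pairs.
fieldCountTally :: [[String]] -> [(Int, Int)]
fieldCountTally records =
  [ (n, length g) | g@(n : _) <- group (sort (map length records)) ]

-- True when all records share one field count (or there are no records).
hasUniformFieldCount :: [[String]] -> Bool
hasUniformFieldCount records = length (fieldCountTally records) <= 1
```

A tally with more than one entry points at the records worth inspecting first: the less common field counts usually mark the rows that were split or merged incorrectly.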

Our next...
