Getting Started with Haskell Data Analysis

By: James Church

Overview of this book

Every business and organization that collects data can tap into that data to gain insight into how to improve. Haskell is a purely functional, lazy programming language that is well suited to large data analysis problems. This book takes you through the more difficult problems of data analysis in a hands-on manner and helps you get up to speed with the basics of data analysis and its approaches in Haskell. You'll learn about statistical computing, file formats (CSV and SQLite3), descriptive statistics, and charts, and then progress to more advanced concepts such as understanding the importance of the normal distribution. While mathematics is a big part of data analysis, we've tried to keep this book simple and approachable so that you can apply what you learn to the real world. By the end of this book, you will have a thorough understanding of data analysis and the different ways of analyzing data, along with a mastery of the tools and techniques in Haskell for effective data analysis.
Table of Contents (8 chapters)

Descriptive Statistics

In this book, we are going to learn about data analysis from the perspective of the Haskell
programming language. The goal of this book is to take you from being a beginner in math
and statistics to the point where you feel comfortable working with large-scale datasets.
The prerequisites for this book are that you know a little bit of the Haskell
programming language and a little bit of math and statistics. From there, we can start
you on your journey to becoming a data analyst.

In this chapter, we are going to cover descriptive statistics. Descriptive statistics are used to summarize a collection of values into one or two values. We begin by learning about the Haskell Text.CSV library. In later sections, we will cover, in order of increasing difficulty, the range, mean, median, and mode; you've probably heard of some of these descriptive statistics before, as they're quite common. We will be using the IHaskell environment on the Jupyter Notebook system.

The topics that we are going to cover are as follows:

  • The CSV library—working with CSV files
  • Data ranges
  • Data mean and standard deviation
  • Data median
  • Data mode
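
To make the idea of summarizing a collection into one or two values concrete, here is a minimal sketch of the statistics listed above, written against plain lists of Double using only the base library. The function names (rangeOf, mean, stdDev, median, modes) are hypothetical and chosen for illustration; they are not taken from the book's own code. The sketch also assumes the sample standard deviation (dividing by n - 1), which may differ from the definition used later in the chapter.

```haskell
import Data.List (group, sort)

-- | Range: the largest value minus the smallest value.
-- Returns Nothing for an empty list.
rangeOf :: [Double] -> Maybe Double
rangeOf [] = Nothing
rangeOf xs = Just (maximum xs - minimum xs)

-- | Arithmetic mean: the sum of the values divided by their count.
mean :: [Double] -> Maybe Double
mean [] = Nothing
mean xs = Just (sum xs / fromIntegral (length xs))

-- | Sample standard deviation (assumption: n - 1 in the denominator).
-- Needs at least two values.
stdDev :: [Double] -> Maybe Double
stdDev xs
  | n < 2 = Nothing
  | otherwise = Just (sqrt (sumSquares / fromIntegral (n - 1)))
  where
    n = length xs
    m = sum xs / fromIntegral n
    sumSquares = sum [(x - m) * (x - m) | x <- xs]

-- | Median: the middle value of the sorted list, or the average of the
-- two middle values when the list has an even number of elements.
median :: [Double] -> Maybe Double
median [] = Nothing
median xs
  | odd n = Just (sorted !! mid)
  | otherwise = Just ((sorted !! (mid - 1) + sorted !! mid) / 2)
  where
    sorted = sort xs
    n = length xs
    mid = n `div` 2

-- | Mode: every value that occurs most often, in ascending order.
-- A list is returned because several values can tie.
modes :: [Double] -> [Double]
modes [] = []
modes xs = [v | g@(v : _) <- groups, length g == best]
  where
    groups = group (sort xs)
    best = maximum (map length groups)
```

Loaded into GHCi or an IHaskell cell, these can be tried on a small list such as `[1, 2, 2, 3, 3]`. Returning Maybe for the single-valued statistics is one design choice among several; it keeps the empty-list case explicit rather than raising a runtime error.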