Getting Started with Haskell Data Analysis

By: James Church

Overview of this book

Every business and organization that collects data is capable of tapping into its own data to gain insights into how to improve. Haskell is a purely functional and lazy programming language, well suited to handling large data analysis problems. This book will take you through the more difficult problems of data analysis in a hands-on manner and help you get up to speed with the basics of data analysis and its approaches in the Haskell language. You'll learn about statistical computing, file formats (CSV and SQLite3), descriptive statistics, and charts, and progress to more advanced concepts such as understanding the importance of the normal distribution. While mathematics is a big part of data analysis, we've tried to keep this book simple and approachable so that you can apply what you learn to the real world. By the end of this book, you will have a thorough understanding of data analysis and the different ways of analyzing data, and you will have a command of the tools and techniques in Haskell for effective data analysis.
Table of Contents (8 chapters)

Converting CSV variation files into SQLite3

In this section, we are going to discuss converting CSV variations to SQLite3. CSV stands for comma-separated values. As you may recall from Chapter 1, Descriptive Statistics, CSV isn't really a standard, and it may come as a surprise that the values do not have to be separated by commas for a file to still be considered a CSV file; other delimiters, such as tabs, are also used. That is what the lack of a standard means in practice. So, in this section, we're going to download the MovieLens dataset, explore the CSV file formats used in this particular dataset, and convert its files to SQLite3.
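To make the idea of a CSV variation concrete, here is a minimal sketch of splitting a line on an arbitrary delimiter using only the Haskell base library. The function name splitOn and the sample lines mentioned afterwards are illustrative assumptions, not code or data from the book, and the sketch deliberately ignores quoted fields and escaped delimiters, which a real CSV library would handle.

```haskell
-- A naive field splitter for delimiter-separated text.
-- Assumption: fields never contain the delimiter, and there is
-- no quoting or escaping. A proper CSV library handles those cases.
splitOn :: Char -> String -> [String]
splitOn delim s = case break (== delim) s of
  (field, [])       -> [field]                      -- no delimiter left: last field
  (field, _ : rest) -> field : splitOn delim rest   -- drop the delimiter and continue

-- Split every line of a file's contents using the same delimiter.
parseDelimited :: Char -> String -> [[String]]
parseDelimited delim = map (splitOn delim) . lines
```

For example, in GHCi, splitOn '\t' applied to a hypothetical tab-separated line such as "1\t42\t5" yields ["1","42","5"], and splitOn '|' applied to "7|Some Movie||Drama" yields ["7","Some Movie","","Drama"], with the empty string marking the empty field. The same function covers the comma case, so only the delimiter changes from one variation to the next.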

Let's go back to our machine and search Google for MovieLens dataset. The first link in the search results should have our dataset, and we want to download the MovieLens 100K dataset, as shown in the...