Getting Started with Haskell Data Analysis [Video]

By: James Church

Overview of this book

Data analysis is part computer science and part statistics. An important part of data analysis is testing your assumptions against real-world data to see whether a pattern or a particular user behavior actually holds. This video course will help you get up to speed with the basics of data analysis and how to approach it in the Haskell language. You'll learn about statistical computing, file formats (CSV and SQLite3), descriptive statistics, and charts, and then move on to more advanced concepts such as the importance of the normal distribution. Whilst mathematics is a big part of data analysis, we’ve tried to keep this course simple and approachable so that you can apply what you learn to the real world.

Style and Approach

The style of this course is driven by problem solving using real-world data. In some sections, we will begin by seeking out datasets that are readily accessible on the Internet, downloading them, and then performing some analysis. Each video builds a little on the one before it, at a conversational pace. We use the Jupyter notebook system, which allows us to easily create and share notebooks of our analysis work. You can download the notebooks that we create alongside each of our videos.
Table of Contents (6 chapters)
Chapter 3
Regular Expressions
Section 4
Regular Expressions in CSV files
We've covered most of the regular expression syntax, but we haven't yet used regular expressions on real-world data. We're going to open our baseball dataset and find the average number of runs scored by away teams in the month of March.
- We begin by extracting the date column and the away team runs column. We use a regular expression on the date column to create a Boolean list of March/Not-March values
- Next, we zip the Boolean list with the away team runs, and then keep only the pairs whose Boolean value is true
- Finally, we drop the Boolean values and pass the remaining runs to our mean function
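The steps above can be sketched in Haskell roughly as follows. This is a minimal illustration, not the course's own notebook code: it assumes the `regex-posix` package (`Text.Regex.Posix`) is installed, that dates are stored as `YYYYMMDD` strings, and that the two columns have already been extracted into plain lists. The names `sampleGames`, `isMarch`, `mean`, and `marchAwayRunsMean` are hypothetical, and the sample rows are made up for the example.

```haskell
import Text.Regex.Posix ((=~))

-- Hypothetical sample rows: (date as YYYYMMDD, away team runs).
-- These values are invented for illustration, not taken from the dataset.
sampleGames :: [(String, Double)]
sampleGames =
  [ ("20150331", 4)
  , ("20150405", 3)
  , ("20160330", 6)
  , ("20160403", 1)
  ]

-- True when the date falls in March (assumes a YYYYMMDD layout).
isMarch :: String -> Bool
isMarch date = date =~ "^[0-9]{4}03[0-9]{2}$"

-- Arithmetic mean; Nothing for an empty list to avoid dividing by zero.
mean :: [Double] -> Maybe Double
mean [] = Nothing
mean xs = Just (sum xs / fromIntegral (length xs))

-- Mean of away team runs over March games only.
marchAwayRunsMean :: [String] -> [Double] -> Maybe Double
marchAwayRunsMean dates runs =
  let flags     = map isMarch dates   -- Boolean March/Not-March list
      paired    = zip flags runs      -- pair each flag with its runs value
      marchOnly = filter fst paired   -- keep pairs whose flag is True
      marchRuns = map snd marchOnly   -- drop the flags, keep the runs
  in mean marchRuns

main :: IO ()
main = print (marchAwayRunsMean (map fst sampleGames) (map snd sampleGames))
```

With the made-up rows above, only the two March games (4 and 6 runs) survive the filter, so the program prints `Just 5.0`. Returning `Maybe Double` is a design choice for the sketch: it makes the "no March games found" case explicit instead of producing `NaN`.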