Learning Haskell Data Analysis

By: James Church

Overview of this book

Haskell is gaining ground in data science as a powerful platform for robust data analysis practices. This book gives you the skills to handle large amounts of data, even if that data is in a less than perfect state. Each chapter builds a small library of code that is used to solve that chapter's problem. The book starts with creating databases out of existing datasets, cleaning that data, and interacting with databases within Haskell in order to produce charts for publications. It then moves toward more theoretical concepts that are fundamental to introductory data analysis, but in the context of a real-world problem with real-world data. As you progress through the book, you will rely on code from previous chapters to create new solutions quickly. By the end of the book, you will be able to manipulate, find, and analyze large and small sets of data using your own Haskell libraries.

An introduction to Naive Bayes classification


Naive Bayes classification, which is built on Bayes' theorem, is a simple yet efficient method of classifying data. In the context of our example, tweets will be analyzed based on their individual words. Three factors go into a Naive Bayes classifier: prior knowledge, likelihood, and evidence. Together, they yield a measurement proportional to the probability of something unknown about an event (here, the language of a tweet), based on something we can observe (the words it contains).
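As a rough illustration of how the three factors combine, here is a minimal Haskell sketch of Bayes' rule for a single observed feature. The function names (posterior, evidence, score, posteriors) are hypothetical and are not taken from the book's code; the sketch assumes each class is described by a (prior, likelihood) pair of probabilities.

```haskell
-- A sketch of Bayes' rule for one observed feature (for example, one word).
-- All names below are illustrative assumptions, not the book's API.

-- | posterior = prior * likelihood / evidence
posterior :: Double -> Double -> Double -> Double
posterior prior likelihood ev = prior * likelihood / ev

-- | Evidence by the law of total probability: the sum of
-- prior * likelihood over every class.
evidence :: [(Double, Double)] -> Double
evidence classes = sum [p * l | (p, l) <- classes]

-- | Unnormalized score. It is proportional to the posterior because
-- the evidence is the same for every class being compared.
score :: Double -> Double -> Double
score prior likelihood = prior * likelihood

-- | Posterior for each class, given its (prior, likelihood) pair.
posteriors :: [(Double, Double)] -> [Double]
posteriors classes = [posterior p l ev | (p, l) <- classes]
  where
    ev = evidence classes
```

With made-up inputs such as posteriors [(0.6, 0.01), (0.4, 0.05)], the results are about 0.23 and 0.77, and they sum to 1. Because the evidence is shared by every class, comparing score values alone is enough to pick the most probable class, which is why the measurement only needs to be proportional.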

Prior knowledge

Prior knowledge lets us reason about which language a sentence is written in without considering any features of the sentence itself. Think about answering the question blindly; that is, a sentence is spoken and you aren't allowed to see or hear it. What language was used? Of all the tens of thousands of languages used across time, how could you ever guess this one? You are forced to play the odds. The top five most widely spoken languages are Mandarin, Spanish, English, Hindi, and Arabic. By selecting one of these languages...