F# for Machine Learning Essentials

By: Sudipta Mukherjee

Overview of this book

The F# functional programming language enables developers to write simple code to solve complex problems. With F#, developers create consistent and predictable programs that are easier to test and reuse, simpler to parallelize, and less prone to bugs. If you want to learn how to use F# to build machine learning systems, then this is the book you want. Starting with an introduction to the several categories of machine learning, you will quickly learn to implement time-tested, supervised learning algorithms. You will gradually move on to predicting house prices using regression analysis. You will then learn to use Accord.NET to implement SVM techniques and clustering. You will also learn to build a recommender system for your e-commerce site from scratch. Finally, you will dive into advanced topics such as implementing neural network algorithms while performing sentiment analysis on your data.

Grubbs' test for multivariate data using Mahalanobis distance

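Grubbs' test can be applied to multivariate data by first reducing each multivariate observation to a single value, the squared Mahalanobis distance, using the following transformation:

$$y_i^2 = (x_i - \bar{x})^{T} S^{-1} (x_i - \bar{x})$$

Where $x_i$ is the $i$-th observation, $\bar{x}$ is the sample mean, and $S$ is the covariance matrix of $X$.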
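As a minimal sketch of this transformation, the following F# script computes the squared Mahalanobis distance, (x - mean)ᵀ S⁻¹ (x - mean), for every row of a small dataset. The function names (`meanVector`, `covarianceMatrix`, `inverse`, `mahalanobisSquared`), the use of the sample covariance (dividing by n - 1), and the hand-written Gauss-Jordan inverse are assumptions made for illustration; this is not the book's listing. Because a covariance matrix is positive semi-definite, the values returned here are non-negative, so they need not match the figures in the output listed later in this section.

```fsharp
// Sketch only: all names below are illustrative, not taken from the book.

/// Column-wise mean of a list of equal-length rows.
let meanVector (rows : float list list) =
    let n = float (List.length rows)
    rows
    |> List.reduce (List.map2 (+))
    |> List.map (fun s -> s / n)

/// Sample covariance matrix (assumption: divides by n - 1).
let covarianceMatrix (rows : float list list) =
    let n = float (List.length rows)
    let mu = meanVector rows
    let centered = rows |> List.map (fun r -> List.map2 (-) r mu)
    let p = List.length mu
    Array2D.init p p (fun i j ->
        (centered |> List.sumBy (fun d -> d.[i] * d.[j])) / (n - 1.0))

/// Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting.
let inverse (m : float[,]) =
    let n = Array2D.length1 m
    // Augment m with the identity matrix: [ m | I ].
    let a =
        Array2D.init n (2 * n) (fun i j ->
            if j < n then m.[i, j] elif j - n = i then 1.0 else 0.0)
    for col in 0 .. n - 1 do
        // Choose the row with the largest pivot to limit round-off error.
        let pivot = [col .. n - 1] |> List.maxBy (fun r -> abs (a.[r, col]))
        if abs (a.[pivot, col]) < 1e-12 then failwith "singular covariance matrix"
        for j in 0 .. 2 * n - 1 do
            let tmp = a.[col, j]
            a.[col, j] <- a.[pivot, j]
            a.[pivot, j] <- tmp
        let scale = a.[col, col]
        for j in 0 .. 2 * n - 1 do
            a.[col, j] <- a.[col, j] / scale
        for r in 0 .. n - 1 do
            if r <> col then
                let factor = a.[r, col]
                for j in 0 .. 2 * n - 1 do
                    a.[r, j] <- a.[r, j] - factor * a.[col, j]
    // The right-hand half of the augmented matrix now holds the inverse.
    Array2D.init n n (fun i j -> a.[i, j + n])

/// Pairs every row x with its squared Mahalanobis distance.
let mahalanobisSquared (rows : float list list) =
    let mu = meanVector rows
    let sInv = inverse (covarianceMatrix rows)
    let p = List.length mu
    rows
    |> List.map (fun x ->
        let d = List.map2 (-) x mu |> List.toArray
        let y2 =
            seq { for i in 0 .. p - 1 do
                    for j in 0 .. p - 1 do
                        yield d.[i] * sInv.[i, j] * d.[j] }
            |> Seq.sum
        x, y2)

// Usage with four two-dimensional points.
let data = [ [2.0; 2.0]; [2.0; 5.0]; [6.0; 5.0]; [100.0; 345.0] ]
let ys = mahalanobisSquared data
ys |> List.iter (fun (x, y2) -> printfn "%A -> %f" x y2)
```

With the sample covariance, the squared distances of n observations in p dimensions always sum to (n - 1) × p, which gives a quick sanity check on the result.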

The following code finds these y-squared values from a given dataset X:

The following are the functions to calculate the covariance matrix:

The following is the input given, four two-dimensional points:

[[2.0; 2.0]; [2.0; 5.0]; [6.0; 5.0]; [100.0; 345.0]]

This produces the following output:

ys = [([2.0; 2.0], -48066176.91); ([2.0; 5.0], -48066176.91);
 ([6.0; 5.0], -2584692.113); ([100.0; 345.0], -2.097348892e+12)]

Now, Grubbs' test for univariate data can be applied to these generated values:

[-48066176.91; -48066176.91; -2584692.113; -2.097348892e+12]

The z-scores of these values, in absolute value, are:

[0.5773335755; 0.5773335755; 0.5773836562; 1.732050807]

As you can see, the z-score of the last entry is considerably larger than the z-scores of the others. This indicates that the last element in the multivariate dataset, [100.0; 345.0], is anomalous.
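The z-score step can be sketched as follows. This is a minimal illustration, assuming absolute z-scores computed with the population standard deviation (dividing by n), a choice made because it is consistent with the z-scores listed above; the names `zScores` and `suspect` are hypothetical.

```fsharp
// Sketch only: absolute z-scores of a list of values.
// Assumption: population standard deviation (divide by n, not n - 1).
let zScores (values : float list) =
    let n = float (List.length values)
    let mean = List.sum values / n
    let variance = (values |> List.sumBy (fun v -> (v - mean) ** 2.0)) / n
    let sd = sqrt variance
    values |> List.map (fun v -> abs (v - mean) / sd)

// The generated values listed in the text.
let ySquared = [ -48066176.91; -48066176.91; -2584692.113; -2.097348892e+12 ]
let zs = zScores ySquared

// Index of the entry with the largest z-score: the candidate outlier.
let suspect = zs |> List.mapi (fun i z -> i, z) |> List.maxBy snd |> fst
printfn "z-scores: %A" zs
printfn "most extreme entry is at index %d" suspect
```

A full Grubbs' test would go one step further and compare the largest z-score against a critical value derived from the t-distribution at a chosen significance level; the sketch stops at ranking the entries, as the text does.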