7. Anomaly Detection | F# for Machine Learning Essentials

F# for Machine Learning Essentials

By: Sudipta Mukherjee

Overview of this book

The F# functional programming language enables developers to write simple code to solve complex problems. With F#, developers create consistent and predictable programs that are easier to test and reuse, simpler to parallelize, and less prone to bugs. If you want to learn how to use F# to build machine learning systems, then this is the book you want. Starting with an introduction to the several categories of machine learning, you will quickly learn to implement time-tested supervised learning algorithms. You will gradually move on to solving problems such as predicting house prices using regression analysis. You will then learn to use Accord.NET to implement SVM techniques and clustering. You will also learn to build a recommender system for your e-commerce site from scratch. Finally, you will dive into advanced topics such as implementing neural network algorithms while performing sentiment analysis on your data.

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

1. Introduction to Machine Learning

Objective

Why use F#?

Unsupervised learning

Machine learning frameworks

Machine learning for fun and profit

Recognizing handwritten digits – your "Hello World" ML program

Summary

2. Linear Regression

Objective

Different types of linear regression algorithms

APIs used

The basics of matrices and vectors (a short and sweet refresher)

QR decomposition of a matrix

Linear regression method of least square

Finding linear regression coefficients using F#

Finding the linear regression coefficients using Math.NET

Putting it together with Math.NET and FsPlot

Multiple linear regression

Multiple linear regression and variations using Math.NET

Weighted linear regression

Plotting the result of multiple linear regression

Ridge regression

Multivariate multiple linear regression

Feature scaling

Summary

3. Classification Techniques

Objective

Different classification algorithms you will learn

Some interesting things you can do

Understanding logistic regression

Multiclass classification using logistic regression

Multiclass classification using decision trees

Predicting a traffic jam using a decision tree: a case study

Challenge yourself!

Summary

4. Information Retrieval

Objective

Different IR algorithms you will learn

What interesting things can you do?

Information retrieval using tf-idf

5. Collaborative Filtering

Objective

Different classification algorithms you will learn

Vocabulary of collaborative filtering

Baseline predictors

Item-item collaborative filtering

Top-N recommendations

Evaluating recommendations

Ranking accuracy metrics

Working with real movie review data (Movie Lens)

Summary

6. Sentiment Analysis

Objective

What you will learn

A baseline algorithm for SA using SentiWordNet lexicons

Handling negations

Identifying praise or criticism with sentiment orientation

Pointwise Mutual Information

Using SO-PMI to find sentiment analysis

Summary

7. Anomaly Detection

Objective

Detecting point anomalies using IQR (Interquartile Range)

Detecting point anomalies using Grubb's test

Grubb's test for multivariate data using Mahalanobis distance

Chi-squared statistic to determine anomalies

Detecting anomalies using density estimation

Strategy to convert a collective anomaly to a point anomaly problem

Dealing with categorical data in collective anomalies

Summary

Index

Detecting anomalies using density estimation

In most systems, normal elements are far more common than anomalous ones. So, if the probability of occurrence of the elements in a collection is modeled by a Gaussian (normal) distribution, we can treat the elements whose estimated probability density is above a predefined threshold as normal, and those whose density falls below that threshold as probable anomalies.

Let's say that X is a random variable of m rows. The following two formulae find the average and standard deviation for feature j, or, in other words, for all the elements of X in the jth column if X is represented as a matrix.
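In their standard form, the per-feature estimates can be written as below. The notation is an assumption made here: $x^{(i)}_j$ is the value of feature $j$ in row $i$, $m$ is the number of rows, and the $1/m$ (population) normalization is assumed; a $1/(m-1)$ sample estimate is an equally common choice.

```latex
\mu_j = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}_j
\qquad
\sigma_j = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( x^{(i)}_j - \mu_j \right)^2}
```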

Given a new entry x, the following formula calculates the probability density estimation:
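Assuming each feature is modeled by its own independent Gaussian, the density estimate for an entry $x$ with $n$ features is usually taken to be the product of the per-feature densities. The symbols $n$, $\mu_j$ (mean of feature $j$) and $\sigma_j$ (standard deviation of feature $j$) are the notation assumed here.

```latex
p(x) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}
       \exp\!\left( -\frac{(x_j - \mu_j)^2}{2\sigma_j^2} \right)
```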

If p(x) is less than a predefined threshold, then the entry is tagged as anomalous; otherwise, it is tagged as normal.

The following code finds the average value of the jth feature:
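As a minimal sketch of the idea (not the book's own listing), the per-feature mean, the standard deviation, and the density estimate could be written in plain F# as follows. The helper names `featureMean`, `featureStd`, `gaussian` and `isAnomaly`, the list-of-lists representation, the `1/m` variance, the training data, and the threshold value are all assumptions made for illustration; only the name `px` comes from the text.

```fsharp
open System

/// Mean of the j-th feature (column) over all rows.
let featureMean (rows: float list list) (j: int) : float =
    rows |> List.averageBy (fun row -> List.item j row)

/// Population standard deviation (1/m normalization, assumed) of the j-th feature.
let featureStd (rows: float list list) (j: int) : float =
    let mu = featureMean rows j
    rows
    |> List.averageBy (fun row -> (List.item j row - mu) ** 2.0)
    |> sqrt

/// Univariate Gaussian density at v for mean mu and standard deviation sigma.
let gaussian (mu: float) (sigma: float) (v: float) : float =
    exp (-((v - mu) ** 2.0) / (2.0 * sigma * sigma)) / (sqrt (2.0 * Math.PI) * sigma)

/// Density estimate for a new entry x: the product of the per-feature densities.
let px (rows: float list list) (x: float list) : float =
    x
    |> List.mapi (fun j v -> gaussian (featureMean rows j) (featureStd rows j) v)
    |> List.fold (fun acc d -> acc * d) 1.0

/// Tags x as anomalous when its estimated density falls below epsilon.
let isAnomaly (epsilon: float) (rows: float list list) (x: float list) : bool =
    px rows x < epsilon

// Illustrative data only (not from the book): two features per row.
let training =
    [ [1.0; 10.0]; [2.0; 12.0]; [3.0; 11.0]; [2.0; 9.0]; [2.0; 13.0] ]

// Assumed threshold; in practice it would be tuned on labelled examples.
let epsilon = 0.01

printfn "%b" (isAnomaly epsilon training [2.0; 11.0])  // entry close to the training data
printfn "%b" (isAnomaly epsilon training [9.0; 30.0])  // entry far from the training data
```

The sketch recomputes the mean and standard deviation for every call, which keeps it short but is wasteful on large data sets; caching the per-feature statistics would be the obvious refinement. A feature with zero variance makes the density undefined, so such columns would need to be dropped or regularized first.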

Here is a sample run of the px method:

>...
