F# for Machine Learning Essentials

By: Sudipta Mukherjee

Overview of this book

The F# functional programming language enables developers to write simple code to solve complex problems. With F#, developers create consistent and predictable programs that are easier to test and reuse, simpler to parallelize, and less prone to bugs. If you want to learn how to use F# to build machine learning systems, then this is the book you want. Starting with an introduction to the several categories of machine learning, you will quickly learn to implement time-tested, supervised learning algorithms. You will gradually move on to predicting housing prices using regression analysis. You will then learn to use Accord.NET to implement SVM techniques and clustering. You will also learn to build a recommender system for your e-commerce site from scratch. Finally, you will dive into advanced topics such as implementing neural network algorithms while performing sentiment analysis on your data.
Table of Contents (16 chapters)
F# for Machine Learning Essentials
Credits
Foreword
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Index

Index

A

  • Accord.NET
    • URL / Machine learning frameworks, Math.NET Numerics for F# 3.7.0
    • about / Objective, Math.NET Numerics for F# 3.7.0
  • accuracy metrics
    • ranking / Ranking accuracy metrics
  • accuracy parameters, for recommendations evaluation
    • about / Evaluating recommendations
    • prediction accuracy / Prediction accuracy
    • confusion matrix / Confusion matrix (decision support)
  • anomalies
    • determining, with Chi-squared statistic / Chi-squared statistic to determine anomalies
    • detecting, density estimation used / Detecting anomalies using density estimation
  • Anomaly detection / Different areas where machine learning is being used
  • anomaly detection
    • actions / Some cool things you will do
    • types / The different types of anomalies
  • APIs
    • Math.NET Numerics / Math.NET Numerics for F# 3.7.0
  • asymmetric binary attributes similarity
    • about / Similarity of asymmetric binary attributes
    • Sokal-Sneath 1 index / Similarity of asymmetric binary attributes
    • Sokal-Sneath 2 index / Similarity of asymmetric binary attributes
    • Sokal-Sneath 3 index / Similarity of asymmetric binary attributes
    • Sokal-Sneath 4 index / Similarity of asymmetric binary attributes
    • Jaccard coefficient / Similarity of asymmetric binary attributes
    • simple matching / Similarity of asymmetric binary attributes
    • Tanimoto coefficient / Similarity of asymmetric binary attributes
  • Atrial Premature Contraction / The different types of anomalies

B

  • bag of words (BoW) model / Different IR algorithms you will learn
  • baseline predictors
    • about / Baseline predictors
    • code / Code walkthrough
  • basic user-user collaborative filtering
    • implementing, F# used / Implementing basic user-user collaborative filtering using F#
  • binary classification
    • k-NN, using / Binary classification using k-NN
    • logistic regression, using / Binary classification using logistic regression (using Accord.NET)

C

  • Chi-squared statistic
    • used, for determining anomalies / Chi-squared statistic to determine anomalies
  • classification algorithms
    • types / Different classification algorithms you will learn
    • about / Different classification algorithms you will learn, Different classification algorithms
  • clustering / Different areas where machine learning is being used
  • Cold Start
    • about / Baseline predictors
  • collaborative filtering
    • about / Vocabulary of collaborative filtering
    • User-User collaborative filtering / Vocabulary of collaborative filtering
    • Item-Item collaborative filtering / Vocabulary of collaborative filtering
  • Collaborative filtering / Recommender systems
  • collective anomalies
    • about / The different types of anomalies
  • color images
    • grouping / Grouping/clustering color images based on Canberra distance
    • clustering / Grouping/clustering color images based on Canberra distance
  • confusion matrix
    • about / Confusion matrix (decision support)
  • contextual anomalies
    • about / The different types of anomalies
    • contextual attributes / The different types of anomalies
    • behavioral attributes / The different types of anomalies
  • countBy / Generating a PDF from a histogram

D

  • decision tree
    • used, for multiclass classification / Multiclass classification using decision trees
    • working / How does it work?
    • used, for predicting traffic jam / Predicting a traffic jam using a decision tree: a case study
  • decision tree algorithm
    • about / Decision tree algorithms
    • linear regression / Linear regression
    • logistic regression / Logistic regression
    • recommender systems / Recommender systems
  • Deedle
    • URL / Why use F#?
  • density estimation
    • used, for detecting anomalies / Detecting anomalies using density estimation
  • Dew point / Putting it together with Math.NET and FsPlot
  • distance function / How does this work?
  • distance metrics
    • example usages / Some example usages of distance metrics

E

  • Ensemble method / Summary
  • example usages, distance metrics
    • about / Some example usages of distance metrics
    • asymmetric binary similarity measures, using / Finding similar cookies using asymmetric binary similarity measures

F

  • F#
    • about / Why use F#?
    • benefits / Why use F#?
    • type providers / Why use F#?
    • supervised learning / Supervised machine learning
    • used, for searching linear regression coefficients / Finding linear regression coefficients using F#
    • used, for implementing basic user-user collaborative filtering / Implementing basic user-user collaborative filtering using F#
  • F# 3.7.0
    • Math.NET Numerics / Math.NET Numerics for F# 3.7.0
  • F# wrapper / Getting Math.NET
  • feature
    • scaling / Feature scaling
  • frameworks, machine learning
    • Accord.NET / Machine learning frameworks
    • WekaSharp / Machine learning frameworks
  • FsPlot
    • about / APIs used
    • URL / APIs used
    • used, for generating linear regression coefficients / Putting it together with Math.NET and FsPlot

G

  • gap calculations
    • variations / Variations of gap calculations and similarity measures
  • Grubb's test
    • used, for detecting point anomalies / Detecting point anomalies using Grubb's test
    • used, for transforming multivariate data / Grubb's test for multivariate data using Mahalanobis distance
    • covariance matrix / Code walkthrough

H

  • handwritten digits
    • recognizing / Recognizing handwritten digits – your "Hello World" ML program
    • working / How does this work?
  • HighCharts / Finding linear regression coefficients using F#
  • histogram
    • pdf, generating / Generating a PDF from a histogram

I

  • Inner Product family
    • about / Inner Product family
    • Inner-product distance / Inner Product family
    • Harmonic distance / Inner Product family
    • Cosine Similarity distance measure / Inner Product family
    • Dice coefficient / Inner Product family
  • Inter Quartile Range (IQR)
    • used, for detecting point anomalies / Detecting point anomalies using IQR (Interquartile Range)
    • about / Detecting point anomalies using IQR (Interquartile Range)
  • Intersection family
    • about / Intersection family
    • Intersection distance / Intersection family
    • Wave Hedges distance / Intersection family
    • Czekanowski distance / Intersection family
    • Ruzicka distance / Intersection family
  • inverse document frequency (idf) / Information retrieval using tf-idf
  • IR
    • about / Objective
    • algorithms / Different IR algorithms you will learn
    • tf-idf, using / Information retrieval using tf-idf
    • similarity measures / Measures of similarity
  • IR algorithms
    • distance based / Different IR algorithms you will learn
    • set based / Different IR algorithms you will learn
  • IR distance
    • using / What interesting things can you do?
  • iris flowers
    • Iris-versicolor / Multiclass classification using logistic regression
    • Iris-setosa / Multiclass classification using logistic regression
    • Iris-virginica / Multiclass classification using logistic regression
  • item-item collaborative filtering
    • about / Item-item collaborative filtering

K

  • k-Nearest Neighbor (k-NN algorithm)
    • about / Nearest Neighbour algorithm (a.k.a k-NN algorithm)
    • reference, URLs / Nearest Neighbour algorithm (a.k.a k-NN algorithm)
  • k-NN
    • used, for binary classification / Binary classification using k-NN, How does it work?
    • working / How does it work?
    • used, for finding cancerous cells / Finding cancerous cells using k-NN: a case study
  • Kaggle
    • about / Machine learning for fun and profit
    • URL / Recognizing handwritten digits – your "Hello World" ML program

L

  • L1 family
    • about / L1 family
    • Sørensen / L1 family
    • Gower distance / L1 family
    • Soergel / L1 family
    • Kulczynski d / L1 family
    • Kulczynski s / L1 family
    • Canberra distance / L1 family
  • least square
    • linear regression method / Linear regression method of least square
  • linear regression
    • algorithms, types / Different types of linear regression algorithms
    • APIs / APIs used
  • linear regression coefficients
    • searching, with F# / Finding linear regression coefficients using F#
    • searching, with Math.NET / Finding the linear regression coefficients using Math.NET
  • logistic regression
    • about / Understanding logistic regression
    • sigmoid function chart / The sigmoid function chart
    • used, for binary classification / Binary classification using logistic regression (using Accord.NET)
    • used, for multiclass classification / Multiclass classification using logistic regression

M

  • machine learning
    • overview / Objective
    • URL / Getting in touch
    • using, areas / Different areas where machine learning is being used
    • frameworks / Machine learning frameworks
    • Kaggle / Machine learning for fun and profit
    • using / Some interesting things you can do
  • Mahalanobis distance
    • Grubb's test, used for transforming multivariate data / Grubb's test for multivariate data using Mahalanobis distance
  • Math.NET
    • about / Objective
    • used, for searching linear regression coefficients / Finding the linear regression coefficients using Math.NET
    • used, for generating linear regression coefficients / Putting it together with Math.NET and FsPlot
    • using, for multiple linear regression / Multiple linear regression and variations using Math.NET
  • Math.NET Numerics
    • about / Math.NET Numerics for F# 3.7.0
    • obtaining / Getting Math.NET
    • using / Experimenting with Math.NET
  • matrix
    • about / The basics of matrices and vectors (a short and sweet refresher)
    • creating / Creating a matrix
    • creating, by hand / Creating a matrix
    • creating, from list of rows / Creating a matrix
    • transpose, finding / Finding the transpose of a matrix
    • inverse, finding / Finding the inverse of a matrix
    • trace / Trace of a matrix
    • QR decomposition / QR decomposition of a matrix
    • Singular Value Decomposition (SVD) / SVD of a matrix
  • Minkowski distance
    • about / Minkowski family
    • Euclidean distance / Minkowski family
    • City block distance / Minkowski family
    • Chebyshev distance / Minkowski family
  • Movie Lens 100K dataset
    • reference link / Working with real movie review data (Movie Lens)
  • movie ratings dataset, u.data file
    • reference link / Working with real movie review data (Movie Lens)
  • multiclass classification
    • logistic regression, using / Multiclass classification using logistic regression
    • working / How does it work?
    • decision trees, using / Multiclass classification using decision trees
    • WekaSharp, using / Obtaining and using WekaSharp
    • WekaSharp, obtaining / Obtaining and using WekaSharp
  • multiple linear regression
    • about / Multiple linear regression
    • Math.NET, using / Multiple linear regression and variations using Math.NET
    • and variation / Multiple linear regression and variations using Math.NET
    • result, plotting / Plotting the result of multiple linear regression
  • multivariate data
    • transforming, with Grubb's test / Grubb's test for multivariate data using Mahalanobis distance
  • multivariate multiple linear regression
    • about / Multivariate multiple linear regression

N

  • negations
    • handling / Handling negations
  • NuGet page
    • API, URL / Math.NET Numerics for F# 3.7.0

P

  • pdf
    • generating, from histogram / Generating a PDF from a histogram
  • Pearson's correlation coefficient / Basis of User-User collaborative filtering
  • point anomalies
    • about / The different types of anomalies
    • detecting, Inter Quartile Range (IQR) used / Detecting point anomalies using IQR (Interquartile Range)
    • detecting, with Grubb's test / Detecting point anomalies using Grubb's test
  • prediction-rating correlation
    • about / Prediction-rating correlation
  • probability distribution functions (pdf) / Measures of similarity

R

  • real movie review data (Movie Lens)
    • working with / Working with real movie review data (Movie Lens)
  • recommendations
    • evaluating / Evaluating recommendations
  • Recommender systems / Different areas where machine learning is being used
  • Reinforcement Learning / Different areas where machine learning is being used
  • Relative Humidity (RH) / Putting it together with Math.NET and FsPlot
  • ridge regression
    • about / Ridge regression

S

  • Semantic Orientation (SO)
    • about / Identifying praise or criticism with sentiment orientation
    • used, for identifying praise / Identifying praise or criticism with sentiment orientation, Pointwise Mutual Information
    • used, for identifying criticism / Identifying praise or criticism with sentiment orientation, Pointwise Mutual Information
  • Sentiment Analysis (SA)
    • finding, SO-PMI used / Using SO-PMI to find sentiment analysis
  • Sentiment Analysis algorithms
    • about / A baseline algorithm for SA using SentiWordNet lexicons
  • SentiWordNet
    • download link / A baseline algorithm for SA using SentiWordNet lexicons
  • SentiWordNet lexicons
    • about / A baseline algorithm for SA using SentiWordNet lexicons
  • set based similarity measures, Shannon’s Entropy family
    • Jaccard index / Set-based similarity measures
    • Tversky index / Set-based similarity measures
  • Shannon’s Entropy family
    • Kullback-Leibler distance measure / Shannon's Entropy family
    • Jeffrey’s distance measure / Shannon's Entropy family
    • K-Divergence distance measure / Shannon's Entropy family
    • Topsøe distance measure / Shannon's Entropy family
    • Jensen-Shannon distance measure / Shannon's Entropy family
    • Taneja distance measure / Combinations
    • Kumar Johnson distance measure / Combinations
    • set based similarity measures / Set-based similarity measures
  • Sigmoid function chart / The sigmoid function chart
  • similarity measures
    • about / Variations of gap calculations and similarity measures
  • SO-PMI
    • used, for finding sentiment analysis / Using SO-PMI to find sentiment analysis
  • spam data
    • URL / Challenge yourself!
  • squared-chord family (Fidelity family)
    • about / Fidelity family or squared-chord family
    • Fidelity Distance measure / Fidelity family or squared-chord family
    • Bhattacharyya distance measure / Fidelity family or squared-chord family
    • Hellinger distance measure / Fidelity family or squared-chord family
    • Matusita distance measure / Fidelity family or squared-chord family
    • Squared Chord distance measure / Fidelity family or squared-chord family
  • Squared L2 family
    • about / Squared L2 family
    • Squared Euclidean distance measure / Squared L2 family
    • Squared Chi distance measure / Squared L2 family
    • Pearson’s Chi distance measure / Squared L2 family
    • Neyman’s Chi distance measure / Squared L2 family
    • Probabilistic Symmetric Chi distance measure / Squared L2 family
    • Divergence measure / Squared L2 family
    • Clark’s distance measure / Squared L2 family
    • Additive Symmetric Chi / Squared L2 family
  • Sum of Squared Error (SSE) / Linear regression method of least square
  • supervised learning
    • about / Different areas where machine learning is being used, Supervised machine learning
    • classification problem / Supervised machine learning
    • regression problem / Supervised machine learning
    • training / Training and test dataset/corpus
    • training dataset / Training and test dataset/corpus
    • training corpus / Training and test dataset/corpus
    • test dataset / Training and test dataset/corpus
    • test corpus / Training and test dataset/corpus
    • training data / Training and test dataset/corpus
    • test data / Training and test dataset/corpus
    • real life examples / Some motivating real life examples of supervised learning
    • k-Nearest Neighbor (k-NN) / Nearest Neighbour algorithm (a.k.a k-NN algorithm)
    • distance metrics / Distance metrics
    • decision tree algorithm / Decision tree algorithms

T

  • term frequency (tf) / Information retrieval using tf-idf
  • term frequency inverse document frequency (tf-idf)
    • used, for retrieving information / Information retrieval using tf-idf
    • about / Information retrieval using tf-idf
  • top-N recommendations
    • about / Top-N recommendations
  • train.csv
    • URL / Recognizing handwritten digits – your "Hello World" ML program
  • types, anomaly detection
    • point anomalies / The different types of anomalies
    • contextual anomalies / The different types of anomalies
    • collective anomalies / The different types of anomalies

U

  • unsupervised learning
    • about / Different areas where machine learning is being used, Unsupervised learning
    • features / Unsupervised learning
  • User-User collaborative filtering
    • about / Vocabulary of collaborative filtering
    • basics / Basis of User-User collaborative filtering
  • User k-Nearest Neighbors
    • about / Vocabulary of collaborative filtering

V

  • vectors
    • about / The basics of matrices and vectors (a short and sweet refresher)
    • creating / Creating a vector

W

  • weighted linear regression
    • about / Weighted linear regression
  • WeightedRegression class / Weighted linear regression
  • Weka
    • URL / Multiclass classification using decision trees
  • WekaSharp
    • URL / Machine learning frameworks, Obtaining and using WekaSharp
    • using / Obtaining and using WekaSharp
    • obtaining / Obtaining and using WekaSharp