Book Image

F# for Machine Learning Essentials

By : Sudipta Mukherjee
Book Image

F# for Machine Learning Essentials

By: Sudipta Mukherjee

Overview of this book

The F# functional programming language enables developers to write simple code to solve complex problems. With F#, developers create consistent and predictable programs that are easier to test and reuse, simpler to parallelize, and are less prone to bugs. If you want to learn how to use F# to build machine learning systems, then this is the book you want. Starting with an introduction to the several categories on machine learning, you will quickly learn to implement time-tested, supervised learning algorithms. You will gradually move on to solving problems on predicting housing pricing using Regression Analysis. You will then learn to use Accord.NET to implement SVM techniques and clustering. You will also learn to build a recommender system for your e-commerce site from scratch. Finally, you will dive into advanced topics such as implementing neural network algorithms while performing sentiment analysis on your data.
Table of Contents (16 chapters)
F# for Machine Learning Essentials
Credits
Foreword
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Index

Information retrieval using tf-idf


Whenever you type some search term in your Windows search box, some documents appear matching your search term. There is a common, well-known, easy-to-implement algorithm that makes it possible to rank the documents based on the search term. Basically, the algorithm allows developers to assign some kind of score to each document in the result set. That score can be seen as a score of confidence that the system has on how much the user would like that result.

The score that this algorithm attaches with each document is a product of two different scores. The first one is called term frequency (tf) and the other one is called inverse document frequency (idf). Their product is referred to as tf-idf or "term frequency inverse document frequency".

Tf is the number of times a term occurs in a given document. Idf is the ratio between the total number of documents scanned and the number of documents in which a given search term is found. However, this ratio is not...