Hands-on Machine Learning with JavaScript

Overview of this book

In its more than 20 years of existence, JavaScript has pushed well beyond the web, with a proven presence on servers, embedded devices, smart TVs, IoT, smart cars, and more. Today, thanks to machine learning research and growing library support, JavaScript can make your browser applications smarter than ever, learning patterns and reproducing them as part of innovative products and applications. Hands-on Machine Learning with JavaScript presents various avenues of machine learning in a practical and objective way, and helps you implement them in JavaScript. Predicting behaviors, analyzing feelings, grouping data, and building neural models are some of the skills you will build from this book. You will learn how to train your machine learning models and work with different kinds of data. Along the way, you will come across use cases such as face detection, spam filtering, recommendation systems, character recognition, and more. You will also learn how to work with deep neural networks and guide your applications to gain insights from data. By the end of this book, you'll have gained hands-on knowledge of evaluating and implementing the right model, and of choosing among different JS libraries, such as NaturalNode, brain, harthur, classifier, and many more, to design smarter applications.

Stemming

Stemming is a type of transformation that can be applied to a single word, though typically the stemming operation occurs right after tokenizing. Stemming after tokenizing is so common that natural.js offers a tokenizeAndStem convenience method that can be attached to the String class prototype.
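To illustrate the idea of attaching such a convenience method to the String prototype, here is a minimal sketch. It is not natural.js's implementation: attachTokenizeAndStem, simpleTokenize, and stripPlural are hypothetical names, and the stemmer is a stand-in that only strips a trailing "s".

```javascript
// Hypothetical tokenizer: lowercase, then split on anything that is not a letter.
function simpleTokenize(text) {
  return text.toLowerCase().split(/[^a-z]+/).filter(Boolean);
}

// Stand-in stemmer (an assumption, NOT the Porter stemmer): strips one trailing "s".
function stripPlural(word) {
  return word.length > 3 && word.endsWith("s") && !word.endsWith("ss")
    ? word.slice(0, -1)
    : word;
}

// Attach a tokenizeAndStem method to String.prototype, using the stemmer passed in.
function attachTokenizeAndStem(stemFn) {
  Object.defineProperty(String.prototype, "tokenizeAndStem", {
    value: function () {
      // `this` may be a String wrapper object, so coerce it to a primitive first.
      return simpleTokenize(String(this)).map((token) => stemFn(token));
    },
    configurable: true, // allows re-attaching with a different stemmer later
    writable: true,
    enumerable: false, // keeps the method out of for...in loops
  });
}

attachTokenizeAndStem(stripPlural);
const stems = "The dogs chase the cats".tokenizeAndStem();
console.log(stems);
```

Passing the stemmer in as a function keeps the attachment step independent of any particular stemming algorithm, so a more capable stemmer could be swapped in without changing the calling code.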

Specifically, stemming reduces a word to its root form, for instance by transforming running to run. Stemming your text after tokenizing can significantly reduce the entropy of your dataset, because it essentially de-duplicates words with similar meanings but different tenses or inflections. Your algorithm will not need to learn the words run, runs, running, and runnings separately, as they will all get transformed into run.
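To make the de-duplication concrete, the sketch below uses a deliberately simplified suffix-stripping function. It is an illustration only: toyStem is a hypothetical name, it handles just the suffixes in the example above, and it is far cruder than a real stemmer such as the Porter stemmer.

```javascript
// Toy suffix stripper (an assumption for illustration, NOT the Porter algorithm).
function toyStem(word) {
  let stem = word.toLowerCase();
  // Strip a plural or third-person "s", but leave words ending in "ss" alone.
  if (stem.length > 3 && stem.endsWith("s") && !stem.endsWith("ss")) {
    stem = stem.slice(0, -1);
  }
  // Strip "ing" when enough of the word remains.
  if (stem.length > 5 && stem.endsWith("ing")) {
    stem = stem.slice(0, -3);
    // Undo common consonant doubling, for example "runn" -> "run".
    if (/([bdfgmnprt])\1$/.test(stem)) {
      stem = stem.slice(0, -1);
    }
  }
  return stem;
}

const words = ["run", "runs", "running", "runnings"];
const stemmed = words.map(toyStem);

// Count distinct words before and after stemming.
const vocabularyBefore = new Set(words).size;
const vocabularyAfter = new Set(stemmed).size;
console.log(stemmed, vocabularyBefore, vocabularyAfter);
```

With this input, four distinct surface forms collapse into a single stem, which is the reduction in vocabulary that the paragraph above describes.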

The most popular stemming algorithm, the Porter stemmer, is a heuristic algorithm that defines a number of staged rules for the transformation. But, in essence...