Hands-on Machine Learning with JavaScript
Overview of this book

In over 20 years of existence, JavaScript has pushed beyond the boundaries of the web, with a proven presence on servers, embedded devices, Smart TVs, IoT, smart cars, and more. Today, with the added advantage of machine learning research and support from JS libraries, JavaScript can make your browsers smarter than ever, able to learn patterns and reproduce them as part of innovative products and applications. Hands-on Machine Learning with JavaScript presents various avenues of machine learning in a practical and objective way, and helps you implement them using the JavaScript language. Predicting behaviors, analyzing sentiment, grouping data, and building neural models are some of the skills you will build from this book. You will learn how to train your machine learning models and work with different kinds of data. Along the way, you will encounter use cases such as face detection, spam filtering, recommendation systems, character recognition, and more. Moreover, you will learn how to work with deep neural networks and guide your applications to gain insights from data. By the end of this book, you'll have hands-on knowledge of evaluating and implementing the right model, as well as choosing from different JS libraries, such as NaturalNode, brain, harthur, classifier, and many more, to design smarter applications.

Tokenizing

Tokenizing is the act of transforming an input string, such as a sentence, a paragraph, or even an object such as an email, into individual tokens. A very simple tokenizer might split a sentence or paragraph on spaces, generating tokens that are individual words. However, tokens do not have to be words: not every word in the input string needs to be returned by the tokenizer, not every token the tokenizer generates needs to appear in the original text, and a single token may represent more than one word. This is why we use the term token, rather than word, to describe a tokenizer's output.
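
To make this concrete, the sketch below shows what a minimal whitespace tokenizer might look like in JavaScript. The function name simpleTokenizer and the normalization choices (lowercasing and stripping punctuation) are illustrative assumptions for this example, not the book's own implementation.

```js
// A minimal sketch of a whitespace tokenizer (illustrative only).
function simpleTokenizer(text) {
  return text
    .toLowerCase()                      // normalize case so "The" and "the" become the same token
    .replace(/[^a-z0-9\s]+/g, '')       // strip punctuation; real tokenizers are usually more careful
    .split(/\s+/)                       // split on runs of whitespace
    .filter(token => token.length > 0); // drop empty strings from leading/trailing whitespace
}

console.log(simpleTokenizer("The quick brown fox, jumped!"));
// => [ 'the', 'quick', 'brown', 'fox', 'jumped' ]
```

Even this tiny example makes design decisions, such as lowercasing and discarding punctuation, that change which tokens downstream algorithms will see.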

The manner in which you tokenize text before processing it with an ML algorithm has a major effect on the performance of the algorithm. Many NLP and ML applications use a bag-of-words approach, in which only the...