
Mastering Scala Machine Learning

By: Kozlov

Overview of this book

Since the advent of object-oriented programming, new technologies related to Big Data have constantly been popping up on the market. One such technology is Scala, which many consider a successor to Java in the area of Big Data, much as Java was to C/C++ in the area of distributed programming. This book aims to take your knowledge to the next level and help you apply it to build advanced applications such as social media mining, intelligent news portals, and more. After a quick refresher on functional programming concepts using the REPL, you will see some practical examples of setting up the development environment and tinkering with data. We will then explore working with Spark and MLlib using k-means and decision trees. Most of the data that we produce today is unstructured and raw, and you will learn to tackle this type of data with advanced topics such as regression, classification, integration, and working with graph algorithms. Finally, you will discover how to use Scala to perform complex concept analysis, monitor model performance, and build a model repository. By the end of this book, you will have gained expertise in Scala machine learning and will be able to build complex machine learning projects using Scala.
Table of Contents (12 chapters)
10. Advanced Model Monitoring
Index

Using word2vec to find word relationships

Word2vec was developed by Tomas Mikolov at Google around 2013. The original idea behind word2vec was to demonstrate that one can improve performance by trading a complex model for a more efficient one that can be trained on much larger datasets. Instead of representing a document as a bag of words, word2vec takes the context of each word into account by analyzing n-grams or skip-grams (a set of surrounding tokens, with the token in question potentially skipped). Words and word contexts are themselves represented by a vector of floats/doubles $v_w$. The objective is to maximize the average log likelihood:

$$\frac{1}{T}\sum_{t=1}^{T}\;\sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t)$$

Where:

$$p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{V} \exp\left({v'_w}^{\top} v_{w_I}\right)}$$
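To make the skip-gram contexts that the sum above ranges over more concrete, here is a minimal sketch (the function name `skipGramPairs` and its parameters are illustrative, not from the book) that enumerates the (center word, context word) pairs produced by a symmetric window of radius $c$:

```scala
// Illustrative helper (not from the book): enumerate the (center, context)
// pairs that a symmetric window of radius `window` generates, i.e. the
// training pairs the skip-gram objective sums over.
def skipGramPairs(tokens: Seq[String], window: Int): Seq[(String, String)] =
  tokens.indices.flatMap { i =>
    val lo = math.max(0, i - window)
    val hi = math.min(tokens.size - 1, i + window)
    (lo to hi).filter(_ != i).map(j => (tokens(i), tokens(j)))
  }

val pairs = skipGramPairs("the quick brown fox".split(" ").toSeq, window = 2)
// Among others, this yields the contexts of "quick": (quick,the),
// (quick,brown), (quick,fox)
```

Each such pair contributes one $\log p(w_{t+j} \mid w_t)$ term to the objective.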

By choosing the optimal vectors $v_w$ and $v'_w$, one obtains a comprehensive word representation (this is also referred to as map optimization). Similar words are found based on the cosine similarity metric (dot product) of their vectors $v_w$. The Spark implementation uses hierarchical softmax, which reduces the complexity of computing the conditional probability to $O(\log V)$, the log of the vocabulary size $V$, as opposed to $O(V)$, proportional to $V$. The training...
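Similarity lookups such as Spark MLlib's `Word2VecModel.findSynonyms` rank candidate words by exactly this cosine similarity. As a minimal, self-contained sketch of the metric itself (the function name is illustrative):

```scala
// Sketch of the cosine similarity metric used to rank similar words:
// the dot product of two word vectors normalized by their magnitudes.
// Returns 1.0 for identical directions, 0.0 for orthogonal vectors.
def cosineSimilarity(a: Array[Double], b: Array[Double]): Double = {
  require(a.length == b.length, "vectors must have the same dimension")
  val dot   = a.indices.map(i => a(i) * b(i)).sum
  val normA = math.sqrt(a.map(x => x * x).sum)
  val normB = math.sqrt(b.map(x => x * x).sum)
  dot / (normA * normB)
}
```

Because word2vec vectors are typically normalized before lookup, the cosine similarity reduces to a plain dot product, which is why the two terms are used interchangeably above.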