Machine Learning with Scala Quick Start Guide

By: Md. Rezaul Karim, Ajay Kumar N

Overview of this book

Scala is a highly scalable language that integrates object-oriented and functional programming concepts, making it easy to build scalable and complex big data applications. This book is a handy guide for machine learning developers and data scientists who want to develop and train effective machine learning models in Scala. The book starts with an introduction to machine learning, covering the basics of machine learning and deep learning. It then explains how to use Scala-based ML libraries to solve classification and regression problems with linear regression, generalized linear regression, logistic regression, support vector machines (SVMs), and Naïve Bayes. It also covers tree-based ensemble techniques for solving both classification and regression problems. Moving ahead, it covers unsupervised learning techniques, such as dimensionality reduction, clustering, and recommender systems. Finally, it provides a brief overview of deep learning using a real-life example in Scala.
Table of Contents (9 chapters)

LR for churn prediction

Logistic regression (LR) is a classification algorithm that predicts a binary response. It is similar to linear regression, which we described in Chapter 2, Scala for Regression Analysis, except that it predicts discrete classes rather than continuous values. To do so, it passes a linear combination of the features through the sigmoid (or logistic) function:

$$f(z) = \frac{1}{1 + e^{-z}}$$
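As a minimal sketch in plain Scala (no Spark or other ML library assumed; the object name `Sigmoid` is our own), the logistic function can be written as follows. Mathematically it maps any real z into the interval (0, 1), which is what allows its output to be read as a probability.

```scala
object Sigmoid {
  // Logistic (sigmoid) function: f(z) = 1 / (1 + e^(-z)).
  // For very negative z, math.exp(-z) overflows to Infinity and the result
  // becomes 0.0 under IEEE 754 arithmetic, which is the correct limit.
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  def main(args: Array[String]): Unit =
    for (z <- Seq(-5.0, 0.0, 5.0))
      println(f"sigmoid($z%.1f) = ${sigmoid(z)}%.4f")
}
```

Note that f(0) = 0.5 and f(-z) = 1 - f(z), so positive scores push the probability above 0.5 and negative scores push it below.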

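Similar to linear regression, the intuition behind the cost function is to penalize models that have large errors between the real response and the predicted response. For LR, the standard choice is the log loss (cross-entropy) over the n training examples, with labels $y_i \in \{0, 1\}$ and f the logistic function:

$$J(\mathbf{w}) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log f(\mathbf{w}^T\mathbf{x}_i) + (1 - y_i)\log\left(1 - f(\mathbf{w}^T\mathbf{x}_i)\right)\right]$$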
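One standard way to make this penalty concrete is the average log loss (binary cross-entropy) with labels y in {0, 1}. The following plain-Scala sketch computes it; the names `LogLoss`, `dot`, and `logLoss`, the clipping constant `eps`, and the weights in `main` are our own illustrative choices, not a library API. It shows that a confident but wrong prediction costs far more than a confident correct one.

```scala
object LogLoss {
  // Logistic function f(z) = 1 / (1 + e^(-z)).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Linear score w^T x.
  def dot(w: Array[Double], x: Array[Double]): Double = {
    require(w.length == x.length, "weights and features must have the same length")
    w.zip(x).map { case (wi, xi) => wi * xi }.sum
  }

  // Average log loss (binary cross-entropy) over n examples, labels y in {0, 1}.
  // p is clipped to [eps, 1 - eps] so that math.log never receives 0.
  def logLoss(w: Array[Double], xs: Seq[Array[Double]], ys: Seq[Double]): Double = {
    require(xs.nonEmpty && xs.size == ys.size, "need one label per example")
    val eps = 1e-15
    val losses = xs.zip(ys).map { case (x, y) =>
      val p = math.min(math.max(sigmoid(dot(w, x)), eps), 1.0 - eps)
      -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    }
    losses.sum / losses.size
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical one-feature model whose score is 5.0 * x.
    val w = Array(5.0)
    val x = Seq(Array(1.0))
    println(f"confident and correct (y = 1): ${logLoss(w, x, Seq(1.0))}%.4f")
    println(f"confident and wrong   (y = 0): ${logLoss(w, x, Seq(0.0))}%.4f")
  }
}
```

With all-zero weights every predicted probability is 0.5, so the loss equals log 2 regardless of the labels; training aims to push it below that baseline.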

For a given new data point, x, the LR model makes predictions using the following equation:

$$f(\mathbf{w}^T\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^T\mathbf{x}}}$$

In the preceding equation, the logistic function is applied to the output of the linear model, $z = \mathbf{w}^T\mathbf{x}$, to obtain the probability that x belongs to the positive class. If $f(\mathbf{w}^T\mathbf{x}) > 0.5$, the outcome is positive; otherwise, it is negative. This means that the threshold for the classification...
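The prediction rule can be sketched in the same plain-Scala style. The object `LrPredict`, its method names, the `threshold` parameter (defaulting to the 0.5 used in the rule above), and the weights and feature vectors in `main` are hypothetical, chosen only for illustration. For comparison, Spark ML's `LogisticRegression` exposes a similar setting through `setThreshold`.

```scala
object LrPredict {
  // Logistic function f(z) = 1 / (1 + e^(-z)).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Probability that x belongs to the positive class: f(w^T x).
  def predictProbability(w: Array[Double], x: Array[Double]): Double = {
    require(w.length == x.length, "weights and features must have the same length")
    sigmoid(w.zip(x).map { case (wi, xi) => wi * xi }.sum)
  }

  // Positive (1) if f(w^T x) > threshold, negative (0) otherwise.
  // The default of 0.5 matches the rule in the text; the parameter itself is hypothetical.
  def predictLabel(w: Array[Double], x: Array[Double], threshold: Double = 0.5): Int =
    if (predictProbability(w, x) > threshold) 1 else 0

  def main(args: Array[String]): Unit = {
    // Hypothetical weights; the first feature is a constant 1.0 acting as the bias term.
    val w = Array(-1.0, 0.8)
    val customers = Seq(Array(1.0, 0.5), Array(1.0, 3.0))
    for (x <- customers) {
      val p = predictProbability(w, x)
      println(f"p(churn) = $p%.3f -> label ${predictLabel(w, x)}")
    }
  }
}
```

Because the comparison is strict, a score of exactly zero (probability 0.5) is labeled negative, matching the "otherwise, it is negative" wording.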