Machine Learning with Scala Quick Start Guide

By: Md. Rezaul Karim, Ajay Kumar N

Overview of this book

Scala combines object-oriented and functional programming concepts, which makes it well suited to building scalable, complex big data applications. This book is a handy guide for machine learning developers and data scientists who want to develop and train effective machine learning models in Scala. The book starts with an introduction to machine learning, covering the basics of both machine learning and deep learning. It then explains how to use Scala-based ML libraries to solve classification and regression problems using linear regression, generalized linear regression, logistic regression, support vector machines, and Naïve Bayes algorithms. It also covers tree-based ensemble techniques for solving both classification and regression problems. Moving ahead, it covers unsupervised learning techniques, such as dimensionality reduction, clustering, and recommender systems. Finally, it provides a brief overview of deep learning using a real-life example in Scala.
Table of Contents (9 chapters)

Scala for Dimensionality Reduction and Clustering

In the previous chapters, we saw several examples of supervised learning, covering both classification and regression. We applied supervised learning techniques to structured, labeled data. However, as we mentioned previously, with the rise of cloud computing, IoT, and social media, the volume of unstructured data is growing at an unprecedented rate. Collectively, more than 80% of this data is unstructured, and most of it is unlabeled.

Unsupervised learning techniques, such as clustering analysis and dimensionality reduction, are key tools in data-driven research and industry settings for finding hidden structures in unstructured datasets. Many clustering algorithms have been proposed for this, such as k-means, bisecting k-means, and the Gaussian mixture model. However, these algorithms cannot perform with high...