Machine Learning with Scala Quick Start Guide

By: Md. Rezaul Karim, Ajay Kumar N

Overview of this book

Scala is a highly scalable language that integrates object-oriented and functional programming concepts, which makes it easier to build scalable and complex big data applications. This book is a handy guide for machine learning developers and data scientists who want to develop and train effective machine learning models in Scala. The book starts with an introduction to machine learning, covering the basics of both machine learning and deep learning. It then explains how to use Scala-based ML libraries to solve classification and regression problems using linear regression, generalized linear regression, logistic regression, support vector machine, and Naïve Bayes algorithms. It also covers tree-based ensemble techniques for solving both classification and regression problems. Moving ahead, it covers unsupervised learning techniques, such as dimensionality reduction, clustering, and recommender systems. Finally, it provides a brief overview of deep learning using a real-life example in Scala.
Table of Contents (9 chapters)

Random forest for supervised learning

In this section, we'll see how to use RF to solve both regression and classification problems. We'll use the RF implementation from the Spark ML package in Scala. Although both GBT and RF are ensembles of trees, their training processes differ. For instance, RF uses the bagging technique, in which each tree is trained on a bootstrap sample of the training examples, while GBT uses boosting. There are several practical trade-offs between the two ensembles, which can make it hard to decide which one to choose. However, RF would be the winner in most cases. Here are some justifications:

  • GBTs train one tree at a time, whereas RF can train multiple trees in parallel, so the training time is typically lower with RF. However, in some special cases, training and using a smaller number of trees with GBTs is faster and more convenient.
  • RFs are less prone to overfitting. In other words, RFs reduce...
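
To make the bagging idea more concrete, here is a minimal sketch in plain Scala. It is not the Spark ML implementation: the names `BaggingSketch`, `Example`, `Stump`, `bootstrap`, `fitStump`, `fitForest`, and `predict` are hypothetical and introduced only for illustration, and each "tree" is a one-split decision stump on a single feature. A real RF also samples a subset of features at each split, which this sketch leaves out.

```scala
import scala.util.Random

object BaggingSketch {
  // One labelled example: a single numeric feature and a binary label (0 or 1).
  final case class Example(x: Double, label: Int)

  // A decision stump: the simplest possible tree, with a single split on x.
  final case class Stump(threshold: Double, leftLabel: Int, rightLabel: Int) {
    def predict(x: Double): Int = if (x <= threshold) leftLabel else rightLabel
  }

  // Majority label of a group (ties go to 0); `fallback` is used for an empty group.
  private def majority(group: Seq[Example], fallback: Int): Int =
    if (group.isEmpty) fallback
    else if (group.count(_.label == 1) * 2 > group.size) 1
    else 0

  // Bootstrap sample: draw data.size examples with replacement.
  def bootstrap(data: Vector[Example], rng: Random): Vector[Example] =
    Vector.fill(data.size)(data(rng.nextInt(data.size)))

  // Fit a stump by trying every observed x as a threshold and keeping
  // the candidate with the fewest errors on the given sample.
  def fitStump(sample: Vector[Example]): Stump = {
    val overall = majority(sample, 0)
    val candidates = sample.map(_.x).distinct.map { t =>
      val (left, right) = sample.partition(_.x <= t)
      Stump(t, majority(left, overall), majority(right, overall))
    }
    candidates.minBy(s => sample.count(e => s.predict(e.x) != e.label))
  }

  // Bagging: every stump is fitted on its own bootstrap sample. No stump
  // depends on another one, which is why the trees could be trained in parallel.
  def fitForest(data: Vector[Example], numTrees: Int, seed: Long): Vector[Stump] = {
    val rng = new Random(seed)
    Vector.fill(numTrees)(fitStump(bootstrap(data, rng)))
  }

  // The ensemble prediction is a majority vote over all stumps (ties go to 0).
  def predict(forest: Vector[Stump], x: Double): Int =
    if (forest.count(_.predict(x) == 1) * 2 > forest.size) 1 else 0

  def main(args: Array[String]): Unit = {
    // Toy data (made up for this sketch): small x values are class 0, large ones class 1.
    val data =
      Vector(0.0, 1.0, 2.0, 3.0, 4.0).map(Example(_, 0)) ++
        Vector(10.0, 11.0, 12.0, 13.0, 14.0).map(Example(_, 1))

    val forest = fitForest(data, numTrees = 25, seed = 42L)
    println(s"Trained ${forest.size} stumps")
    println(s"Vote for x = -5.0: ${predict(forest, -5.0)}")
    println(s"Vote for x = 20.0: ${predict(forest, 20.0)}")
  }
}
```

In this sketch the trees are built one after another for simplicity, but `fitForest` never passes information from one stump to the next. Boosting is different in exactly this respect: each new tree is fitted to the errors of the trees before it, so the trees have to be trained in sequence.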