Index
A
- action-value / The policy and action-value
- activities of daily living (ADL) / Human activity recognition using the LSTM model
- ADAM
- used, for large-scale genomics data processing / ADAM for large-scale genomics data processing
- Akka / Scala Play web service
- Akka actors
- concurrency / Concurrency through Akka actors
- Allele Count (AC) / 1000 Genomes Projects dataset description
- Allele Frequency (AF) / 1000 Genomes Projects dataset description
- Allele Number (AN) / 1000 Genomes Projects dataset description
- alt-coins / Bitcoin, cryptocurrency, and online trading
- Alternating Least Squares (ALS) algorithm / Model-based collaborative filtering, Model-based recommendation with Spark
- anomaly detection / Outlier and anomaly detection
- Anorm / Scala Play web service
- Apache Zeppelin
- installing / Installing and getting started with Apache Zeppelin
- using / Installing and getting started with Apache Zeppelin
- URL / Installing and getting started with Apache Zeppelin, Creating notebooks
- sources, building / Building from the source
- starting / Starting and stopping Apache Zeppelin
- stopping / Starting and stopping Apache Zeppelin
- notebooks, creating / Creating notebooks
- area under the curve (AUC) / Step 10 - Model evaluation on the highly-imbalanced data
- Area Under the Precision-Recall Curve (AUPRC) / Problem description
- autoencoder
- using, for unsupervised learning / Autoencoders and unsupervised learning
- working principles / Working principles of an autoencoder
- data representation / Efficient data representation with autoencoders
B
- base pairs (bp) / 1000 Genomes Projects dataset description
- best fit line / LR for predicting insurance severity claims
- Bitcoin
- about / Bitcoin, cryptocurrency, and online trading
- state-of-the-art automated trading / State-of-the-art automated trading of Bitcoin
- training / Training
- prediction / Prediction
- black box / Assumptions and design choices
- Boltzmann machines / Efficient data representation with autoencoders
C
- churn analytics pipeline
- developing / Developing a churn analytics pipeline
- dataset, description / Description of the dataset
- exploratory analysis / Exploratory analysis and feature engineering
- feature engineering / Exploratory analysis and feature engineering
- churn prediction
- with LR / LR for churn prediction
- with SVM / SVM for churn prediction
- with DTs / DTs for churn prediction
- with RF / Random Forest for churn prediction
- client subscription assessment
- through telemarketing / Client subscription assessment through telemarketing
- dataset description / Dataset description
- Apache Zeppelin, installing / Installing and getting started with Apache Zeppelin
- Apache Zeppelin, using / Installing and getting started with Apache Zeppelin
- exploratory analysis, of dataset / Exploratory analysis of the dataset
- numeric features, statistics / Statistics of numeric features
- implementing / Implementing a client subscription assessment model
- hyperparameter tuning / Hyperparameter tuning and feature selection
- feature selection / Hyperparameter tuning and feature selection
- client subscription assessment, hyperparameter tuning
- hidden layers / Number of hidden layers
- number of neurons, per hidden layer / Number of neurons per hidden layer
- activation functions / Activation functions
- weight, initialization / Weight and bias initialization
- bias, initialization / Weight and bias initialization
- regularization / Regularization
- cluster
- cluster managers
- standalone / Spark-based model deployment for large-scale dataset
- Apache Mesos / Spark-based model deployment for large-scale dataset
- Hadoop YARN / Spark-based model deployment for large-scale dataset
- Kubernetes / Spark-based model deployment for large-scale dataset
- CNN architecture
- about / CNN architecture
- convolutional operations / Convolutional operations
- pooling layer / Pooling layer and padding operations
- padding operations / Pooling layer and padding operations
- operations, subsampling / Subsampling operations
- convolutional operations, in DL4j / Convolutional and subsampling operations in DL4j
- subsampling operations, in DL4j / Convolutional and subsampling operations in DL4j
- DL4j, configuring / Configuring DL4j, ND4s, and ND4j
- ND4s, configuring / Configuring DL4j, ND4s, and ND4j
- ND4j, configuring / Configuring DL4j, ND4s, and ND4j
- using, for image classification / Large-scale image classification using CNN
- CNN hyperparameters
- tuning / Tuning and optimizing CNN hyperparameters
- optimizing / Tuning and optimizing CNN hyperparameters
- dropout / Tuning and optimizing CNN hyperparameters
- clipping / Tuning and optimizing CNN hyperparameters
- sparsity / Tuning and optimizing CNN hyperparameters
- regularization / Tuning and optimizing CNN hyperparameters
- weight transforms / Tuning and optimizing CNN hyperparameters
- probability distribution manipulation / Tuning and optimizing CNN hyperparameters
- gradient normalization / Tuning and optimizing CNN hyperparameters
- collaborative filtering approaches
- about / Collaborative filtering approaches
- content-based filtering approaches / Content-based filtering approaches
- hybrid recommender systems / Hybrid recommender systems
- model-based collaborative filtering / Model-based collaborative filtering
- collaborative filtering approaches, problems
- cold start / Collaborative filtering approaches
- scalability / Collaborative filtering approaches
- sparsity / Collaborative filtering approaches
- comparative analysis
- Computational Intelligence and Data Mining (CIDM) / Description of the dataset and using linear models
- concurrency
- through Akka actors / Concurrency through Akka actors
- convolutional operations / Convolutional operations
- Coordinated Universal Time (UTC) / Data exploration
- cross-validation
- about / Hyperparameter tuning and cross-validation
- exhaustive cross-validation / Hyperparameter tuning and cross-validation
- non-exhaustive cross-validation / Hyperparameter tuning and cross-validation
- Cryptocompare API
- URL / Real-time data through the Cryptocompare API
- real-time data, collecting / Real-time data through the Cryptocompare API
- cryptocurrency / Bitcoin, cryptocurrency, and online trading
- curse of dimensionality / Efficient data representation with autoencoders
- customer attrition / Why do we perform churn analysis, and how do we do it?
- customer churn
D
- Databricks
- data denoising / Working principles of an autoencoder
- data pre-processing / Data pre-processing and feature engineering
- decision trees (DTs)
- about / SVM for churn prediction
- using, for churn prediction / DTs for churn prediction
- deep learning (DL) / Machine learning and learning workflow
- Deep Neural Networks (DNNs)
- about / Machine learning and learning workflow
- drawbacks / Image classification and drawbacks of DNNs
- demo prediction
- Scala play framework, using / Demo prediction using Scala Play framework
- dimensionality reduction / Working principles of an autoencoder
- DNNs
- using, for geographic ethnicity prediction / DNNs for geographic ethnicity prediction
- dynamic programming (DP) / A simple Q-learning implementation
E
- ethnicity prediction
- H2O, using / Using H2O for ethnicity prediction
- random forest, using / Using random forest for ethnicity prediction
- exchange / Bitcoin, cryptocurrency, and online trading
- expectation-maximization (EM) / How does LDA algorithm work?
- exploratory analysis / Exploratory analysis and feature engineering
- exploratory analysis of dataset, Apache Zeppelin
- about / Exploratory analysis of the dataset
- label distribution / Label distribution
- job distribution / Job distribution
- marital distribution / Marital distribution
- default distribution / Default distribution
- housing distribution / Housing distribution
- loan distribution / Loan distribution
- contact distribution / Contact distribution
- month distribution / Month distribution
- day distribution / Day distribution
- previous outcome distribution / Previous outcome distribution
- age feature / Age feature
- duration distribution / Duration distribution
- campaign distribution / Campaign distribution
- pdays distribution / Pdays distribution
- previous distribution / Previous distribution
- emp_var_rate distributions / emp_var_rate distributions
- cons_price_idx features / cons_price_idx features
- cons_conf_idx distribution / cons_conf_idx distribution
- euribor3m distribution / Euribor3m distribution
F
- feature engineering / Exploratory analysis and feature engineering, Data pre-processing and feature engineering
- feature maps / CNN architecture
- feature selection / Hyperparameter tuning and feature selection
- feature vectors / Developing a churn analytics pipeline
- feed-forward / DNNs for geographic ethnicity prediction
- FNR (false negative rate) / Predicting prices and evaluating the model
- FPR (false positive rate) / Predicting prices and evaluating the model
- fraud analytics model
- developing / Developing a fraud analytics model
- dataset, description / Description of the dataset and using linear models
- linear models, using / Description of the dataset and using linear models
- problem description / Problem description
- programming environment, preparing / Preparing programming environment
- packages, loading / Step 1 - Loading required packages and libraries
- libraries, loading / Step 1 - Loading required packages and libraries
- Spark session, creating / Step 2 - Creating a Spark session and importing implicits
- implicits, importing / Step 2 - Creating a Spark session and importing implicits
- input data, loading / Step 3 - Loading and parsing input data
- input data, parsing / Step 3 - Loading and parsing input data
- input data, exploratory analysis / Step 4 - Exploratory analysis of the input data
- H2O DataFrame, preparing / Step 5 - Preparing the H2O DataFrame
- unsupervised pre-training, with autoencoders / Step 6 - Unsupervised pre-training using autoencoder
- dimensionality reduction, with hidden layers / Step 7 - Dimensionality reduction with hidden layers
- anomaly detection / Step 8 - Anomaly detection
- pre-trained supervised model / Step 9 - Pre-trained supervised model
- model evaluation, on highly-imbalanced data / Step 10 - Model evaluation on the highly-imbalanced data
- Spark session, stopping / Step 11 - Stopping the Spark session and H2O context
- H2O context / Step 11 - Stopping the Spark session and H2O context
- auxiliary classes / Auxiliary classes and methods
- auxiliary methods / Auxiliary classes and methods
G
- 1000 Genomes Projects dataset
- description / 1000 Genomes Projects dataset description
- Gated Recurrent Unit (GRU) / Tuning LSTM hyperparameters and GRU
- GBT regressor
- used, for predicting insurance severity claims / GBT regressor for predicting insurance severity claims
- Generalized linear models (GLM) / H2O and Sparkling water
- geographic ethnicity / Population scale clustering and geographic ethnicity
- geographic ethnicity prediction
- DNNs, using / DNNs for geographic ethnicity prediction
- Gradient boosting machine (GBM) / H2O and Sparkling water
H
- H2 / Scala Play web service
- H2O
- about / H2O and Sparkling water
- using, for ethnicity prediction / Using H2O for ethnicity prediction
- Hadoop Distributed File System (HDFS) / Spark-based model deployment for large-scale dataset
- hailstone sequence / Efficient data representation with autoencoders
- hidden layers / DNNs for geographic ethnicity prediction
- Hierarchical Dirichlet Process (HDP) algorithms / Other topic models versus the scalability of LDA
- high-level data pipeline
- of prototype / High-level data pipeline of the prototype
- HistoMinute
- historical data collection
- about / Historical data collection
- URL / Historical data collection
- transforming, into time series / Transformation of historical data into a time series
- assumptions / Assumptions and design choices
- design / Assumptions and design choices
- data preprocessing / Data preprocessing
- Human Activity Recognition (HAR)
- LSTM model, using / Human activity recognition using the LSTM model
- dataset description / Dataset description
- MXNet, setting for Scala / Setting and configuring MXNet for Scala
- MXNet, configuring for Scala / Setting and configuring MXNet for Scala
- LSTM model, implementing / Implementing an LSTM model for HAR
- hyperparameters / Developing insurance severity claims predictive model using LR
- hyperparameter tuning / Hyperparameter tuning and cross-validation, Model training and hyperparameter tuning, Hyperparameter tuning and feature selection
I
- image classification
- about / Image classification and drawbacks of DNNs
- CNN, using / Large-scale image classification using CNN
- problem description / Problem description
- dataset, description / Description of the image dataset
- workflow / Workflow of the overall project
- CNNs, implementing / Implementing CNNs for image classification
- image processing / Image processing
- image metadata, extracting / Extracting image metadata
- image feature, extraction / Image feature extraction
- ND4j dataset, preparing / Preparing the ND4j dataset
- CNNs, training / Training the CNNs and saving the trained models
- trained models, saving / Training the CNNs and saving the trained models
- model, evaluating / Evaluating the model
- main() method, executing / Wrapping up by executing the main() method
- IMDb database
- URL / Data exploration
- insurance severity claims
- analyzing / Analyzing and predicting insurance severity claims
- predicting / Analyzing and predicting insurance severity claims
- motivation / Motivation
- dataset, description / Description of the dataset
- exploratory analysis, of dataset / Exploratory analysis of the dataset
- data preprocessing / Data preprocessing
- predicting, with LR / LR for predicting insurance severity claims, Developing insurance severity claims predictive model using LR
- predicting, with GBT regressor / GBT regressor for predicting insurance severity claims
- performance, boosting with RF regressor / Boosting the performance using random forest regressor
- comparative analysis, in production / Comparative analysis and model deployment
- model deployment, in production / Comparative analysis and model deployment
- Spark-based model deployment, for large-scale dataset / Spark-based model deployment for large-scale dataset
- Inter-Quartile Range (IQR) / Outlier and anomaly detection
- Transactions on Interactive Intelligent Systems (TiiS) / Item-based collaborative filtering for movie similarity
- item-based collaborative filtering, for movie similarity
- libraries, importing / Step 1 - Importing necessary libraries and creating a Spark session
- Spark session, creating / Step 1 - Importing necessary libraries and creating a Spark session
- dataset, reading / Step 2 - Reading and parsing the dataset
- dataset, parsing / Step 2 - Reading and parsing the dataset
- similarity, computing / Step 3 - Computing similarity
- model, testing / Step 4 - Testing the model
K
- K-means
- working / How does K-means work?
- Kaggle / Description of the image dataset
- KYC (Know Your Customer) / Bitcoin, cryptocurrency, and online trading
L
- Latent Dirichlet Allocation (LDA)
- working / How does LDA algorithm work?
- latent factors (LFs) / Model-based collaborative filtering
- likelihood measurement
- linear regression (LR)
- used, for predicting insurance severity claims / LR for predicting insurance severity claims, Developing insurance severity claims predictive model using LR
- used, for predicting severity claims / Developing insurance severity claims predictive model using LR
- using, for churn prediction / LR for churn prediction
- used, for churn prediction / LR for churn prediction
- linear threshold units (LTUs) / DNNs for geographic ethnicity prediction
- live-price data collection
- through Cryptocompare API / Real-time data through the Cryptocompare API
- logistic regression (LR) / Exploratory analysis and feature engineering
- Long Short-Term Memory cells (LSTMs)
- implementing, for HAR / Implementing an LSTM model for HAR
- LSTM hyperparameters
- tuning / Tuning LSTM hyperparameters and GRU
- LSTM model, implementing for HAR
- packages, importing / Step 1 - Importing necessary libraries and packages
- libraries, importing / Step 1 - Importing necessary libraries and packages
- MXNet context, creating / Step 2 - Creating MXNet context
- test set, parsing / Step 3 - Loading and parsing the training and test set
- test set, loading / Step 3 - Loading and parsing the training and test set
- training set, loading / Step 3 - Loading and parsing the training and test set
- training set, parsing / Step 3 - Loading and parsing the training and test set
- exploratory analysis, of dataset / Step 4 - Exploratory analysis of the dataset
- internal RNN structure, defining / Step 5 - Defining internal RNN structure and LSTM hyperparameters
- LSTM hyperparameters / Step 5 - Defining internal RNN structure and LSTM hyperparameters
- LSTM network construction / Step 6 - LSTM network construction
- optimizer, setting up / Step 7 - Setting up an optimizer
- LSTM network, training / Step 8 - Training the LSTM network
- model, evaluating / Step 9 - Evaluating the model
- LSTM networks / LSTM networks
M
- machine learning (ML)
- about / Machine learning and learning workflow, State-of-the-art automated trading of Bitcoin
- workflow / Typical machine learning workflow
- for genetic variants / Machine learning for genetic variants
- mean square error (MSE) / Outlier and anomaly detection
- ML-based ALS models
- model
- model-based recommendation
- with Spark / Model-based recommendation with Spark
- data exploration / Data exploration
- movie recommendation, with ALS / Movie recommendation using ALS
- packages, importing / Step 1 - Import packages, load, parse, and explore the movie and rating dataset
- movie, parsing / Step 1 - Import packages, load, parse, and explore the movie and rating dataset
- movie, loading / Step 1 - Import packages, load, parse, and explore the movie and rating dataset
- movie, exploring / Step 1 - Import packages, load, parse, and explore the movie and rating dataset
- rating dataset / Step 1 - Import packages, load, parse, and explore the movie and rating dataset
- DataFrames, registering as temp tables / Step 2 - Register both DataFrames as temp tables to make querying easier
- statistics, exploring / Step 3 - Explore and query for related statistics
- statistics, querying / Step 3 - Explore and query for related statistics
- training data, preparing / Step 4 - Prepare training and test rating data and check the counts
- test rating data, preparing / Step 4 - Prepare training and test rating data and check the counts
- counts, checking / Step 4 - Prepare training and test rating data and check the counts
- data preparation, for building recommendation model with ALS / Step 5 - Prepare the data for building the recommendation model using ALS
- ALS user product matrix, building / Step 6 - Build an ALS user product matrix
- predictions, creating / Step 7 - Making predictions
- model, evaluating / Step 8 - Evaluating the model
- model deployment
- in production / Comparative analysis and model deployment, Spark-based model deployment for large-scale dataset
- about / Selecting the best model for deployment
- model training
- using, for prediction / Model training for prediction
- about / Model training and hyperparameter tuning
- movie database
- URL / Data exploration
- MovieLens 100k rating dataset
- Multilayer Perceptron (MLP) / DNNs for geographic ethnicity prediction, Working principles of an autoencoder
- MXNet
- setting, for Scala / Setting and configuring MXNet for Scala
- configuring, for Scala / Setting and configuring MXNet for Scala
N
- Next-generation genome sequencing (NGS) / Population scale clustering and geographic ethnicity
O
- online trading / Bitcoin, cryptocurrency, and online trading
- open-high-low-close (OHLC) / Real-time data through the Cryptocompare API
- optimal clusters
- quantity, determining / Determining the number of optimal clusters
- options trading web app
- developing, with Q-learning / Developing an options trading web app using Q-learning
- problem description / Problem description
- implementing / Implementing an options trading web application, Putting it all together
- option property, creating / Creating an option property
- option model, creating / Creating an option model
- model, evaluating / Evaluating the model
- wrapping up, as Scala web app / Wrapping up the options trading app as a Scala web app
- backend / The backend
- frontend / The frontend
- instructions, executing / Running and Deployment Instructions
- instructions, deploying / Running and Deployment Instructions
- model deployment / Model deployment
- outlier detection / Outlier and anomaly detection
- output layer / DNNs for geographic ethnicity prediction
P
- Pachinko Allocation Model (PAM) / Other topic models versus the scalability of LDA
- padding operations / Pooling layer and padding operations
- parameters / Developing insurance severity claims predictive model using LR
- partial differential equations (PDE) / Problem description
- persistence of memory / Contextual information and the architecture of RNNs
- Personal Genome Project (PGP) / Machine learning for genetic variants
- POJO (Plain Old Java Object) / SchedulerActor
- policy / Policy
- policy gradients / Policy
- pooling layers / Pooling layer and padding operations
- population-scale clustering
- about / Population scale clustering and geographic ethnicity
- Spark-based K-means, using / Spark-based K-means for population-scale clustering
- prices
- predicting / Predicting prices and evaluating the model
- Principal Component Analysis (PCA) / Efficient data representation with autoencoders
- Probabilistic Latent Semantic Analysis (pLSA) / Other topic models versus the scalability of LDA
- programming environment
- configuring / Configuring programming environment
- prototype
- high-level data pipeline / High-level data pipeline of the prototype
Q
- Q-learning
- implementation / A simple Q-learning implementation
- components / Components of the Q-learning algorithm
- Q-learning class / Components of the Q-learning algorithm
- QLConfig / Components of the Q-learning algorithm
- QLAction / Components of the Q-learning algorithm
- QLPolicy / Components of the Q-learning algorithm
- QLSpace / Components of the Q-learning algorithm
- QLState / Components of the Q-learning algorithm
- QLIndexedState / Components of the Q-learning algorithm
- QLModel / Components of the Q-learning algorithm
- states / States and actions in QLearning
- actions / States and actions in QLearning
- search space / The search space
- policy / The policy and action-value
- action-value / The policy and action-value
- model, creating / QLearning model creation and training
- model, training / QLearning model creation and training
- model, validating / QLearning model validation
- prediction, creating with trained model / Making predictions using the trained model
- used, for developing options trading web app / Developing an options trading web app using Q-learning
R
- random forest (RF)
- using, for churn prediction / Random Forest for churn prediction
- using, for ethnicity prediction / Using random forest for ethnicity prediction
- random policy / Policy
- receiver operating characteristic (ROC) / LR for churn prediction
- receptive field / CNN architecture
- recommendation system
- about / Recommendation system
- collaborative filtering approaches / Collaborative filtering approaches
- utility matrix / The utility matrix
- rectified linear unit (ReLU) / CNN architecture
- recurrent neural network (RNN)
- working with / Working with RNNs
- contextual information / Contextual information and the architecture of RNNs
- architecture / Contextual information and the architecture of RNNs
- long-term dependency problem / RNN and the long-term dependency problem
- LSTM networks / LSTM networks
- region of interest (ROI) / Image processing
- regression error
- about / LR for predicting insurance severity claims
- Mean Squared Error (MSE) / LR for predicting insurance severity claims
- Root Mean Squared Error (RMSE) / LR for predicting insurance severity claims
- R-squared / LR for predicting insurance severity claims
- Mean Absolute Error (MAE) / LR for predicting insurance severity claims
- explained variance / LR for predicting insurance severity claims
- reinforcement learning (RL)
- versus supervised learning / Reinforcement versus supervised and unsupervised learning
- versus unsupervised learning / Reinforcement versus supervised and unsupervised learning
- using / Using RL
- notation / Notation, policy, and utility in RL
- policy / Notation, policy, and utility in RL, Policy
- utility / Notation, policy, and utility in RL, Utility
- environment / Notation, policy, and utility in RL
- agent / Notation, policy, and utility in RL
- state / Notation, policy, and utility in RL
- goal / Notation, policy, and utility in RL
- action / Notation, policy, and utility in RL
- reward / Notation, policy, and utility in RL
- episode / Notation, policy, and utility in RL
- RESTful architecture / Why RESTful architecture?
- RF regressor
- used, for boosting performance / Boosting the performance using random forest regressor
- used, for classification / Random Forest for classification and regression
- used, for regression / Random Forest for classification and regression
- Root Mean Squared Error (RMSE) / Step 8 - Evaluating the model
- rotation estimation / Hyperparameter tuning and cross-validation
S
- Scala
- MXNet, setting / Setting and configuring MXNet for Scala
- MXNet, configuring / Setting and configuring MXNet for Scala
- Scala play framework
- using, for demo prediction / Demo prediction using Scala Play framework
- RESTful architecture / Why RESTful architecture?
- project structure / Project structure
- web app, executing / Running the Scala Play web app
- Scala web service
- about / Scala Play web service
- concurrency, through Akka actors / Concurrency through Akka actors
- web service workflow / Web service workflow
- single nucleotide polymorphisms (SNPs) / Population scale clustering and geographic ethnicity, 1000 Genomes Projects dataset description
- Singular Value Decomposition (SVD) / Hybrid recommender systems
- Spark-based K-means
- used, for population-scale clustering / Spark-based K-means for population-scale clustering
- Spark-based movie recommendation systems
- about / Spark-based movie recommendation systems
- Item-based collaborative filtering, for movie similarity / Item-based collaborative filtering for movie similarity
- model-based recommendation, with Spark / Model-based recommendation with Spark
- Sparkling water / H2O and Sparkling water
- Spark ML / Scala Play web service
- Spark MLlib
- using, in TM / Topic modeling with Spark MLlib and Stanford NLP
- Spark ML pipelines
- DataFrame / Developing insurance severity claims predictive model using LR
- transformer / Developing insurance severity claims predictive model using LR
- estimator / Developing insurance severity claims predictive model using LR
- pipeline / Developing insurance severity claims predictive model using LR
- parameter / Developing insurance severity claims predictive model using LR
- Stochastic Gradient Descent (SGD) / Tuning and optimizing CNN hyperparameters
- Support Vector Machines (SVMs)
- used, for churn prediction / SVM for churn prediction
T
- telemarketing
- client subscription assessment / Client subscription assessment through telemarketing
- text clustering / Topic modeling and text clustering
- The Cancer Genome Atlas (TCGA) / Machine learning for genetic variants
- TNR (true negative rate) / Predicting prices and evaluating the model
- topic modeling (TM)
- about / Topic modeling and text clustering, Step 7 - Topic modelling
- LDA, working / How does LDA algorithm work?
- with Spark MLlib / Topic modeling with Spark MLlib and Stanford NLP
- Spark session, creating / Step 1 - Creating a Spark session
- vocabulary, creating to train LDA after text pre-processing / Step 2 - Creating vocabulary and tokens count to train the LDA after text pre-processing
- tokens count, creating / Step 2 - Creating vocabulary and tokens count to train the LDA after text pre-processing
- LDA model, instantiating / Step 4 - Set the NLP optimizer
- NLP optimizer, setting / Step 4 - Set the NLP optimizer
- LDA model, training / Step 5 - Training the LDA model
- topics of interest, preparing / Step 6 - Prepare the topics of interest
- likelihood of documents, measuring / Step 8 - Measuring the likelihood of two documents
- models, versus scalability of LDA / Other topic models versus the scalability of LDA
- TPR (true positive rate) / Predicting prices and evaluating the model
- trials / Notation, policy, and utility in RL
- trained LDA model
- deploying / Deploying the trained LDA model
U
- unsupervised machine learning
- about / Unsupervised machine learning
- population genomics / Population genomics and clustering
- clustering / Population genomics and clustering
- autoencoders, using / Autoencoders and unsupervised learning
- utility / Utility
- utility function / Utility
- utility matrix / The utility matrix
V
- validation dataset / Typical machine learning workflow
- Variant Call Format (VCF) / 1000 Genomes Projects dataset description
W
- web service workflow
- about / Web service workflow
- JobModule / JobModule
- scheduler / Scheduler
- SchedulerActor / SchedulerActor
- PredictionActor / PredictionActor and the prediction step
- prediction / PredictionActor and the prediction step
- TraderActor / TraderActor
- within-cluster sum of squares (WCSS) / How does K-means work?
- Within-Set Sum of Squared Errors (WSSSE) / Spark-based K-means for population-scale clustering
Y
- Yelp