Machine Learning in Java

By: Bostjan Kaluza

Overview of this book

As the amount of data continues to grow at an almost incomprehensible rate, being able to understand and process data is becoming a key differentiator for competitive organizations. Machine learning applications are everywhere, from self-driving cars, spam detection, document search, and trading strategies, to speech recognition. This makes machine learning well-suited to the present-day era of Big Data and Data Science. The main challenge is how to transform data into actionable knowledge. Machine Learning in Java will provide you with the techniques and tools you need to quickly gain insight from complex data. You will start by learning how to apply machine learning methods to a variety of common tasks including classification, prediction, forecasting, market basket analysis, and clustering. Moving on, you will discover how to detect anomalies and fraud, and ways to perform activity recognition, image recognition, and text analysis. By the end of the book, you will explore related web resources and technologies that will help you take your learning to the next level. By applying the most effective machine learning methods to real-world problems, you will gain hands-on experience that will transform the way you think about data.

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Applied Machine Learning Quick Start

Machine learning and data science

Data and problem definition

Data collection

Data pre-processing

Unsupervised learning

Supervised learning

Generalization and evaluation

Summary

Java Libraries and Platforms for Machine Learning

The need for Java

Machine learning libraries

Building a machine learning application

Summary

Basic Algorithms – Classification, Regression, and Clustering

Summary

Customer Relationship Prediction with Ensembles

Customer relationship database

Basic naive Bayes classifier baseline

Basic modeling

Advanced modeling with ensembles

Summary

Affinity Analysis

Market basket analysis

Association rule learning

The supermarket dataset

Discover patterns

Other applications in various areas

Summary

Recommendation Engine with Apache Mahout

Basic concepts

Getting Apache Mahout

Building a recommendation engine

Content-based filtering

Summary

Fraud and Anomaly Detection

Suspicious and anomalous behavior detection

Suspicious pattern detection

Anomalous pattern detection

Fraud detection of insurance claims

Anomaly detection in website traffic

Summary

Image Recognition with Deeplearning4j

Introducing image recognition

Image classification

Summary

Activity Recognition with Mobile Phone Sensors

Introducing activity recognition

Collecting data from a mobile phone

Building a classifier

Summary

Text Mining with Mallet – Topic Modeling and Spam Detection

Introducing text mining

Installing Mallet

Working with text data

Topic modeling for BBC news

E-mail spam detection

Summary

What is Next?

Machine learning in real life

Standards and markup languages

Machine learning in the cloud

Web resources and competitions

Summary

References

Index

Index

A

A/B tests
- URL / Importance of evaluation
activation function
- about / Perceptron
activity recognition
- about / Introducing activity recognition
- mobile phone sensors / Mobile phone sensors
- activity-recognition pipeline / Activity recognition pipeline
- plan / The plan
AdaBoost M1 method
- about / Choosing a classification algorithm
advanced modelling
- with ensembles / Advanced modeling with ensembles
- ensembleLibrary package, using / Before we start
- data, pre-processing / Data pre-processing
- attribute selection / Attribute selection
- model selection / Model selection
- performance, evaluation / Performance evaluation
affinity analysis
- about / Affinity analysis
- cross-industry applications / Other applications in various areas
agglomerative clustering
- about / Clustering
Amazon Machine Learning / Machine learning as a service
analysis types
- about / Analysis types
- pattern analysis / Pattern analysis
- transaction analysis / Transaction analysis
Android Device Monitor
- about / Collecting training data
Android Studio
- installing / Installing Android Studio
- URL / Installing Android Studio
anomalous behaviour detection
- about / Suspicious and anomalous behavior detection
- unknown-unknowns / Unknown-unknowns
anomalous pattern detection
- about / Anomalous pattern detection
- analysis types / Analysis types
- plan recognition / Plan recognition
anomaly detection, in time series data
- about / Anomaly detection in time series data
- histogram-based anomaly detection / Histogram-based anomaly detection
- data, loading / Loading the data
- histograms, creating / Creating histograms
- density based k-nearest neighbours / Density based k-nearest neighbors
anomaly detection, in website traffic
- about / Anomaly detection in website traffic
- dataset, using / Dataset
Apache Mahout
- about / Apache Mahout
- configuring / Getting Apache Mahout
- configuring, in Eclipse with Maven plugin / Configuring Mahout in Eclipse with the Maven plugin
Apache Spark
- about / Apache Spark
- URL / Apache Spark
Application Portfolio Management (APM)
- about / IT Operations Analytics
Applied Machine Learning
- workflow / Applied machine learning workflow
Apriori
- about / Weka
Apriori algorithm
- about / Apriori algorithm
- used, for discovering shopping patterns / Apriori
artificial neural networks
- about / Artificial neural networks
association rule learning
- about / Association rule learning
- Apriori algorithm / Apriori algorithm
- FP-growth algorithm / FP-growth algorithm
association rule learning, basic concepts
- database, of transactions / Database of transactions
- itemset / Itemset and rule
- rule / Itemset and rule
- support / Support
- confidence / Confidence
autoencoder
- about / Autoencoder

B

bag-of-words (BoW)
- about / Working with text data
basic modelling
- about / Basic modeling
- models, evaluating / Evaluating models
- naive Bayes baseline, implementing / Implementing naive Bayes baseline
basic naive Bayes classifier baseline
- about / Basic naive Bayes classifier baseline
- data, obtaining / Getting the data
- data, loading / Loading the data
BBC dataset
- URL / BBC dataset
big data
- dealing with / Dealing with big data
- volume / Dealing with big data
- velocity / Dealing with big data
- variety / Dealing with big data
big data application
- architecture / Big data application architecture
BigML / Machine learning as a service
Book-Crossing dataset
- URL / Book ratings dataset
- BX-Users file / Book ratings dataset
- BX-Books file / Book ratings dataset
- BX-Book-Ratings file / Book ratings dataset
book-recommendation engine
- building / Building a recommendation engine
- book ratings dataset, using / Book ratings dataset
- data, loading / Loading the data
- data, loading from file / Loading data from file
- data, loading from database / Loading data from database
- in-memory database, creating / In-memory database
- collaborative filtering, implementing / Collaborative filtering
- custom rules, adding / Adding custom rules to recommendations
- evaluation / Evaluation
- online learning engine / Online learning engine
- content-based filtering, implementing / Content-based filtering

C

Canova library
- URL / Loading the data
Cassandra
- about / Big data application architecture
- URL / Big data application architecture
cc.mallet.pipe package
- Input2CharSequence pipeline / Pre-processing text data
- CharSequenceRemoveHTML pipeline / Pre-processing text data
- MakeAmpersandXMLFriendly pipeline / Pre-processing text data
- TokenSequenceLowercase pipeline / Pre-processing text data
- TokenSequence2FeatureSequence pipeline / Pre-processing text data
- TokenSequenceNGrams pipeline / Pre-processing text data
Chebyshev distance
- about / Java machine learning
classification
- about / Classification, Classification
- decision trees learning / Decision tree learning
- probabilistic classifiers / Probabilistic classifiers
- kernel methods / Kernel methods
- artificial neural networks / Artificial neural networks
- ensemble learning / Ensemble learning
- evaluating / Evaluating classification
- precision / Precision and recall
- recall / Precision and recall
- ROC curves / Roc curves
- data, using / Data
- data, loading / Loading data
- feature selection / Feature selection
- learning algorithms, selecting / Learning algorithms
- data, classifying / Classify new data
- evaluation / Evaluation and prediction error metrics
- prediction error metrics / Evaluation and prediction error metrics
- confusion matrix, examining / Confusion matrix
- algorithm, selecting / Choosing a classification algorithm
classification algorithms
- weka.classifiers.rules.ZeroR / Choosing a classification algorithm
- weka.classifiers.trees.RandomTree / Choosing a classification algorithm
- weka.classifiers.trees.RandomForest / Choosing a classification algorithm
- weka.classifiers.lazy.IBk / Choosing a classification algorithm
- weka.classifiers.functions.MultilayerPerceptron / Choosing a classification algorithm
- weka.classifiers.bayes.NaiveBayes / Choosing a classification algorithm
- weka.classifiers.meta.AdaBoostM1 / Choosing a classification algorithm
- weka.classifiers.meta.Bagging / Choosing a classification algorithm
classifier
- building / Building a classifier
- spurious transitions, reducing / Reducing spurious transitions
- plugging, into mobile app / Plugging the classifier into a mobile app
class implementation
- reference link / Loading the data
class unbalance / Class unbalance
clustering
- about / Clustering, Clustering
- algorithms / Clustering algorithms
- evaluation / Evaluation
clustering algorithms
- implementing / Clustering algorithms
collaborative filtering
- about / Collaborative filtering
- implementing, with book-recommendation engine / Collaborative filtering
- user-based / User-based filtering
- item-based / Item-based filtering
Comma Separated Value (CSV)
- about / Loading the data
competitions
- about / Competitions
conjugate gradient optimization algorithm
- building / Building a single-layer regression model
content-based filtering
- about / Content-based filtering
- implementing, with book-recommendation engine / Content-based filtering
Contrastive Divergence algorithm
- about / Restricted Boltzmann machine
Convolutional Neural Network (CNN)
- about / Deep convolutional networks
Core Motion framework, iOS
- URL / Mobile phone sensors
correlation coefficient
- about / Correlation coefficient
cosine distance
- about / Content-based filtering
cost function
- about / Supervised learning
Coursera
- URL / Online courses
cross-industry applications, of affinity analysis
- about / Other applications in various areas
- medical diagnosis / Medical diagnosis
- protein sequences / Protein sequences
- census data / Census data
- customer relationship management (CRM) / Customer relationship management
- IT Operations Analytics / IT Operations Analytics
cross-validation
- about / Cross-validation
Cross Industry Standard Process for Data Mining (CRISP-DM)
- about / CRISP-DM
CrowdANALYTIX
- URL / Competitions
CSVLoader class
- URL / Loading the data
curse of dimensionality
- about / The curse of dimensionality
customer relationship database
- about / Customer relationship database
- challenge / Challenge
- dataset / Dataset
- evaluation / Evaluation

D

data
- about / Data and problem definition
Data and problem definition
- measurement scales / Measurement scales
data and problem definition
- about / Data and problem definition
data cleaning
- about / Data cleaning
data collection
- about / Data collection
- data, observing / Find or observe data
- data, searching / Find or observe data
- data, generating / Generate data
- traps, sampling / Sampling traps
- from mobile phone / Collecting data from a mobile phone
- Android Studio, installing / Installing Android Studio
- data collector, loading / Loading the data collector
- training data, collecting / Collecting training data
data collector
- loading / Loading the data collector
- URL / Loading the data collector
- feature extraction / Feature extraction
Data Mining
- URL / Websites and blogs
Data Mining Research
- URL / Websites and blogs
data pre-processing
- about / Data pre-processing
- data cleaning / Data cleaning
- missing values, filling / Fill missing values
- outliers, removing / Remove outliers
- data transformation / Data transformation
- data reduction / Data reduction
data reduction
- about / Data reduction
data science
- about / Machine learning and data science
Data Science Central
- URL / Websites and blogs
Data Science CS109 (Harvard) by John A. Paulson
- URL / Online courses
data scientist
- about / Machine learning and data science
dataset rebalancing
- about / Dataset rebalancing
datasets
- about / Datasets
data transformation
- about / Data transformation
Decision and Predictive Analytics (ADAPA) / Predictive Model Markup Language
decision trees
- about / Underfitting and overfitting
decision trees learning
- about / Decision tree learning
deep belief network
- building / Building a deep belief network
deep belief networks
- about / Artificial neural networks
Deep Belief Networks (DBNs)
- about / Restricted Boltzmann machine
deep convolutional networks
- about / Deep convolutional networks, MNIST dataset
Deeplearning4j
- about / Deeplearning4j
- URL / Deeplearning4j
- org.deeplearning4j.base / Deeplearning4j
- org.deeplearning4j.berkeley / Deeplearning4j
- org.deeplearning4j.clustering / Deeplearning4j
- org.deeplearning4j.datasets / Deeplearning4j
- org.deeplearning4j.distributions / Deeplearning4j
- org.deeplearning4j.eval / Deeplearning4j
- org.deeplearning4j.exceptions / Deeplearning4j
- org.deeplearning4j.models / Deeplearning4j
- org.deeplearning4j.nn / Deeplearning4j
- org.deeplearning4j.optimize / Deeplearning4j
- org.deeplearning4j.plot / Deeplearning4j
- org.deeplearning4j.rng / Deeplearning4j
- org.deeplearning4j.util / Deeplearning4j
deeplearning4java
- about / Deeplearning4j
- obtaining / Getting DL4J
delta rule
- about / Perceptron
directory
- text data, importing / Importing from directory
Discrete Fourier Transform (DFT)
- about / Activity recognition pipeline
distance measures
- Euclidean distances / Euclidean distances
- non-Euclidean distances / Non-Euclidean distances
divide-and-conquer strategy
- about / FP-growth algorithm
double evaluateLeftToRight method
- Instances heldOutDocuments component / Evaluating a model
- int numParticles component / Evaluating a model
- boolean useResampling component / Evaluating a model
- PrintStream docProbabilityStream component / Evaluating a model
DrivenData / Competitions
DropConnect neural network
- about / MNIST dataset
DSGuide
- URL / Websites and blogs
dynamic time warping (DTW)
- about / Java machine learning

E

Eclipse
- Apache Mahout, configuring with Maven plugin / Configuring Mahout in Eclipse with the Maven plugin
Eclipse IDE
- using / Before you start
Edit distance
- about / Non-Euclidean distances
elbow method
- about / Clustering
email spam dataset
- URL / E-mail spam dataset
email spam detection
- about / E-mail spam detection
- email spam dataset, collecting / E-mail spam dataset
- default pipeline, creating / Feature generation
- training / Training and testing
- testing / Training and testing
- model performance, evaluating / Model performance
energy efficiency dataset
- URL / Loading the data
ensambleSel.setOptions() method
- -L </path/to/modelLibrary> option / Model selection
- -W </path/to/working/directory> option / Model selection
- -B <numModelBags> option / Model selection
- -E <modelRatio> option / Model selection
- -V <validationRatio> option / Model selection
- -H <hillClimbIterations> option / Model selection
- -I <sortInitialization> option / Model selection
- -X <numFolds> option / Model selection
- -P <hillclimbMettric> option / Model selection
- -A <algorithm> option / Model selection
- -R option / Model selection
- -G option / Model selection
- -O option / Model selection
- -S <num> option / Model selection
- -D option / Model selection
ensemble learning
- about / Ensemble learning
ensembleLibrary package
- using / Before we start
- URL / Before we start
ensembles
- used, for advanced modelling / Advanced modeling with ensembles
Ensemble Selection algorithm
- about / Advanced modeling with ensembles
environmental sensors
- about / Mobile phone sensors
Euclidean distances
- about / Euclidean distances
evaluate() method, parameters
- RecommenderBuilder / Evaluation
- DataModelBuilder / Evaluation
- DataModel / Evaluation
- trainingPercentage / Evaluation
- evaluationPercentage / Evaluation
evaluation
- about / Generalization and evaluation
Expectation Maximization (EM) clustering
- about / Clustering
exploitation
- about / Exploitation versus exploration
exploration
- about / Exploitation versus exploration

F

Feature extraction
- about / Building a machine learning application
feature map
- about / Deep convolutional networks
feature selection
- about / Data reduction
feedforward neural networks
- about / Feedforward neural networks
file
- text data, importing / Importing from file
Fourier transform
- reference link / Activity recognition pipeline
FP-Growth
- about / Weka
FP-growth algorithm
- about / FP-growth algorithm
- used, for discovering shopping patterns / FP-growth
FP-tree structure
- about / FP-growth algorithm
fraud detection, of insurance claims
- about / Fraud detection of insurance claims
- dataset, using / Dataset
- suspicious patterns, modelling / Modeling suspicious patterns
frequent pattern (FP)
- about / FP-growth algorithm

G

Geeking with Greg
- URL / Websites and blogs
generalization
- about / Generalization and evaluation
- underfitting / Underfitting and overfitting
- overfitting / Underfitting and overfitting
- test set / Train and test sets
- train set / Train and test sets
- cross-validation / Cross-validation
- leave-one-out validation / Leave-one-out validation
- stratification / Stratification
Generalized Sequential Patterns (GSP)
- about / Weka
Generative Stochastic Networks (GSNs)
- about / Restricted Boltzmann machine
Gibbs sampling
- about / Restricted Boltzmann machine
GNU General Public License (GNU GPL)
- about / Weka
Google Prediction API / Machine learning as a service
Graphics Processing Unit (GPU)
- reference link / Build a Multilayer Convolutional Network
- about / Build a Multilayer Convolutional Network
GraphX
- about / Apache Spark

H

Hadoop
- about / Big data application architecture
- URL / Big data application architecture
Hadoop Distributed File System (HDFS)
- about / Apache Spark
Hamming distance
- about / Non-Euclidean distances
HBase
- about / Big data application architecture
- URL / Big data application architecture
Hidden layer
- about / Feedforward neural networks
Hidden layer, issues
- vanishing gradients problem / Feedforward neural networks
- overfitting / Feedforward neural networks
hidden Markov models (HMM)
- about / Apache Mahout
Hidden Markov Models (HMMs)
- about / Transaction analysis
hierarchical clustering
- about / Clustering
histogram-based anomaly detection
- about / Histogram-based anomaly detection
Hotspot
- about / Weka
hybrid approach
- about / Hybrid approach

I

IBM Research team
- about / Advanced modeling with ensembles
IBM Watson Analytics / Machine learning as a service
image classification
- about / Image classification
- deeplearning4java / Deeplearning4j
- MNIST dataset / MNIST dataset
- data, loading / Loading the data
- models, building / Building models
ImageNet
- about / Introducing image recognition
- URL / Deep convolutional networks
image recognition
- about / Introducing image recognition
- neural networks / Neural networks
Infrastructure as a Service (IaaS) / Machine learning in the cloud
Input layer
- about / Feedforward neural networks
insurance claims
- fraud detection / Fraud detection of insurance claims
interval data
- about / Measurement scales
Intrusion Detection (ID)
- about / Transaction analysis
item-based analysis
- about / User-based and item-based analysis
item-based collaborative filtering
- about / Item-based filtering

J

Jaccard distance
- about / Non-Euclidean distances
Java
- need for / The need for Java
Java-ML packages
- net.sf.javaml.classification / Java machine learning
- net.sf.javaml.clustering / Java machine learning
- net.sf.javaml.core / Java machine learning
- net.sf.javaml.distance / Java machine learning
- net.sf.javaml.featureselection / Java machine learning
- net.sf.javaml.filter / Java machine learning
- net.sf.javaml.matrix / Java machine learning
- net.sf.javaml.sampling / Java machine learning
- net.sf.javaml.tools / Java machine learning
- net.sf.javaml.utils / Java machine learning
java -Xmx16g
- about / Performance evaluation
Java API packages, Weka
- weka.associations / Weka
- weka.classifiers / Weka
- weka.clusterers / Weka
- weka.core / Weka
- weka.datagenerators / Weka
- weka.estimators / Weka
- weka.experiment / Weka
- weka.filters / Weka
- weka.gui / Weka
Java machine learning (Java-ML)
- about / Java machine learning
- URL / Java machine learning

K

k-means clustering
- about / Clustering
k-nearest neighbors
- about / Underfitting and overfitting
Kaggle / Competitions
KDD Cup
- URL / Getting the data
KDnuggets / Machine learning as a service
- URL / Websites and blogs
kernel methods
- about / Kernel methods
known-knowns
- about / Unknown-unknowns
known-unknowns
- about / Unknown-unknowns

L

Latent Dirichlet
- about / MALLET
Latent Dirichlet Allocation
- about / Topic modeling
Latent Dirichlet Allocation (LDA)
- about / Modeling
leave-one-out validation
- about / Leave-one-out validation
Linear Discriminant Analysis (LDA)
- about / Modeling
- reference link / Evaluating a model
linear regression
- about / Linear regression
Local Outlier Factor (LOF)
- about / Histogram-based anomaly detection
LOF algorithm
- URL / Density based k-nearest neighbors

M

machine learning
- about / Machine learning and data science
- advantages / What kind of problems can machine learning solve?
- supervised learning / What kind of problems can machine learning solve?
- unsupervised learning / What kind of problems can machine learning solve?
- reinforcement learning / What kind of problems can machine learning solve?
- in real life / Machine learning in real life
- noisy data / Noisy data
- class unbalance / Class unbalance
- feature selection / Feature selection is hard
- model chaining / Model chaining
- evaluation / Importance of evaluation
- models, in production / Getting models into production
- models, maintaining / Model maintenance
- in cloud / Machine learning in the cloud
- as service / Machine learning as a service
machine learning application
- building / Building a machine learning application
- traditional machine learning / Traditional machine learning architecture
- big data, dealing with / Dealing with big data
Machine Learning for Language Toolkit (MALLET)
- about / MALLET
- URL / MALLET
machine learning libraries
- about / Machine learning libraries
- Waikato Environment for Knowledge Analysis (Weka) / Weka
- Java machine learning (Java-ML) / Java machine learning
- Apache Mahout / Apache Mahout
- Apache Spark / Apache Spark
- Deeplearning4j / Deeplearning4j
- Machine Learning for Language Toolkit (MALLET) / MALLET
- comparing / Comparing libraries
Machine learning mastery
- URL / Websites and blogs
Mahalanobis distance
- about / Non-Euclidean distances, Java machine learning
Mahout interfaces, abstractions
- DataModel / Collaborative filtering
- UserSimilarity / Collaborative filtering
- ItemSimilarity / Collaborative filtering
- UserNeighborhood / Collaborative filtering
- Recommender / Collaborative filtering
Mahout libraries
- org.apache.mahout.cf.taste / Apache Mahout
- org.apache.mahout.classifier / Apache Mahout
- org.apache.mahout.clustering / Apache Mahout
- org.apache.mahout.common / Apache Mahout
- org.apache.mahout.ep / Apache Mahout
- org.apache.mahout.math / Apache Mahout
- org.apache.mahout.vectorizer / Apache Mahout
Mallet
- installing / Installing Mallet
- URL / Installing Mallet
- reference link / Pre-processing text data
MALLET, packages
- cc.mallet.classify / MALLET
- cc.mallet.cluster / MALLET
- cc.mallet.extract / MALLET
- cc.mallet.fst / MALLET
- cc.mallet.grmm / MALLET
- cc.mallet.optimize / MALLET
- cc.mallet.pipe / MALLET
- cc.mallet.topics / MALLET
- cc.mallet.types / MALLET
- cc.mallet.util / MALLET
Manhattan distance
- about / Java machine learning
market basket analysis (MBA)
- about / Market basket analysis
- item affinity / Market basket analysis
- identification, of driver items / Market basket analysis
- trip classification / Market basket analysis
- store-to-store comparison / Market basket analysis
- revenue optimization / Market basket analysis
- marketing / Market basket analysis
- operations optimization / Market basket analysis
- affinity analysis / Affinity analysis
Markov chain
- about / Restricted Boltzmann machine
Maven plugin
- Apache Mahout, configuring with / Configuring Mahout in Eclipse with the Maven plugin
mean absolute error
- about / Mean absolute error
mean squared error
- about / Mean squared error
measurement scales
- about / Measurement scales
- nominal data / Measurement scales
- ordinal data / Measurement scales
- interval data / Measurement scales
- ratio data / Measurement scales
Microsoft Azure Machine Learning / Machine learning as a service
Minkowski distance
- about / Java machine learning
missing values
- filling / Fill missing values
MLlib API library
- org.apache.spark.mllib.classification / Apache Spark
- org.apache.spark.mllib.clustering / Apache Spark
- org.apache.spark.mllib.linalg / Apache Spark
- org.apache.spark.mllib.optimization / Apache Spark
- org.apache.spark.mllib.recommendation / Apache Spark
- org.apache.spark.mllib.regression / Apache Spark
- org.apache.spark.mllib.stat / Apache Spark
- org.apache.spark.mllib.tree / Apache Spark
- org.apache.spark.mllib.util / Apache Spark
MNIST dataset
- about / MNIST dataset
mobile app
- classifier, plugging into / Plugging the classifier into a mobile app
mobile phone
- data, collecting / Collecting data from a mobile phone
mobile phone sensors
- about / Mobile phone sensors
- motion sensors / Mobile phone sensors
- environmental sensors / Mobile phone sensors
- position sensors / Mobile phone sensors
- URL, for Android / Mobile phone sensors
- URL, for Windows Phone / Mobile phone sensors
model
- chaining / Model chaining
- in production / Getting models into production
- maintenance / Model maintenance
models
- building / Building models
- single layer regression model, building / Building a single-layer regression model
- deep belief network, building / Building a deep belief network
- Multilayer Convolutional Network, building / Build a Multilayer Convolutional Network
MongoDB
- about / Big data application architecture
- URL / Big data application architecture
motion sensors
- about / Mobile phone sensors
Mozilla Thunderbird
- about / E-mail spam detection
Multilayer Convolutional Network
- about / Building models
- building / Build a Multilayer Convolutional Network
myrunscollector package
- Globals.java class / Loading the data collector
- CollectorActivity.java class / Loading the data collector
- SensorsService.java class / Loading the data collector

N

Naive Bayes
- about / Underfitting and overfitting
naive Bayes baseline
- implementing / Implementing naive Bayes baseline
neural network
- about / Underfitting and overfitting
neural networks
- about / Neural networks
- perceptron / Perceptron
- feedforward neural networks / Feedforward neural networks
- autoencoder / Autoencoder
- Restricted Boltzmann machine / Restricted Boltzmann machine
- deep convolutional networks / Deep convolutional networks
nominal data
- about / Measurement scales
non-Euclidean distance
- about / Non-Euclidean distances

O

online courses
- about / Online courses
online learning engine
- about / Online learning engine
Oracle Database Online Documentation
- URL / Dataset
ordinal data
- about / Measurement scales
outliers
- removing / Remove outliers
Output layer
- about / Feedforward neural networks
overfits
- about / Applied machine learning workflow
overfitting
- about / Underfitting and overfitting

P

p-norm distance
- about / Euclidean distances
PAPI
- URL / Machine learning as a service
part-of-speech (POS)
- about / Working with text data
pattern analysis
- about / Pattern analysis
Pearson coefficient
- about / Content-based filtering
Pearson correlation coefficient
- about / Java machine learning
perceptron
- about / Artificial neural networks, Introducing image recognition, Perceptron
plan recognition
- about / Plan recognition
Portable Format for Analytics (PFA) / Predictive Model Markup Language
position sensors
- about / Mobile phone sensors
Pre-processing phase
- about / Building a machine learning application
precision
- about / Precision and recall
Prediction.IO / Machine learning as a service
predictive apriori
- about / Weka
Predictive Model Markup Language (PMML)
- about / Predictive Model Markup Language
Principal component analysis (PCA)
- about / Data reduction
Principal Component Analysis (PCA)
- about / Histogram-based anomaly detection
Principal Components Analysis (PCA)
- about / Kernel methods
probabilistic classifiers
- about / Probabilistic classifiers

R

ratio data
- about / Measurement scales
recall
- about / Precision and recall
Receiver Operating Characteristics (ROC)
- about / Roc curves
recommendation engine
- basic concepts / Basic concepts
- key concepts / Key concepts
- user-based analysis / User-based and item-based analysis
- item-based analysis / User-based and item-based analysis
- similarity, calculating / Approaches to calculate similarity
- exploitation / Exploitation versus exploration
- exploration / Exploitation versus exploration
- book-recommendation engine, building / Building a recommendation engine
regression
- about / Regression, Underfitting and overfitting, Regression
- linear regression / Linear regression
- evaluating / Evaluating regression
- mean squared error / Mean squared error
- mean absolute error / Mean absolute error
- correlation coefficient / Correlation coefficient
- data, loading / Loading the data
- attributes, analyzing / Analyzing attributes
- model, building / Building and evaluating regression model
- model, evaluating / Building and evaluating regression model
- tips / Tips to avoid common regression problems
regression model
- evaluating / Building and evaluating regression model
- building / Building and evaluating regression model
- linear regression / Linear regression
- regression trees / Regression trees
regression trees
- about / Regression trees
reinforcement learning
- about / What kind of problems can machine learning solve?
Resilient Distributed Dataset (RDD)
- about / Apache Spark
Restricted Boltzmann machine
- about / Restricted Boltzmann machine
restricted Boltzmann machine
- about / Artificial neural networks
restricted Boltzmann machines (RBM)
- about / Deeplearning4j
ROC curves
- about / Roc curves
RuleSetModel / Predictive Model Markup Language

S

Scale Invariant Feature Transform (SIFT)
- about / Introducing image recognition
score function
- about / Supervised learning
similar items
- searching / Find similar items
similarity calculation
- about / Approaches to calculate similarity
- collaborative filtering / Collaborative filtering
- content-based filtering / Content-based filtering
- hybrid approach / Hybrid approach
SimRank
- about / Non-Euclidean distances
single layer regression model
- building / Building a single-layer regression model
Singular value decomposition (SVD)
- about / Data reduction
Spark Streaming
- about / Apache Spark
spatio-temporal patterns
- about / Transaction analysis
Spearman's footrule distance
- about / Java machine learning
stacked autoencoders
- about / Autoencoder
standards and markup languages
- about / Standards and markup languages
Statistics 110 (Harvard) by Joe Blitzstein
- URL / Online courses
stratification
- about / Stratification
sum transfer function
- about / Perceptron
supermarket dataset
- about / The supermarket dataset
- shopping patterns, discovering / Discover patterns
- shopping patterns, discovering with Apriori algorithm / Apriori
- shopping patterns, discovering with FP-growth algorithm / FP-growth
supervised learning
- about / What kind of problems can machine learning solve?, Supervised learning
- classification / Classification
- regression / Regression
Support Vector Machine (SVM) model / Predictive Model Markup Language
Support Vector Machines (SVM)
- about / Kernel methods
survivorship bias
- about / Sampling traps
suspicious behaviour detection
- about / Suspicious and anomalous behavior detection
suspicious pattern detection
- about / Suspicious pattern detection
suspicious patterns, modelling
- about / Modeling suspicious patterns
- vanilla approach / Vanilla approach
- dataset rebalancing / Dataset rebalancing
SVM
- about / Underfitting and overfitting
Sample, Explore, Modify, Model, and Assess (SEMMA)
- about / SEMMA methodology

T

target variables
- churn probability / Challenge
- appetency probability / Challenge
- upselling probability / Challenge
Tertius
- about / Weka
test set
- about / Train and test sets
text classification
- about / Text classification
- examples / Text classification
text data
- extracting / Working with text data
- importing / Importing data
- importing, from directory / Importing from directory
- importing, from file / Importing from file
- pre-processing / Pre-processing text data
text mining
- about / Introducing text mining
- topic modeling / Topic modeling
- text classification / Text classification
time series data
- anomaly detection / Anomaly detection in time series data
topic modeling
- about / Topic modeling
topic modelling, for BBC news
- about / Topic modeling for BBC news
- BBC dataset, collecting / BBC dataset
- modeling / Modeling
- model, evaluating / Evaluating a model
- model, reusing / Reusing a model
- model, saving / Saving a model
- model, restoring / Restoring a model
traditional machine learning
- architecture / Traditional machine learning architecture
Training data
- about / Building a machine learning application
training data
- collecting / Collecting training data
train set
- about / Train and test sets
transaction analysis
- about / Transaction analysis
TreeModel / Predictive Model Markup Language

U

UCI machine learning repository
- URL / Datasets
Udemy
- URL / Online courses
underfits
- about / Applied machine learning workflow
underfitting
- about / Underfitting and overfitting
Universal PMML Plug-in (UPPI) / Predictive Model Markup Language
unknown-unknowns
- about / Unknown-unknowns
unsupervised learning
- about / What kind of problems can machine learning solve?, Unsupervised learning
- similar items, searching / Find similar items
- clustering / Clustering
user-based analysis
- about / User-based and item-based analysis
user-based collaborative filtering
- about / User-based filtering

V

vanilla approach
- about / Vanilla approach

W

Waikato Environment for Knowledge Analysis (Weka)
- about / Weka
- URL / Weka
web resources and competitions
- about / Web resources and competitions, Competitions
- datasets / Datasets
- online courses / Online courses
- websites and blogs / Websites and blogs
- venues and conferences / Venues and conferences
website traffic
- anomaly detection / Anomaly detection in website traffic
weka.classifiers package
- weka.classifiers.bayes / Weka
- weka.classifiers.evaluation / Weka
- weka.classifiers.functions / Weka
- weka.classifiers.lazy / Weka
- weka.classifiers.meta / Weka
- weka.classifiers.mi / Weka
- weka.classifiers.rules / Weka
- weka.classifiers.trees / Weka
Weka 3.6
- URL / Before you start
- downloading / Before you start
WEKA Packages
- URL / Before we start
word2vec
- about / Working with text data
- URL / Working with text data
workflow, Applied Machine Learning
- data and problem definition / Applied machine learning workflow
- data collection / Applied machine learning workflow
- data preprocessing / Applied machine learning workflow
- data analysis and modeling / Applied machine learning workflow
- evaluation / Applied machine learning workflow

X

Xiaming Chen
- URL / Datasets

Y

Yahoo traffic dataset
- URL / Dataset
