Mastering Java Machine Learning

By: Uday Kamath, Krishna Choppella

Overview of this book

Java is one of the main languages used by practicing data scientists; much of the Hadoop ecosystem is Java-based, and it is the language in which most production data science systems are written. If you know Java, Mastering Java Machine Learning is your next step on the path to becoming an advanced practitioner in data science. This book introduces you to an array of advanced techniques in machine learning, including classification, clustering, anomaly detection, stream learning, active learning, semi-supervised learning, probabilistic graph modeling, text mining, deep learning, and big data batch and stream machine learning. Each chapter is accompanied by illustrative examples and real-world case studies that show how to apply the newly learned techniques using sound methodologies and the best Java-based tools available today. On completing this book, you will understand the tools and techniques for building powerful machine learning models to solve data science problems in just about any domain.
Table of Contents (20 chapters)
Mastering Java Machine Learning
Credits
Foreword
About the Authors
About the Reviewers
www.PacktPub.com
Customer Feedback
Preface
Linear Algebra
Index

Index

A

  • Abstract-C / Abstract-C
  • active learning
    • about / Active learning
    • representation / Representation and notation
    • notation / Representation and notation
    • scenarios / Active learning scenarios
    • approaches / Active learning approaches
    • uncertainty sampling / Uncertainty sampling
    • version space sampling / Version space sampling
  • active learning, case study
    • about / Case study in active learning
    • tools / Tools and software
    • software / Tools and software
    • business problem / Business problem
    • machine learning, mapping / Machine learning mapping
    • data collection / Data Collection
    • data sampling / Data sampling and transformation
    • data transformation / Data sampling and transformation
    • feature analysis / Feature analysis and dimensionality reduction
    • dimensionality reduction / Feature analysis and dimensionality reduction
    • models / Models, results, and evaluation
    • results / Models, results, and evaluation
    • evaluation / Models, results, and evaluation
    • results, pool-based scenarios / Pool-based scenarios
    • results, stream-based scenarios / Stream-based scenarios
    • results, analysis / Analysis of active learning results
  • ADaptable sliding WINdow (ADWIN)
    • about / Sliding windows
  • adaptation methods
    • about / Adaptation methods
    • explicit adaptation / Explicit adaptation
    • implicit adaptation / Implicit adaptation
  • Advanced Message Queueing Protocol (AMQP) / Message queueing frameworks
  • affinity propagation
    • about / Affinity propagation
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • algorithms, comparing
    • McNemar's Test / McNemar's Test
    • Wilcoxon signed-rank test / Wilcoxon signed-rank test
  • Amazon Elastic MapReduce (EMR) / Amazon Elastic MapReduce
  • Amazon Kinesis / Publish-subscribe frameworks
  • Amazon Redshift / Amazon Redshift
  • Angle-based Outlier Degree (ABOD) / How does it work?
  • anomaly detection
    • about / Outlier or anomaly detection
  • ANOVA test / ANOVA test
  • Apache Kafka / Publish-subscribe frameworks
  • Apache Storm / SAMOA as a real-time Big Data Machine Learning framework
  • Approx Storm / Approx Storm
  • ArangoDB / Graph databases
  • association analysis / Machine learning – types and subtypes
  • Autoencoders
    • about / Autoencoders
    • mathematical notations / Definition and mathematical notations
    • loss function / Loss function
    • limitations / Limitations of Autoencoders
    • denoising / Denoising Autoencoder
  • axioms of probability / Axioms of probability

B

  • Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH)
    • about / Hierarchical based and micro clustering
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Batch Big Data Machine Learning
    • about / Batch Big Data Machine Learning
    • H2O, used as platform / H2O as Big Data Machine Learning platform
  • Bayesian information criterion (BIC) / Measures to evaluate structures
  • Bayesian networks
    • about / Bayesian networks
    • representation / Bayesian networks, Representation
    • inference / Bayesian networks, Inference
    • learning / Bayesian networks, Learning
  • Bayes theorem
    • about / Bayes' theorem
    • density, estimation / Density estimation
    • mean / Mean
    • variance / Variance
    • standard deviation / Standard deviation
    • Gaussian standard deviation / Gaussian standard deviation
    • covariance / Covariance
    • correlation coefficient / Correlation coefficient
    • binomial distribution / Binomial distribution
    • Poisson distribution / Poisson distribution
    • Gaussian distribution / Gaussian distribution
    • central limit theorem / Central limit theorem
    • error propagation / Error propagation
  • Bernoulli distribution / Random variables, joint, and marginal distributions
  • Big Data
    • characteristics / What are the characteristics of Big Data?
    • volume / What are the characteristics of Big Data?
    • velocity / What are the characteristics of Big Data?
    • variety / What are the characteristics of Big Data?
    • veracity / What are the characteristics of Big Data?
  • Big Data cluster deployment frameworks
    • about / Big Data cluster deployment frameworks
    • Hortonworks Data Platform (HDP) / Hortonworks Data Platform
    • Cloudera CDH / Cloudera CDH
    • Amazon Elastic MapReduce (EMR) / Amazon Elastic MapReduce
    • Microsoft Azure HDInsight / Microsoft Azure HDInsight
  • Big Data framework
    • about / General Big Data framework
    • cluster deployment frameworks / Big Data cluster deployment frameworks
    • data acquisition / Data acquisition
    • data storage / Data storage
    • data preparation / Data processing and preparation
    • data processing / Data processing and preparation
    • machine learning / Machine Learning
    • visualization / Visualization and analysis
    • analysis / Visualization and analysis
  • Big Data Machine Learning
    • about / Big Data Machine Learning
    • framework / General Big Data framework
    • Big Data framework / General Big Data framework
    • Spark MLlib / Spark MLlib as Big Data Machine Learning platform
  • binomial distribution / Binomial distribution
  • boosting
    • about / Boosting
    • algorithm input / Algorithm inputs and outputs
    • algorithm output / Algorithm inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitation / Advantages and limitations
  • bootstrap aggregating (bagging)
    • about / Bootstrap aggregating or bagging
    • algorithm inputs / Algorithm inputs and outputs
    • algorithm outputs / Algorithm inputs and outputs
    • working / How does it work?
    • Random Forest / Random Forest
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Broyden-Fletcher-Goldfarb-Shanno (BFGS) / How does it work?
  • Business Intelligence (BI) / What is not machine learning?
  • business problem / Business problem, Business problem

C

  • case study
    • about / Case study
    • business problem / Business problem
    • machine learning mapping / Machine learning mapping
    • data sampling and transformation / Data sampling and transformation
    • feature analysis / Feature analysis
    • Models, results, and evaluation / Models, results, and evaluation
    • results, analysis / Analysis of results
  • case study, with CoverType dataset
    • about / Case study
    • business problem / Business problem
    • machine learning, mapping / Machine Learning mapping
    • data collection / Data collection
    • data sampling / Data sampling and transformation
    • data transformation / Data sampling and transformation
    • Spark MLlib, used as Big Data Machine Learning platform / Spark MLlib as Big Data Machine Learning platform
  • Cassandra / Columnar databases
  • central limit theorem / Central limit theorem
  • Chi-Squared feature / Statistical approach
  • Chunk / H2O architecture
  • classification
    • about / Formal description and notation
  • Classification and Regression Trees (CART) / Decision Trees
  • Clique tree or junction tree algorithm, Bayesian networks
    • about / Clique tree or junction tree algorithm
    • input and output / Input and output
    • working / How does it work?
    • advantages and limitations / Advantages and limitations
  • Cloudera CDH / Cloudera CDH
  • Cluster-based Local Outlier Factor (CBLOF)
    • about / How does it work?
  • clustering
    • about / Clustering
    • spectral clustering / Spectral clustering
    • affinity propagation / Affinity propagation
    • using, for incremental unsupervised learning / Incremental unsupervised learning using clustering
    • evaluation techniques / Validation and evaluation techniques
    • validation techniques / Validation and evaluation techniques
    • stream cluster evaluation, key issues / Key issues in stream cluster evaluation
    • evaluation measures / Evaluation measures
  • clustering-based methods
    • about / Clustering-based methods
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • clustering algorithms
    • about / Clustering algorithms
    • k-means / k-Means
    • DBSCAN / DBSCAN
    • mean shift / Mean shift
    • Gaussian mixture modeling (GMM) / Expectation maximization (EM) or Gaussian mixture modeling (GMM)
    • expectation maximization (EM) / Expectation maximization (EM) or Gaussian mixture modeling (GMM)
    • hierarchical clustering / Hierarchical clustering
    • self-organizing maps (SOM) / Self-organizing maps (SOM)
  • clustering evaluation
    • about / Clustering validation and evaluation
    • internal evaluation measures / Clustering validation and evaluation, Internal evaluation measures
    • external evaluation measures / Clustering validation and evaluation, External evaluation measures
  • Clustering Features (CF) / Hierarchical based and micro clustering
  • Clustering Feature Tree (CF Tree) / Hierarchical based and micro clustering
  • clustering techniques
    • about / Clustering techniques
    • generative probabilistic models / Generative probabilistic models
    • distance-based text clustering / Distance-based text clustering
    • non-negative Matrix factorization (NMF) / Non-negative matrix factorization (NMF)
  • Clustering Trees (CT) / Hierarchical based and micro clustering
  • clustering validation
    • about / Clustering validation and evaluation
  • Cluster Mapping Measures (CMM)
    • about / Cluster Mapping Measures (CMM)
    • mapping phase / Cluster Mapping Measures (CMM)
    • penalty phase / Cluster Mapping Measures (CMM)
  • cluster mode / Amazon Elastic MapReduce
  • cluster SSL
    • about / Cluster and label SSL
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • limitations / Advantages and limitations
    • advantages / Advantages and limitations
  • CluStream
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • co-training SSL
    • about / Co-training SSL or multi-view SSL
    • inputs / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • columnar databases / Columnar databases
  • concept drift
    • about / Concept drift and drift detection
  • conditional probability distribution (CPD) / Factor types, Definition
  • Conditional random fields (CRFs) / Conditional random fields
  • confusion matrix / Confusion matrix and related metrics
  • connectivity-based outliers (COF) / How does it work?
  • contrastive divergence (CD) / Contrastive divergence
  • Convolutional Neural Networks (CNN)
    • about / Convolutional Neural Network
    • local connectivity / Local connectivity
    • parameter sharing / Parameter sharing
    • discrete convolution / Discrete convolution
    • Pooling or Subsampling / Pooling or subsampling
    • ReLU / Normalization using ReLU
  • coreference resolution / Coreference resolution
  • correlation-based feature selection (CFS) / Correlation-based feature selection (CFS)
  • Correlation based Feature selection (CFS)
    • about / Feature selection
  • correlation coefficient / Correlation coefficient
  • Cosine distance / Cosine distance
  • covariance / Covariance
  • CoverType dataset
    • reference link / Business problem
  • Cross Industry Standard Process (CRISP)
    • about / Process
  • cumulative sum (CUSUM) / CUSUM and Page-Hinckley test
  • custom frameworks / Custom frameworks

D

  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
    • about / DBSCAN
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • D-Separation, Bayesian networks / D-Separation
  • data acquisition
    • about / Data acquisition
    • publish-subscribe frameworks / Publish-subscribe frameworks
    • source-sink frameworks / Source-sink frameworks
    • SQL frameworks / SQL frameworks
    • message queueing frameworks / Message queueing frameworks
    • custom frameworks / Custom frameworks
  • data analysis
    • about / Data analysis
    • label analysis / Label analysis
    • features analysis / Features analysis
  • data collection
    • mapping / Data collection
  • data distribution sampling
    • about / Data distribution sampling
    • working / How does it work?
    • model change / Expected model change
    • error reduction / Expected error reduction
    • advantages / Advantages and limitations
    • disadvantages / Advantages and limitations
  • Data Frame / H2O architecture
  • data management
    • about / Data management
  • data preparation
    • key tasks / Data processing and preparation
    • HQL / Hive and HQL
    • Hive / Hive and HQL
    • Spark SQL / Spark SQL
    • Amazon Redshift / Amazon Redshift
    • real-time stream processing / Real-time stream processing
  • data preprocessing
    • about / Data transformation and preprocessing
  • data processing
    • key tasks / Data processing and preparation
    • HQL / Hive and HQL
    • Hive / Hive and HQL
    • Spark SQL / Spark SQL
    • Amazon Redshift / Amazon Redshift
    • real-time stream processing / Real-time stream processing
  • data quality analysis
    • about / Data quality analysis
  • data sampling
    • about / Data sampling, Data sampling and transformation
    • need for / Is sampling needed?
    • undersampling / Undersampling and oversampling
    • oversampling / Undersampling and oversampling
    • stratified sampling / Stratified sampling
    • techniques / Training, validation, and test set
    • experiments / Experiments, results, and analysis
    • results / Experiments, results, and analysis
    • analysis / Experiments, results, and analysis, Feature relevance and analysis
    • feature relevance / Feature relevance and analysis
    • test data, evaluation / Evaluation on test data
    • results, analysis / Analysis of results
  • datasets
    • used, in machine learning / Datasets used in machine learning
    • structured data / Datasets used in machine learning
    • transaction data / Datasets used in machine learning
    • market data / Datasets used in machine learning
    • unstructured data / Datasets used in machine learning
    • sequential data / Datasets used in machine learning
    • graph data / Datasets used in machine learning
  • datasets, machine learning
    • about / Datasets
    • UC Irvine (UCI) database / Datasets
    • Tunedit / Datasets
    • Mldata.org / Datasets
    • KDD Challenge Datasets / Datasets
    • Kaggle / Datasets
  • data storage
    • about / Data storage
    • HDFS / HDFS
    • NoSQL / NoSQL
  • data transformation
    • about / Data transformation and preprocessing, Data sampling and transformation
    • feature, construction / Feature construction
    • missing values, handling / Handling missing values
    • outliers, handling / Outliers
    • discretization / Discretization
    • data sampling / Data sampling
    • training / Training, validation, and test set
    • validation / Training, validation, and test set
    • test set / Training, validation, and test set
  • Davies-Bouldin index / Davies-Bouldin index
    • Silhouette's index / Silhouette's index
  • Decision Trees
    • about / Decision Trees
    • algorithm input / Algorithm inputs and outputs
    • algorithm output / Algorithm inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Deep Autoencoders
    • about / Deep Autoencoders
  • Deep Belief Networks (DBN)
    • inputs and outputs / Inputs and outputs
    • working / How does it work?
  • Deep feed-forward NN
    • about / Deep feed-forward NN
    • input and outputs / Input and outputs
    • working / How does it work?
  • deep learning
    • about / Deep learning
    • building blocks / Building blocks for deep learning
    • Rectified linear activation function / Rectified linear activation function
    • Restricted Boltzmann Machines / Restricted Boltzmann Machines
    • Autoencoders / Autoencoders
    • Unsupervised pre-training and supervised fine-tuning / Unsupervised pre-training and supervised fine-tuning
    • Deep feed-forward NN / Deep feed-forward NN, How does it work?
    • Deep Autoencoders / Deep Autoencoders
    • Deep Belief Networks (DBN) / Deep Belief Networks, Inputs and outputs
    • Dropouts / Deep learning with dropouts, Definition and mathematical notation
    • sparse coding / Sparse coding
    • Convolutional Neural Network (CNN) / Convolutional Neural Network
    • Convolutional Neural Network (CNN) layers / CNN Layers
    • Recurrent Neural Networks (RNN) / Recurrent Neural Networks
  • Deep Learning
    • about / Deep learning and NLP
  • Deep Learning (DL) / Feature relevance and analysis
  • deep learning, case study
    • about / Case study
    • tools and software / Tools and software
    • business problem / Business problem
    • machine learning mapping / Machine learning mapping
    • feature analysis / Feature analysis
    • models, results and evaluation / Models, results, and evaluation
    • basic data handling / Basic data handling
    • multi-layer Perceptron / Multi-layer perceptron
    • MLP, parameters / Multi-layer perceptron
    • MLP, code for / Code for MLP
    • Convolutional Network / Convolutional Network
    • Convolutional Network, code for / Code for CNN
    • Variational Autoencoder / Variational Autoencoder, Code for Variational Autoencoder
    • DBN / DBN
    • parameter search, Arbiter used / Parameter search using Arbiter
    • results and analysis / Results and analysis
  • DeepLearning4J
    • URL / Machine learning – tools and datasets
    • about / Machine learning – tools and datasets
  • density
    • estimation / Density estimation
  • density-based methods / Outliers
    • about / Density-based methods
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • density based algorithm
    • about / Density based
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • descriptive data analysis
    • about / Descriptive data analysis
    • basic label analysis / Basic label analysis
    • basic feature analysis / Basic feature analysis
  • detection methods
    • model evolution, monitoring / Monitoring model evolution
    • distribution changes, monitoring / Monitoring distribution changes
  • Deviance-Threshold Measure / Measures to evaluate structures
  • Dice coefficient / Dice coefficient
  • dimensionality reduction
    • about / Feature relevance analysis and dimensionality reduction, Feature analysis and dimensionality reduction
    • notation / Notation
    • linear models / Linear methods
    • nonlinear methods / Nonlinear methods
    • PCA / PCA
    • random projections / Random projections
    • ISOMAP / ISOMAP
    • observation / Observations on feature analysis and dimensionality reduction
    / Dimensionality reduction
  • Directed Acyclic Graph (DAG) / Definition
  • Direct Update of Events (DUE) / Direct Update of Events (DUE)
  • Dirichlet distribution / Prior and posterior using the Dirichlet distribution
  • discretization
    • about / Discretization
    • by binning / Discretization
    • by frequency / Discretization
    • by entropy / Discretization
  • distance-based clustering
    • for outlier detection / Distance-based clustering for outlier detection
  • distance-based methods / Outliers
    • about / Distance-based methods
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • distribution changes, monitoring
    • about / Monitoring distribution changes
    • Welch's t test / Welch's t test
    • Kolmogorov-Smirnov's test / Kolmogorov-Smirnov's test
    • Page-Hinckley test / CUSUM and Page-Hinckley test
    • cumulative sum (CUSUM) / CUSUM and Page-Hinckley test
  • document collection
    • about / Document collection and standardization
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
  • document databases / Document databases
  • document frequency (DF) / Frequency-based techniques
  • drift detection
    • about / Concept drift and drift detection
    • data management / Data management
    • partial memory / Partial memory
  • drift detection method (DDM) / Drift Detection Method or DDM
  • Dropouts
    • about / Deep learning with dropouts
    • definition and mathematical notation / Definition and mathematical notation, How does it work?
    • training with / Learning Training and testing with dropouts
    • testing with / Learning Training and testing with dropouts
  • Dunn's Indices / Dunn's Indices

E

  • early drift detection method (EDDM) / Early Drift Detection Method or EDDM
  • eigendecomposition / Eigendecomposition
  • ELEC dataset / Data collection
  • elimination-based inference, Bayesian networks
    • about / Elimination-based inference
    • variable elimination algorithm / Variable elimination algorithm
    • input and output / Input and output
    • VE algorithm, advantages / Advantages and limitations
  • Elki
    • about / Machine learning – tools and datasets
    • URL / Machine learning – tools and datasets
  • EM (Diagonal Gaussian Model Factory) / Clustering models, results, and evaluation
  • embedded approach / Embedded approach
  • ensemble algorithms
    • about / Ensemble algorithms
    • weighted majority algorithm (WMA) / Weighted majority algorithm
    • online bagging algorithm / Online Bagging algorithm
    • online boosting algorithm / Online Boosting algorithm
  • ensemble learning
    • about / Ensemble learning and meta learners
    • types / Ensemble learning and meta learners
    • bootstrap aggregating (bagging) / Bootstrap aggregating or bagging
    • boosting / Boosting
  • error propagation / Error propagation
  • error reduction
    • variance reduction / Variance reduction
    • density weighted methods / Density weighted methods
  • Euclidean distance / Euclidean distance
  • EUPLv1.1
    • URL / Machine learning – tools and datasets
  • evaluation criteria
    • accuracy / Evaluation criteria
    • balanced accuracy / Evaluation criteria
    • Area under ROC curve (AUC) / Evaluation criteria
    • Kappa statistic (K) / Evaluation criteria
    • Kappa Plus statistic / Evaluation criteria
  • evaluation measures, clustering
    • Cluster Mapping Measures (CMM) / Cluster Mapping Measures (CMM)
    • V-Measure / V-Measure
    • other measures / Other external measures
    • purity / Other external measures
    • entropy / Other external measures
    • Recall / Other external measures
    • F-Measure / Other external measures
    • Precision / Other external measures
  • Exact Storm / Exact Storm
  • Expectation Maximization (EM) / Advantages and limitations
    • about / Expectation maximization (EM) or Gaussian mixture modeling (GMM), How does it work?, How does it work?
  • extended Jaccard Coefficient / Extended Jaccard coefficient
  • external evaluation measures
    • about / External evaluation measures
    • Rand index / Rand index
    • F-Measure / F-Measure
    • normalized mutual information index (NMI) / Normalized mutual information index

F

  • F-Measure / F-Measure
  • False Positive Rate (FPR) / Confusion matrix and related metrics
  • feature analysis
    • about / Feature analysis and dimensionality reduction
    • notation / Notation
    • observation / Observations on feature analysis and dimensionality reduction
  • feature evaluation techniques
    • about / Feature evaluation techniques
    • filter approach / Filter approach
    • wrapper approach / Wrapper approach
    • embedded approach / Embedded approach
  • feature extraction/generation
    • about / Feature extraction/generation
    • lexical features / Lexical features
    • syntactic features / Syntactic features
    • semantic features / Semantic features
  • feature relevance analysis
    • about / Feature relevance analysis and dimensionality reduction
    • feature search techniques / Feature search techniques
    • feature evaluation techniques / Feature evaluation techniques
  • features
    • construction / Feature construction
  • feature search techniques / Feature search techniques
  • feature selection
    • about / Feature selection
    • Information theoretic techniques / Information theoretic techniques
    • statistical-based techniques / Statistical-based techniques
    • frequency-based techniques / Frequency-based techniques
  • filter approach
    • about / Filter approach
    • univariate feature selection / Univariate feature selection
    • multivariate feature selection / Multivariate feature selection
  • Fine Needle Aspirate (FNA) / Datasets and analysis
  • flow of influence, Bayesian networks / Flow of influence
  • Friedman's test / Friedman's test

G

  • Gain charts / Gain charts and lift curves
  • Gain Ratio (GR) / Information theoretic techniques
  • Gaussian distribution / Gaussian distribution
  • Gaussian Mixture Model (GMM) / Gaussian Mixture Model
  • Gaussian mixture modeling (GMM)
    • about / Expectation maximization (EM) or Gaussian mixture modeling (GMM), How does it work?
    • input / Input and output
    • output / Input and output
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Gaussian Radial Basis Kernel / How does it work?
  • Gaussian standard deviation / Gaussian standard deviation
  • Generalized Linear Models (GLM) / Feature relevance and analysis
  • generative probabilistic models
    • about / Generative probabilistic models
    • input / Input and output
    • output / Input and output
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitation / Advantages and limitations
  • Gibbs parameterization / Gibbs parameterization
  • Gini index / How does it work?
  • Gradient Boosting Machine (GBM) / Feature relevance and analysis
  • graph
    • concepts / Graph concepts
    • structure and properties / Graph structure and properties
    • subgraphs and cliques / Subgraphs and cliques
    • path / Path, trail, and cycles
    • trail / Path, trail, and cycles
    • cycles / Path, trail, and cycles
  • graph data / Datasets used in machine learning
  • graph databases / Graph databases
  • graph mining / Machine learning – types and subtypes
  • GraphX
    • about / Machine learning – tools and datasets
    • URL / Machine learning – tools and datasets
  • grid based algorithm
    • about / Grid based
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations

H

  • H2O
    • about / Machine learning – tools and datasets
    • URL / Machine learning – tools and datasets
    • as Big Data Machine Learning platform / H2O as Big Data Machine Learning platform
    • architecture / H2O architecture
    • machine learning / Machine learning in H2O
    • tools / Tools and usage
    • usage / Tools and usage
  • HBase / Columnar databases
  • HDFS
    • about / HDFS
  • HDFS, components / HDFS
    • NameNode / HDFS
    • Secondary NameNode / HDFS
    • DataNode / HDFS
  • Hidden Markov models
    • about / Hidden Markov models for NER
    • input / Input and output
    • output / Input and output
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitation / Advantages and limitations
  • Hidden Markov models (HMM) / Hidden Markov models
  • hidden Markov models (HMM) / How does it work?
  • hierarchical clustering
    • about / Hierarchical clustering
    • input / Input and output
    • output / Input and output
    • working / How does it work?
    • single linkage / How does it work?
    • complete linkage / How does it work?
    • average linkage / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • high-dimensional-based methods
    • about / High-dimensional-based methods
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Hive / Hive and HQL
  • Hoeffding Trees (HT) / Hoeffding trees or very fast decision trees (VFDT)
    • input / Inputs and outputs
    • output / Inputs and outputs
    • limitations / Advantages and limitations
    • advantages / Advantages and limitations
  • Horse Colic Classification, case study
    • about / Case Study – Horse Colic Classification
    • reference link / Case Study – Horse Colic Classification
    • business problem / Business problem
    • machine learning, mapping / Machine learning mapping
    • data analysis / Data analysis
    • supervised learning, experiments / Supervised learning experiments
    • analysis / Results, observations, and analysis
  • Hortonworks Data Platform (HDP) / Hortonworks Data Platform
  • HQL / Hive and HQL
  • Hyperbolic tangent ("tanh") function / Hyperbolic tangent ("tanh") function
  • hyperplane / How does it work?

I

  • I-Map, Bayesian networks / I-Map
  • incremental learning / Machine learning – types and subtypes
  • incremental supervised learning
    • about / Incremental supervised learning
    • modeling techniques / Modeling techniques
    • validation / Validation, evaluation, and comparisons in online setting
    • evaluation / Validation, evaluation, and comparisons in online setting
    • comparisons, in online setting / Validation, evaluation, and comparisons in online setting
    • model validation techniques / Model validation techniques
  • incremental unsupervised learning
    • clustering, using / Incremental unsupervised learning using clustering
    • modeling techniques / Modeling techniques
  • independent and identically distributed (i.i.d.) / Monitoring model evolution
  • Independent Component Analysis (ICA) / Advantages and limitations
  • inference, Bayesian networks
    • about / Inference
    • elimination-based inference / Elimination-based inference
    • propagation-based techniques / Propagation-based techniques
    • sampling-based techniques / Sampling-based techniques
  • inferencing / Machine learning – types and subtypes, Semantic reasoning and inferencing
  • influence space (IS) / How does it work?
  • information extraction / Information extraction and named entity recognition
  • Information gain (IG) / Information theoretic techniques
  • internal evaluation measures
    • about / Internal evaluation measures
    • compactness / Internal evaluation measures
    • separation / Internal evaluation measures
    • notation / Notation
    • R-Squared / R-Squared
    • Dunn's Indices / Dunn's Indices
    • Davies-Bouldin index / Davies-Bouldin index
  • Internet of things (IoT) / Machine learning applications
  • Interquartile Ranges (IQR) / Outliers
  • inverse document frequency (IDF) / Inverse document frequency (IDF)
  • Isomap / Advantages and limitations
  • iterative reweighted least squares (IRLS) / How does it work?

J

  • Java Class Library for Active Learning (JCLAL)
    • URL / Machine learning – tools and datasets
    • about / Machine learning – tools and datasets
    • reference link / Tools and software
  • JavaScript Object Notation (JSON) / Document collection and standardization
  • JKernelMachines (Transductive SVM) / Tools and software
  • joint distribution / Random variables, joint, and marginal distributions, Factor types

K

  • k-means
    • about / k-Means
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • k-Nearest Neighbors (k-NN) / Outliers
  • K-Nearest Neighbors (KNN)
    • about / K-Nearest Neighbors (KNN)
    • algorithm input / Algorithm inputs and outputs
    • algorithm output / Algorithm inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Kaggle
    • about / Datasets
    • URL / Datasets
  • KDD Challenge Datasets
    • URL / Datasets
    • about / Datasets
  • KEEL
    • about / Machine learning – tools and datasets
    • URL / Machine learning – tools and datasets
  • KEEL (Knowledge Extraction based on Evolutionary Learning)
    • about / Tools and software
    • reference link / Tools and software
  • kernel density estimation (KDE) / How does it work?
  • Kernel Principal Component Analysis (KPCA)
    • about / Kernel Principal Component Analysis (KPCA)
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • kernel trick / How does it work?
  • key-value databases / Key-value databases
  • Key Performance Indicators (KPIs) / What is not machine learning?
  • KNIME
    • about / KNIME
    • references / KNIME
  • Knime
    • about / Machine learning – tools and datasets
    • URL / Machine learning – tools and datasets
  • Kohonen networks / How does it work?
  • Kolmogorov-Smirnov's test / Kolmogorov-Smirnov's test
  • Kubat / Widmer and Kubat
  • Kullback-Leibler (KL) / How does it work?

L

  • label SSL
    • about / Cluster and label SSL
    • output / Inputs and outputs
    • input / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Latent Dirichlet Allocation (LDA) / Advantages and limitations
  • latent semantic analysis (LSA) / Dimensionality reduction
  • learning
    • techniques / Training, validation, and test set
  • learning, Bayesian networks
    • goals / Learning
    • parameters / Learning parameters
    • Maximum likelihood estimation (MLE) / Maximum likelihood estimation for Bayesian networks
    • Bayesian parameter, estimation / Bayesian parameter estimation for Bayesian network
    • Dirichlet distribution / Prior and posterior using the Dirichlet distribution
    • structures / Learning structures
    • structures, evaluating / Measures to evaluate structures
    • structures, learning / Methods for learning structures, Advantages and limitations
    • constraint-based techniques / Constraint-based techniques
    • advantages and limitations / Advantages and limitations
    • search and score-based techniques / Search and score-based techniques, How does it work?, Advantages and limitations
  • lemmatization
    • about / Stemming or lemmatization
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
  • Leveraging Bagging (LB) / Supervised learning experiments
  • lexical features
    • about / Lexical features
    • character-based features / Character-based features
    • word-based features / Word-based features
    • part-of-speech tagging features / Part-of-speech tagging features
    • taxonomy features / Taxonomy features
  • lift curves / Gain charts and lift curves
  • linear algorithm
    • online linear models, with loss functions / Online linear models with loss functions
    • Online Naive Bayes / Online Naïve Bayes
  • Locally Linear Embedding (LLE) / How does it work?
  • linear models / Linear models
    • Linear Regression / Linear Regression
    • Naive Bayes / Naïve Bayes
    • Logistic Regression / Logistic Regression
    • about / Linear methods
    • principal component analysis (PCA) / Principal component analysis (PCA)
    • random projections (RP) / Random projections (RP)
    • Multidimensional Scaling (MDS) / Multidimensional Scaling (MDS)
  • Linear Regression
    • about / Linear Regression
    • algorithm input / Algorithm input and output
    • algorithm output / Algorithm input and output
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Lloyd's algorithm
    • working / How does it work?
  • Local Outlier Factor (LOF) / How does it work?
  • logical datasets
    • about / Training, validation, and test set
  • Logistic Regression
    • about / Logistic Regression
    • algorithm input / Algorithm input and output
    • algorithm output / Algorithm input and output
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitation / Advantages and limitations

M

  • machine learning / Machine Learning
    • history / Machine learning – history and definition
    • definition / Machine learning – history and definition
    • relationship with / Machine learning – history and definition
    • concepts / Machine learning – concepts and terminology
    • terminology / Machine learning – concepts and terminology
    • types / Machine learning – types and subtypes
    • subtypes / Machine learning – types and subtypes
    • supervised learning / Machine learning – types and subtypes
    • semi-supervised learning / Machine learning – types and subtypes
    • graph mining / Machine learning – types and subtypes
    • probabilistic graph modeling / Machine learning – types and subtypes
    • inferencing / Machine learning – types and subtypes
    • time-series forecasting / Machine learning – types and subtypes
    • association analysis / Machine learning – types and subtypes
    • reinforcement learning / Machine learning – types and subtypes
    • stream learning / Machine learning – types and subtypes
    • incremental learning / Machine learning – types and subtypes
    • datasets, used / Datasets used in machine learning
    • practical issues / Practical issues in machine learning
    • roles / Machine learning – roles and process
    • process / Machine learning – roles and process
    • tools / Machine learning – tools and datasets
    • datasets / Machine learning – tools and datasets
    • mapping / Machine learning mapping, Machine learning mapping
    • in H2O / Machine learning in H2O
    • future / The future of Machine Learning
  • machine learning applications / Machine learning applications
  • machine translation (MT) / Machine translation
  • mallet
    • topic modeling / Topic modeling with mallet
  • Mallet
    • URL / Machine learning – tools and datasets
    • about / Machine learning – tools and datasets
    / Mallet
  • manifold learning
    • about / Manifold learning
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • marginal distribution / Random variables, joint, and marginal distributions
  • market data / Datasets used in machine learning
  • Markov blanket / Markov blanket
  • Markov chains
    • about / Markov chains
    • Hidden Markov models (HMM) / Hidden Markov models
    • Hidden Markov models (HMM), most probable path / Most probable path in HMM
    • Hidden Markov models (HMM), posterior decoding / Posterior decoding in HMM
  • Markov networks (MN) / Markov networks and conditional random fields
    • representation / Representation
    • parameterization / Parameterization
    • Gibbs parameterization / Gibbs parameterization
    • factor graphs / Factor graphs
    • log-linear models / Log-linear models
    • independencies / Independencies
    • global / Global
    • Pairwise Markov / Pairwise Markov
    • Markov blanket / Markov blanket
    • inference / Inference
    • learning / Learning
    • Conditional random fields (CRFs) / Conditional random fields
  • Markov random field (MRF) / Markov networks and conditional random fields
  • massively parallel processing (MPP) / Amazon Redshift
  • Massive Online Analysis (MOA)
    • about / Tools and software
    • references / Tools and software
    • reference link / Analysis of stream learning results
  • mathematical transformation
    • of feature / Outliers
  • matrix
    • about / Matrix
    • transpose / Transpose of a matrix
    • addition / Matrix addition
    • scalar multiplication / Scalar multiplication
    • multiplication / Matrix multiplication
  • matrix product, properties
    • about / Properties of matrix product
    • linear transformation / Linear transformation
    • matrix inverse / Matrix inverse
    • eigendecomposition / Eigendecomposition
    • positive definite matrix / Positive definite matrix
  • maximum entropy Markov model (MEMM)
    • about / Maximum entropy Markov models for NER
    • input / Input and output
    • output / Input and output
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitation / Advantages and limitations
  • Maximum Likelihood Estimates (MLE) / How does it work?
  • Maximum likelihood estimation (MLE) / Maximum likelihood estimation for Bayesian networks
  • McNemar's Test / McNemar's Test
  • McNemar test / Comparing algorithms and metrics
  • mean / Mean
  • mean shift
    • about / Mean shift
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • meta learners
    • about / Ensemble learning and meta learners
  • Micro Clustering based Algorithm (MCOD) / Micro Clustering based Algorithm (MCOD)
  • Microsoft Azure HDInsight / Microsoft Azure HDInsight
  • Min-Max Normalization / Outliers
  • minimal redundancy maximal relevance (mRMR) / Minimal redundancy maximal relevance (mRMR)
  • Minimum Covariance Determinant (MCD) / How does it work?
  • Minimum Description Length (MDL) / How does it work?
  • missing values
    • handling / Handling missing values, Results, observations, and analysis
  • Modified National Institute of Standards and Technology (MNIST) / Data quality analysis
  • Mldata.org
    • about / Datasets
    • URL / Datasets
  • MNIST database
    • reference link / Data collection
  • MOA
    • about / Machine learning – tools and datasets
  • model
    • building / Model building
    • linear models / Linear models
  • model assessment
    • about / Model assessment, evaluation, and comparisons, Model assessment
  • model comparison
    • about / Model assessment, evaluation, and comparisons, Model comparisons
    • algorithms, comparing / Comparing two algorithms
    • multiple algorithms, comparing / Comparing multiple algorithms
  • model evaluation
    • about / Model assessment, evaluation, and comparisons
  • model evaluation metrics
    • about / Model evaluation metrics, Model evaluation metrics
    • confusion matrix / Confusion matrix and related metrics
    • PRC curve / ROC and PRC curves
    • ROC curve / ROC and PRC curves
    • Gain charts / Gain charts and lift curves
    • lift curves / Gain charts and lift curves
    • Confusion Metrics, evaluation / Evaluation on Confusion Metrics
    • ROC curves / ROC Curves, Lift Curves, and Gain Charts
    • Lift Curves / ROC Curves, Lift Curves, and Gain Charts
    • Gain Charts / ROC Curves, Lift Curves, and Gain Charts
  • model evolution, monitoring
    • about / Monitoring model evolution
    • Widmer / Widmer and Kubat
    • Kubat / Widmer and Kubat
    • drift detection method (DDM) / Drift Detection Method or DDM
    • early drift detection method (EDDM) / Early Drift Detection Method or EDDM
  • modeling techniques
    • linear algorithm / Linear algorithms
    • non-linear algorithms / Non-linear algorithms
    • ensemble algorithms / Ensemble algorithms
    • partition based algorithm / Partition based
    • hierarchical clustering / Hierarchical based and micro clustering
    • micro clustering / Hierarchical based and micro clustering
    • density based algorithm / Density based
    • grid based algorithm / Grid based
  • models
    • non-linear models / Non-linear models
    • ensemble learning / Ensemble learning and meta learners
    • meta learners / Ensemble learning and meta learners
    • clustering analysis / Observations and clustering analysis
    • observations / Observations and clustering analysis
  • model validation techniques
    • about / Model validation techniques
    • prequential evaluation / Prequential evaluation
    • holdout evaluation / Holdout evaluation
    • controlled permutations / Controlled permutations
    • evaluation criteria / Evaluation criteria
    • algorithms, versus metrics / Comparing algorithms and metrics
  • most probable explanation (MPE) / MAP queries and marginal MAP queries
  • Multi-layered neural network
    • inputs / Inputs, neurons, activation function, and mathematical notation
    • neuron / Inputs, neurons, activation function, and mathematical notation
    • activation function / Inputs, neurons, activation function, and mathematical notation
    • mathematical notation / Inputs, neurons, activation function, and mathematical notation
    • about / Multi-layered neural network
    • structure and mathematical notations / Structure and mathematical notations
    • activation functions / Activation functions in NN
    • training / Training neural network
  • multi-layered perceptron (MLP) / Feature relevance and analysis
  • Multi-layer feed-forward neural network
    • about / Multi-layer feed-forward neural network
  • multi-view SSL
    • about / Co-training SSL or multi-view SSL
  • Multidimensional Scaling (MDS)
    • about / Multidimensional Scaling (MDS)
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • multinomial distribution / Random variables, joint, and marginal distributions
  • multiple algorithms, comparing
    • ANOVA test / ANOVA test
    • Friedman's test / Friedman's test
  • multivariate feature analysis
    • about / Multivariate feature analysis
    • scatter plots / Multivariate feature analysis
    • ScatterPlot Matrix / Multivariate feature analysis
    • parallel plots / Multivariate feature analysis
  • multivariate feature selection
    • about / Multivariate feature selection
    • minimal redundancy maximal relevance (mRMR) / Minimal redundancy maximal relevance (mRMR)
    • correlation-based feature selection (CFS) / Correlation-based feature selection (CFS)

N

  • Naive Bayes
    • about / Naïve Bayes
    • algorithm input / Algorithm input and output
    • algorithm output / Algorithm input and output
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitation / Advantages and limitations
  • Naive Bayes (NB) / Feature relevance and analysis
  • named entity recognition / Information extraction and named entity recognition
  • named entity recognition (NER)
    • about / Named entity recognition
    • Hidden Markov models / Hidden Markov models for NER
    • Maximum entropy Markov model (MEMM) / Maximum entropy Markov models for NER
  • natural language processing (NLP)
    • about / NLP, subfields, and tasks, Deep learning and NLP
    • text categorization / Text categorization
    • part-of-speech tagging (POS tagging) / Part-of-speech tagging (POS tagging)
    • text clustering / Text clustering
    • information extraction / Information extraction and named entity recognition
    • named entity recognition / Information extraction and named entity recognition
    • sentiment analysis / Sentiment analysis and opinion mining
    • opinion mining / Sentiment analysis and opinion mining
    • coreference resolution / Coreference resolution
    • Word sense disambiguation (WSD) / Word sense disambiguation
    • machine translation (MT) / Machine translation
    • semantic reasoning / Semantic reasoning and inferencing
    • inferencing / Semantic reasoning and inferencing
    • text summarization / Text summarization
    • question, automating / Automating question and answers
    • answers, automating / Automating question and answers
  • Nemenyi test / Comparing algorithms and metrics
  • Neo4J / Graph databases
  • Neo4j
    • about / Machine learning – tools and datasets
    • URL / Machine learning – tools and datasets
    • URL, for licensing / Machine learning – tools and datasets
  • neural network, training
    • about / Training neural network
    • empirical risk minimization / Empirical risk minimization
    • parameter initialization / Parameter initialization
    • loss function / Loss function
    • gradients / Gradients
    • feed forward and backpropagation / Feed forward and backpropagation, How does it work?
  • neural networks
    • limitations / Limitations of neural networks, Vanishing gradients, local optimum, and slow training
  • No Free Lunch Theorem (NFLT) / Model building
  • non-linear algorithms
    • about / Non-linear algorithms
    • Hoeffding Trees (HT) / Hoeffding trees or very fast decision trees (VFDT)
    • very fast decision trees (VFDT) / Hoeffding trees or very fast decision trees (VFDT)
  • non-linear models
    • about / Non-linear models
    • Decision Trees / Decision Trees
    • K-Nearest Neighbors (KNN) / K-Nearest Neighbors (KNN)
    • support vector machines (SVM) / Support vector machines (SVM)
  • non-negative Matrix factorization (NMF)
    • about / Non-negative matrix factorization (NMF)
    • input / Input and output
    • output / Input and output
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitation / Advantages and limitations
  • Non-negative Matrix Factorization (NNMF) / Clustering techniques
  • nonlinear methods
    • about / Nonlinear methods
    • Kernel Principal Component Analysis (KPCA) / Kernel Principal Component Analysis (KPCA)
    • manifold learning / Manifold learning
  • normalization
    • about / Outliers
    • Min-Max Normalization / Outliers
    • Z-Score Normalization / Outliers
  • Normalized mutual information (NMI) / Normalized mutual information index
  • NoSQL
    • about / NoSQL
    • key-value databases / Key-value databases
    • document databases / Document databases
    • columnar databases / Columnar databases
    • graph databases / Graph databases
  • notations, supervised learning
    • about / Formal description and notation
    • instance / Formal description and notation
    • label / Formal description and notation
    • binary classification / Formal description and notation
    • regression / Formal description and notation
    • dataset / Formal description and notation
  • Noun Phrase (NP)
    • about / Syntactic features

O

  • one-class SVM
    • about / One-class SVM
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • online bagging algorithm
    • about / Online Bagging algorithm
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • online boosting algorithm
    • about / Online Boosting algorithm
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Online k-Means
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • online linear models, with loss functions
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Online Naive Bayes
    • about / Online Naïve Bayes
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • OpenMarkov
    • about / Machine learning – tools and datasets
    • URL / Machine learning – tools and datasets
    / OpenMarkov
  • opinion mining / Sentiment analysis and opinion mining
  • OrientDB / Graph databases
  • outlier algorithms
    • about / Outlier algorithms
    • statistical-based / Outlier algorithms, Statistical-based
    • distance-based / Outlier algorithms
    • density-based / Outlier algorithms
    • clustering-based / Outlier algorithms
    • high-dimension-based / Outlier algorithms
    • distance-based methods / Distance-based methods
    • density-based methods / Density-based methods
    • clustering-based methods / Clustering-based methods
    • high-dimensional-based methods / High-dimensional-based methods
    • one-class SVM / One-class SVM
  • outlier detection
    • about / Outlier or anomaly detection
    • outlier algorithms / Outlier algorithms
    • outlier evaluation techniques / Outlier evaluation techniques
    • used, for unsupervised learning / Unsupervised learning using outlier detection
    • partition-based clustering / Partition-based clustering for outlier detection
    • input / Inputs and outputs, Inputs and outputs
    • output / Inputs and outputs, Inputs and outputs
    • working / How does it work?, How does it work?
    • advantages / Advantages and limitations, Advantages and limitations
    • limitations / Advantages and limitations, Advantages and limitations
    • Exact Storm / Exact Storm
    • Abstract-C / Abstract-C
    • Direct Update of Events (DUE) / Direct Update of Events (DUE)
    • Micro Clustering based Algorithm (MCOD) / Micro Clustering based Algorithm (MCOD)
    • Approx Storm / Approx Storm
    • validation techniques / Validation and evaluation techniques
    • evaluation techniques / Validation and evaluation techniques
  • outlier evaluation techniques
    • about / Outlier evaluation techniques
    • supervised evaluation / Supervised evaluation
    • unsupervised evaluation / Unsupervised evaluation
    • technique / Unsupervised evaluation
  • outlier models
    • observation / Observations and analysis
    • analysis / Observations and analysis
  • outliers
    • handling / Outliers
    • detecting, in data / Outliers
    • IQR / Outliers
    • distance-based methods / Outliers
    • density-based methods / Outliers
    • mathematical transformation of feature / Outliers
    • handling, robust statistical algorithms used / Outliers
  • oversampling / Undersampling and oversampling

P

  • Page-Hinckley test / CUSUM and Page-Hinckley test
  • Paired-t test / Paired-t test
  • pairwise-adaptive similarity / Pairwise-adaptive similarity
  • parallel plots / Multivariate feature analysis
  • Parquet / Columnar databases
  • part-of-speech tagging (POS tagging) / Part-of-speech tagging (POS tagging)
  • partial memory
    • about / Partial memory
    • full memory / Full memory
    • detection methods / Detection methods
    • adaptation methods / Adaptation methods
  • partition based algorithm
    • Online k-Means / Online k-Means
  • peristalsis / Visualization analysis
  • Phrase (VP)
    • about / Syntactic features
  • Pipelines
    • reference link / Random Forest
  • Poisson distribution / Poisson distribution
  • Polynomial Kernel / How does it work?
  • positive definite matrix / Positive definite matrix
  • positive semi-definite matrix / Positive definite matrix
  • PRC curve / ROC and PRC curves
  • Prepositional Phrase (PP)
    • about / Syntactic features
  • principal component analysis (PCA)
    • about / Principal component analysis (PCA)
    • inputs / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
    / Dimensionality reduction
  • Principal Component Analysis (PCA) / Embedded approach
  • principal components / How does it work?
  • probabilistic graphical models (PGM) / Machine learning – tools and datasets
  • probabilistic graph modeling / Machine learning – types and subtypes
  • probabilistic latent semantic analysis (PLSA)
    • about / Probabilistic latent semantic analysis (PLSA)
    • input / Input and output
    • output / Input and output
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • probabilistic latent semantic index (PLSI) / Topic modeling
  • Probabilistic Principal Component Analysis (PPCA) / Advantages and limitations
  • probability
    • about / Probability revisited
    • concepts / Concepts in probability
    • conditional probability / Conditional probability
    • Chain rule and Bayes' theorem / Chain rule and Bayes' theorem
    • random variables / Random variables, joint, and marginal distributions
    • joint / Random variables, joint, and marginal distributions
    • marginal distributions / Random variables, joint, and marginal distributions
    • marginal independence / Marginal independence and conditional independence
    • conditional independence / Marginal independence and conditional independence
    • factors / Factors
    • factors, types / Factor types
    • distribution queries / Distribution queries
    • probabilistic queries / Probabilistic queries
    • MAP queries / MAP queries and marginal MAP queries
    • marginal MAP queries / MAP queries and marginal MAP queries
  • process, machine learning
    • about / Process
    • business problem, identifying / Process
    • mapping / Process
    • data collection / Process
    • data quality analysis / Process
    • data sampling / Process
    • transformation / Process
    • feature analysis / Process
    • feature selection / Process
    • modeling / Process
    • model evaluation / Process
    • model selection / Process
    • model deployment / Process
    • model performance, monitoring / Process
  • processors / SAMOA architecture
  • Propagation-based techniques, Bayesian networks
    • about / Propagation-based techniques
    • belief propagation / Belief propagation
    • factor graph / Factor graph
    • factor graph, messaging in / Messaging in factor graph
    • input and output / Input and output
    • working / How does it work?
    • advantages and limitations / Advantages and limitations
  • publish-subscribe frameworks / Publish-subscribe frameworks

Q

  • Query by Committee (QBC)
    • about / Query by Committee (QBC)
  • Query by disagreement (QBD)
    • about / Query by disagreement (QBD)

R

  • R-Squared / R-Squared
  • Radial Basis Function (RBF) / Inputs and outputs
  • Rand index / Rand index
  • Random Forest / Random Forest
  • Random Forest (RF) / Feature relevance and analysis, Random Forest
  • random projections (RP)
    • about / Random projections (RP)
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • RapidMiner
    • about / Machine learning – tools and datasets, Case Study – Horse Colic Classification
    • URL / Machine learning – tools and datasets
    • experiments / RapidMiner experiments
    • visualization analysis / Visualization analysis
    • feature selection / Feature selection
    • model process flow / Model process flow
    • model evaluation metrics / Model evaluation metrics
  • real-time Big Data Machine Learning
    • about / Real-time Big Data Machine Learning
    • SAMOA / SAMOA as a real-time Big Data Machine Learning framework
    • machine learning algorithms / Machine Learning algorithms
    • tools / Tools and usage
    • usage / Tools and usage
    • experiments / Experiments, results, and analysis
    • results / Experiments, results, and analysis
    • analysis / Experiments, results, and analysis
    • results, analysis / Analysis of results
  • real-time stream processing / Real-time stream processing
  • real-world case study
    • about / Real-world case study
    • tools / Tools and software
    • software / Tools and software
    • business problem / Business problem
    • machine learning, mapping / Machine learning mapping
    • data collection / Data collection
    • data quality analysis / Data quality analysis
    • data sampling / Data sampling and transformation
    • data transformation / Data sampling and transformation
    • feature analysis / Feature analysis and dimensionality reduction
    • dimensionality reduction / Feature analysis and dimensionality reduction
    • models, clustering / Clustering models, results, and evaluation
    • results / Clustering models, results, and evaluation, Outlier models, results, and evaluation
    • evaluation / Clustering models, results, and evaluation, Outlier models, results, and evaluation
    • outlier models / Outlier models, results, and evaluation
  • reasoning, Bayesian networks
    • patterns / Reasoning patterns
    • causal or predictive reasoning / Causal or predictive reasoning
    • evidential or diagnostic reasoning / Evidential or diagnostic reasoning
    • intercausal reasoning / Intercausal reasoning
    • combined reasoning / Combined reasoning
  • receiver operating characteristics (ROC) / Machine learning – concepts and terminology
  • Recurrent neural networks (RNN)
    • about / Recurrent Neural Networks
    • structure / Structure of Recurrent Neural Networks
    • learning / Learning and associated problems in RNNs
    • issues / Learning and associated problems in RNNs
    • Long short term memory (LSTM) / Long Short Term Memory
    • Gated Recurrent Units (GRUs) / Gated Recurrent Units
  • regression
    • about / Formal description and notation
  • regularization
    • about / Regularization
    • L2 regularization / L2 regularization
    • L1 regularization / L1 regularization
  • reinforcement learning / Machine learning – types and subtypes
  • representation, Bayesian networks
    • about / Representation
    • definition / Definition
  • resampling / Is sampling needed?
  • Resilient Distributed Datasets (RDD) / Spark architecture
  • Restricted Boltzmann Machines (RBM)
    • about / Restricted Boltzmann Machines
    • definition and mathematical notation / Definition and mathematical notation
    • Conditional distribution / Conditional distribution
    • free energy / Free energy in RBM
    • training / Training the RBM
    • sampling / Sampling in RBM
    • contrastive divergence / Contrastive divergence , How does it work?
    • persistent contrastive divergence / Persistent contrastive divergence
  • ROC curve / ROC and PRC curves
  • roles, machine learning
    • about / Roles
    • business domain expert / Roles
    • data engineer / Roles
    • project manager / Roles
    • data scientist / Roles
    • machine learning expert / Roles

S

  • SAMOA
    • about / Machine learning – tools and datasets, SAMOA as a real-time Big Data Machine Learning framework
    • URL / Machine learning – tools and datasets
    • architecture / SAMOA architecture
  • sampling
    • about / Machine learning – concepts and terminology, Sampling
    • uniform random sampling / Machine learning – concepts and terminology
    • stratified random sampling / Machine learning – concepts and terminology
    • cluster sampling / Machine learning – concepts and terminology
    • systematic sampling / Machine learning – concepts and terminology
  • sampling-based techniques, Bayesian networks
    • about / Sampling-based techniques
    • forward sampling with rejection / Forward sampling with rejection, How does it work?
  • Samza / SAMOA as a real-time Big Data Machine Learning framework
  • scalar product
    • of vectors / Scalar product of vectors
  • ScatterPlot Matrix / Multivariate feature analysis
  • scatter plots / Multivariate feature analysis
  • self-organizing maps (SOM)
    • about / Self-organizing maps (SOM)
    • inputs / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • self-training SSL
    • about / Self-training SSL
    • inputs / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
    • Query by Committee (QBC) / Query by Committee (QBC)
  • semantic features / Semantic features
  • semantic reasoning / Semantic reasoning and inferencing
  • semi-supervised learning / Machine learning – types and subtypes
  • Semi-Supervised Learning (SSL)
    • about / Semi-supervised learning
    • representation / Representation, notation, and assumptions
    • notation / Representation, notation, and assumptions
    • assumptions / Representation, notation, and assumptions
    • assumptions, to be true / Representation, notation, and assumptions
    • techniques / Semi-supervised learning techniques
    • self-training SSL / Self-training SSL
    • multi-view SSL / Co-training SSL or multi-view SSL
    • co-training SSL / Co-training SSL or multi-view SSL
    • label SSL / Cluster and label SSL
    • cluster SSL / Cluster and label SSL
    • transductive graph label propagation / Transductive graph label propagation
    • transductive SVM (TSVM) / Transductive SVM (TSVM)
    • advantages / Advantages and limitations
    • disadvantages / Advantages and limitations
    • data distribution sampling / Data distribution sampling
  • Semi-Supervised Learning (SSL), case study
    • about / Case study in semi-supervised learning
    • tools / Tools and software
    • software / Tools and software
    • business problem / Business problem
    • machine learning, mapping / Machine learning mapping
    • data collection / Data collection
    • data quality, analysis / Data quality analysis
    • data sampling / Data sampling and transformation
    • data transformation / Data sampling and transformation
    • datasets / Datasets and analysis
    • datasets, analysis / Datasets and analysis
    • feature analysis, results / Feature analysis results
    • experiments / Experiments and results
    • results / Experiments and results
    • analysis / Analysis of semi-supervised learning
  • sentiment analysis / Sentiment analysis and opinion mining
  • sequential data / Datasets used in machine learning
  • shrinking methods
    • embedded approach / Embedded approach
  • Sigmoid function / Sigmoid function
  • Sigmoid Kernel / How does it work?
  • Silhouette's index / Silhouette's index
  • similarity measures
    • about / Similarity measures
    • Euclidean distance / Euclidean distance
    • Cosine distance / Cosine distance
    • pairwise-adaptive similarity / Pairwise-adaptive similarity
    • extended Jaccard Coefficient / Extended Jaccard coefficient
    • Dice coefficient / Dice coefficient
  • singular value decomposition (SVD) / Dimensionality reduction, Singular value decomposition (SVD)
  • Singular Value Decomposition (SVD) / Advantages and limitations
  • sliding windows
    • about / Sliding windows
  • SMILE
    • reference link / Tools and software
  • Smile
    • URL / Machine learning – tools and datasets
    • about / Machine learning – tools and datasets
  • software / Tools and software
  • source-sink frameworks / Source-sink frameworks
  • Spark-MLlib
    • about / Machine learning – tools and datasets
    • URL / Machine learning – tools and datasets
  • Spark core, components
    • Resilient Distributed Datasets (RDD) / Spark architecture
    • Lineage graph / Spark architecture
  • Spark MLlib
    • used, as Big Data Machine Learning / Spark MLlib as Big Data Machine Learning platform
    • architecture / Spark architecture
    • machine learning / Machine Learning in MLlib
    • tools / Tools and usage
    • usage / Tools and usage
    • experiments / Experiments, results, and analysis
    • results / Experiments, results, and analysis
    • analysis / Experiments, results, and analysis
    • reference link / Experiments, results, and analysis
    • k-Means / k-Means
    • k-Means, with PCA / k-Means with PCA
    • k-Means with PCA, bisecting / Bisecting k-Means (with PCA)
    • Gaussian Mixture Model (GMM) / Gaussian Mixture Model
    • Random Forest / Random Forest
    • results, analysis / Analysis of results
  • Spark SQL / Spark SQL
  • Spark Streaming
    • about / Real-time Big Data Machine Learning
  • sparse coding
    • about / Sparse coding
  • spectral clustering
    • about / Spectral clustering
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • SQL frameworks / SQL frameworks
  • standard deviation / Standard deviation
  • standardization
    • about / Document collection and standardization
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
  • Statistical-based
    • about / Statistical-based
    • input / Inputs and outputs
    • outputs / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • stemming / Stemming or lemmatization
  • step execution mode / Amazon Elastic MapReduce
  • Stochastic Gradient Descent (SGD)
    • about / How does it work?
    / Supervised learning experiments
  • stop words removal
    • about / Stop words removal
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
  • stratified sampling / Stratified sampling
  • stream / SAMOA architecture
  • stream computational technique
    • about / Basic stream processing and computational techniques, Stream computations
    • frequency count / Stream computations
    • point queries / Stream computations
    • distinct count / Stream computations
    • mean / Stream computations
    • standard deviation / Stream computations
    • correlation coefficient / Stream computations
    • sliding windows / Sliding windows
    • sampling / Sampling
  • stream learning / Machine learning – types and subtypes
  • stream learning, case study
    • about / Case study in stream learning
    • tools / Tools and software
    • software / Tools and software
    • business problem / Business problem
    • machine learning, mapping / Machine learning mapping
    • data collection / Data collection
    • data sampling / Data sampling and transformation
    • data transformation / Data sampling and transformation
    • feature analysis / Feature analysis and dimensionality reduction
    • dimensionality reduction / Feature analysis and dimensionality reduction
    • models / Models, results, and evaluation
    • results / Models, results, and evaluation
    • evaluation / Models, results, and evaluation
    • supervised learning experiments / Supervised learning experiments
    • concept drift experiments / Concept drift experiments
    • clustering experiments / Clustering experiments
    • outlier detection experiments / Outlier detection experiments
    • results, analysis / Analysis of stream learning results
  • Stream Processing Engines (SPE) / Real-time stream processing
  • stream processing technique
    • about / Basic stream processing and computational techniques
  • structured data
    • sequential data / Datasets used in machine learning
  • Structure Score Measure / Measures to evaluate structures
  • subfields
    • about / NLP, subfields, and tasks
  • Subspace Outlier Detection (SOD) / How does it work?
  • Sum of Squared Errors (SSE) / Clustering models, results, and evaluation, Experiments, results, and analysis
  • supervised learning / Machine learning – types and subtypes
    • experiments / Supervised learning experiments
    • Weka, experiments / Weka experiments
    • RapidMiner, experiments / RapidMiner experiments
    • reference link / Results, observations, and analysis
    • and unsupervised learning, common issues / Issues in common with supervised learning
    • assumptions / Assumptions and mathematical notations
    • mathematical notations / Assumptions and mathematical notations
  • Support Vector Machines (SVM) / How does it work?
  • support vector machines (SVM)
    • about / Support vector machines (SVM)
    • algorithm input / Algorithm inputs and outputs
    • algorithm output / Algorithm inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Syntactic features
    • about / Syntactic features
  • Syntactic Language Models (SLM)
    • about / Syntactic features
  • Synthetic Minority Oversampling Technique (SMOTE) / Undersampling and oversampling

T

  • tasks
    • about / NLP, subfields, and tasks
  • Term Frequency (TF) / Term frequency (TF)
  • term frequency (TF) / Frequency-based techniques
  • term frequency-inverse document frequency (TF-IDF) / Term frequency-inverse document frequency (TF-IDF)
  • text categorization
    • about / Text categorization
  • text clustering / Text clustering
    • about / Text clustering
    • feature transformation / Feature transformation, selection, and reduction
    • selection / Feature transformation, selection, and reduction
    • reduction / Feature transformation, selection, and reduction
    • techniques / Clustering techniques
    • evaluation / Evaluation of text clustering
  • text mining
    • topics / Topics in text mining
    • categorization/classification / Text categorization/classification
    • topic modeling / Topic modeling
    • clustering / Text clustering
    • named entity recognition (NER) / Named entity recognition
    • Deep Learning / Deep learning and NLP
    • NLP / Deep learning and NLP
  • text processing components
    • about / Text processing components and transformations
    • document collection / Document collection and standardization
    • standardization / Document collection and standardization
    • tokenization / Tokenization
    • stop words removal / Stop words removal
    • lemmatization / Stemming or lemmatization
    • local-global dictionary / Local/global dictionary or vocabulary?
    • vocabulary / Local/global dictionary or vocabulary?
    • feature extraction/generation / Feature extraction/generation
    • feature representation / Feature representation and similarity
    • similarity / Feature representation and similarity
    • feature selection / Feature selection and dimensionality reduction
    • dimensionality reduction / Feature selection and dimensionality reduction
  • text summarization / Text summarization
  • time-series forecasting / Machine learning – types and subtypes
  • tokenization
    • about / Tokenization
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
  • tools / Tools and software
    • about / Tools and usage
    • Mallet / Mallet
    • KNIME / KNIME
  • tools, machine learning
    • RapidMiner / Machine learning – tools and datasets
    • Weka / Machine learning – tools and datasets
    • Knime / Machine learning – tools and datasets
    • Mallet / Machine learning – tools and datasets
    • Elki / Machine learning – tools and datasets
    • JCLAL / Machine learning – tools and datasets
    • KEEL / Machine learning – tools and datasets
    • DeepLearning4J / Machine learning – tools and datasets
    • Spark-MLlib / Machine learning – tools and datasets
    • H2O / Machine learning – tools and datasets
    • MOA/SAMOA / Machine learning – tools and datasets
    • Neo4j / Machine learning – tools and datasets
    • GraphX / Machine learning – tools and datasets
    • OpenMarkov / Machine learning – tools and datasets
    • Smile / Machine learning – tools and datasets
  • topic modeling
    • about / Topic modeling
    • probabilistic latent semantic analysis (PLSA) / Probabilistic latent semantic analysis (PLSA)
    • with mallet / Topic modeling with mallet
    • business problem / Business problem
    • machine learning, mapping / Machine Learning mapping
    • data collection / Data collection
    • data sampling / Data sampling and transformation
    • transformation / Data sampling and transformation
    • feature analysis / Feature analysis and dimensionality reduction
    • dimensionality reduction / Feature analysis and dimensionality reduction
    • models / Models, results, and evaluation
    • results / Models, results, and evaluation
    • evaluation / Models, results, and evaluation
    • text processing results, analysis / Analysis of text processing results
  • training phases
    • competitive phase / How does it work?
    • cooperation phase / How does it work?
    • adaptive phase / How does it work?
  • transaction data / Datasets used in machine learning
  • transductive graph label propagation
    • about / Transductive graph label propagation
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • transductive SVM (TSVM)
    • about / Transductive SVM (TSVM)
    • output / Inputs and outputs
    • input / Inputs and outputs
    • working / How does it work?
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • transformations
    • about / Text processing components and transformations
  • Tree augmented network (TAN)
    • about / Tree augmented network
    • input and output / Input and output
    • working / How does it work?
    • advantages and limitations / Advantages and limitations
  • Tunedit
    • about / Datasets
    • URL / Datasets

U

  • UCI repository
    • reference link / Data Collection
  • UC Irvine (UCI) database
    • about / Datasets
    • URL / Datasets
  • uncertainty sampling
    • about / Uncertainty sampling
    • working / How does it work?
    • least confident sampling / Least confident sampling
    • smallest margin sampling / Smallest margin sampling
    • label entropy sampling / Label entropy sampling
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • undersampling / Undersampling and oversampling
  • univariate feature analysis
    • about / Univariate feature analysis
    • categorical features / Categorical features
    • continuous features / Continuous features
  • univariate feature selection
    • information theoretic approach / Information theoretic approach
    • statistical approach / Statistical approach
  • unnormalized measure / Factor types
  • unstructured data / Datasets used in machine learning
    • mining, issues / Issues with mining unstructured data
  • unsupervised learning / Machine learning – types and subtypes
    • specific issues / Issues specific to unsupervised learning
    • assumptions / Assumptions and mathematical notations
    • mathematical notations / Assumptions and mathematical notations
    • outlier detection, used / Unsupervised learning using outlier detection
  • usage / Tools and usage
  • US Forest Service (USFS) / Data collection
  • US Geological Survey (USGS) / Data collection

V

  • V-Measure
    • about / V-Measure
    • Homogeneity / V-Measure
    • Completeness / V-Measure
  • validation
    • techniques / Training, validation, and test set
  • Variable elimination (VE) algorithm / Variable elimination algorithm
  • variance / Variance
  • vector
    • about / Vector
    • scalar product / Scalar product of vectors
  • vector space model (VSM)
    • about / Vector space model
    • binary / Binary
    • Term Frequency (TF) / Term frequency (TF)
    • inverse document frequency (IDF) / Inverse document frequency (IDF)
    • term frequency-inverse document frequency (TF-IDF) / Term frequency-inverse document frequency (TF-IDF)
  • version space sampling
    • about / Version space sampling
    • Query by disagreement (QBD) / Query by disagreement (QBD)
  • very fast decision trees (VFDT) / Hoeffding trees or very fast decision trees (VFDT)
    • output / Inputs and outputs
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Very Fast K-means Algorithm (VFKM) / Advantages and limitations
  • visualization analysis
    • about / Visualization analysis
    • univariate feature analysis / Univariate feature analysis
    • multivariate feature analysis / Multivariate feature analysis
  • Vote Entropy
    • disadvantages / How does it work?

W

  • weighted linear sum (WLS) / How does it work?
  • weighted linear sum of squares (WSS) / How does it work?
  • weighted majority algorithm (WMA)
    • about / Weighted majority algorithm
    • input / Inputs and outputs
    • output / Inputs and outputs
    • working / Advantages and limitations
    • advantages / Advantages and limitations
    • limitations / Advantages and limitations
  • Weka
    • URL / Machine learning – tools and datasets
    • about / Machine learning – tools and datasets, Case Study – Horse Colic Classification
    • experiments / Weka experiments
    • Sample end-to-end process, in Java / Sample end-to-end process in Java
    • experimenter / Weka experimenter and model selection
    • model selection / Weka experimenter and model selection
  • Weka Bayesian Network GUI / Weka Bayesian Network GUI
  • Welch's t test / Welch's t test
  • Widmer / Widmer and Kubat
  • Wilcoxon signed-rank test / Wilcoxon signed-rank test
  • Word sense disambiguation (WSD) / Word sense disambiguation
  • wrapper approach / Wrapper approach

Z

  • Z-Score Normalization / Outliers
  • ZeroMQ Message Transfer Protocol (ZMTP) / Message queueing frameworks