Index
A
- activation function / Activation function
- Akaike information criterion (AIC) / ARIMA model
- ambient pressure (AP) / Wright's model
- ambient temperature (AT) / Wright's model
- anomalies
- point anomalies / Anomaly detection
- contextual anomalies / Anomaly detection
- collective anomalies / Anomaly detection
- anomaly detection / Anomaly detection
- antecedent / Association rules
- Apriori algorithm
- about / Apriori algorithm
- association rules, finding / Finding association rules
- AR(1) / Autoregression
- AR(k) / Autoregression
- area under curve (AUC) / H-measure
- area under ROC (AUROC) / Area under ROC
- AR model / AR model
- association rules
- about / Association rules
- finding / Finding association rules
- augmented Dickey-Fuller test / Detection of stationarity
- autocorrelation / Autocorrelation
- autocorrelation plot / Detection of white noise in a series
- autoregressive (AR) model / Autoregression, Detection of stationarity
- autoregressive integrated moving average (ARIMA) model / Autoregressive integrated moving average
- autoregressive operator / Autoregressive integrated moving average
B
- backpropagation / Backpropagation
- backward propagation / Backward propagation, Forward and backward propagation, Backward propagation
- backward propagation equation / Backward propagation equation
- bagging (Bootstrap aggregation) / Bagging
- bag of words model / Bags of words
- Bayesian multiple imputation / Bayesian multiple imputation
- Bayes network
- about / Bayes network
- nodes, probabilities / Probabilities of nodes
- conditional probability table (CPT) / CPT
- training set / Example of the training and test set
- test set / Example of the training and test set
- Bayes rule / Bayes rule
- Bayes theorem
- about / The Bayes theorem
- Naive Bayes classifier, working / How the Naive Bayes classifier works
- best-matching unit (BMU) / SOM
- bias / Bias-variance trade off
- bias-variance trade off / Bias-variance trade off
- bias initialization
- about / Bias initialization
- hyperparameters / Hyperparameters
- digit recognizer / Use case – digit recognizer
- boosting
- about / Boosting
- gradient boosting / Gradient boosting
- bootstrapping
- 0.632 rule / 0.632 rule in bootstrapping
- about / Bootstrapping, Bagging
- branch / Decision tree
- branches, machine learning
- supervised learning / Machine learning
- unsupervised learning / Machine learning
- reinforcement learning / Machine learning
C
- Capsule Network (CapsNet) / Hinton's Capsule network, The Capsule Network and convolutional neural networks
- classification and regression trees (CART) / Tree splitting
- collective anomalies / Anomaly detection
- compressed sensing
- about / Compressed sensing
- goal / Our goal
- computation, neural networks
- activation for H1, calculating / Calculation of activation for H1
- conditional probability / Key concepts
- conditional probability table (CPT) / Bayes network, CPT
- confidence / Association rules
- confusion matrix
- about / Confusion matrix
- True Positive (TP) / Confusion matrix
- True Negative (TN) / Confusion matrix
- False Positive (FP) / Confusion matrix
- False Negative (FN) / Confusion matrix
- contextual anomalies / Anomaly detection
- convolutional neural network (CNN) / The Capsule Network and convolutional neural networks
- corpora / Text corpus
- count vectorizer
- executing / Executing the count vectorizer
- cross-validation
- about / Cross-validation and model selection
- used, for model selection / Model selection using cross-validation
- curve fitting
- about / Curve fitting
- residual / Residual
D
- data sets, model building
- about / Training data development data – test data
- training set / Training data development data – test data
- development set / Training data development data – test data
- test set / Training data development data – test data
- Decision Node / Decision tree
- decision tree
- about / Decision tree
- Root Node / Decision tree
- Decision Node / Decision tree
- Leaf Node / Decision tree
- branch / Decision tree
- tree splitting / Tree splitting
- deep learning model
- need for / Why do we need a deep learning model?
- deep neural network
- about / Deep neural networks
- notation / Deep neural network notation
- forward propagation / Forward propagation in a deep network, Forward and backward propagation
- parameter W / Parameters W and b
- parameter b / Parameters W and b
- backward propagation / Forward and backward propagation
- error computation / Error computation
- development set / Training data development data – test data
- digit recognizer / Use case – digit recognizer
- dimensionality reduction / Dimensionality reduction
- directed acyclic graph / Bayes network
- discriminator / Generative adversarial networks
- dot product / Dot product
- dynamic routing between capsules / Hinton's Capsule network
E
- electrical energy output (PE) / Wright's model
- elements, association rules
- antecedent (if) / Association rules
- consequent (then) / Association rules
- elements, autoregressive integrated moving average (ARIMA) model
- autoregressive operator / Autoregressive integrated moving average
- integration operator / Autoregressive integrated moving average
- moving average operator / Autoregressive integrated moving average
- elements, backpropagation
- dataset / Backpropagation
- feed-forward network / Backpropagation
- loss function / Backpropagation
- ensemble learning / What is ensemble learning?
- ensemble model
- about / Ensemble methods
- building, methods / Ensemble methods
- entropy / Tree splitting
- error computation / Error computation
- errors, bias-variance trade off
- training error / Bias-variance trade off
- development error / Bias-variance trade off
F
- F-test
- about / F-test
- limitations / Limitations
- use case / Use case
- False Negative (FN) / Confusion matrix
- False Positive (FP) / Confusion matrix
- fine needle aspirate (FNA) / Case study
- first differencing / Autoregressive integrated moving average
- forward propagation
- in deep neural network / Forward propagation in a deep network
- about / Forward and backward propagation
- forward propagation equation / Forward propagation equation
- frequent itemset generation / Apriori algorithm
- frequent pattern growth (FP-growth)
- about / Frequent pattern growth
- transactions, setting up / Frequent pattern growth
- frequency, finding / Frequent pattern growth
- items, prioritizing by frequency / Frequent pattern growth
- items, ordering by priority / Frequent pattern growth
- validation / Validation
- frequent pattern tree growth / Frequent pattern tree growth
G
- Gaussian kernel / Gaussian kernel
- Gaussian white noise / White noise
- generative adversarial networks (GANs) / Generative adversarial networks
- generator / Generative adversarial networks
- Gini index / Tree splitting
- gradient boosting
- about / Gradient boosting
- parameters / Parameters of gradient boosting
- Granger causality / Granger causality
- graphical causal models / Graphical causal models
- grid search
- SVM example optimization / SVM example and parameter optimization through grid search
- parameters optimization / SVM example and parameter optimization through grid search
H
- H-measure / H-measure
- Hinton's capsule network / Hinton's Capsule network
- hold out set / Training data development data – test data
- hyperbolic tangent function (tanh) / Types of activation functions
- hyperparameters
- about / Parameters and hyperparameters, Hyperparameters
- learning rate / Hyperparameters
- epoch / Hyperparameters
- number of hidden layers / Hyperparameters
- number of nodes / Hyperparameters
- dropout / Hyperparameters
- momentum / Hyperparameters
- batch size / Hyperparameters
- hyperplanes / Hyperplanes
- hypothesis testing
- about / Detection of stationarity
- rules / Detection of stationarity
I
- ICA preprocessing
- about / Preprocessing for ICA
- centering / Preprocessing for ICA
- whitening / Preprocessing for ICA
- improvement curve / Learning curve
- independent component analysis (ICA)
- about / Independent component analysis
- approach / Approach
- input layer neurons / Neural networks
- integration operator / Autoregressive integrated moving average
K
- k-fold cross-validation / K-fold cross-validation
- kernel / Kernel
- kernel method / Introduction
- kernel PCA / Kernel PCA
- kernel trick / Kernel trick, Back to Kernel trick
- Kohonen maps / Self-organizing maps
L
- latent Dirichlet allocation (LDA)
- about / Topic modeling
- used, for topic modeling / Topic modeling
- evaluation / Evaluating the model
- visualization / Visualizing the LDA
- latent semantic analysis (LSA)
- used, for topic modeling / Topic modeling
- LDA architecture / LDA architecture
- Leaf Node / Decision tree
- Leaky ReLU / Overcoming vanishing gradient
- learning / Machine learning
- learning curve
- about / Learning curve
- machine learning / Machine learning
- Wright's model / Wright's model
- least absolute shrinkage and selection operator (LASSO) / Least absolute shrinkage and selection operator
- likelihood / Bayes rule
- linear discriminant function / SVM
- linear kernel / Linear kernel
- linear separability / Linear separability
- Long Short-Term Memory (LSTM) / Limitations of RNNs
- loss function / Loss function
M
- machine learning / Machine learning
- marginal likelihood / Bayes rule
- methods, ensemble model building
- bootstrapping / Bootstrapping
- methods, stationarity detection
- data, plotting / Detection of stationarity
- data set, dividing / Detection of stationarity
- summary, computing / Detection of stationarity
- augmented Dickey-Fuller test / Detection of stationarity
- metrics, association rules
- support / Association rules
- confidence / Association rules
- lift / Association rules
- model evaluation
- about / Model evaluation, Confusion matrix
- confusion matrix / Confusion matrix
- model initialization / Model initialization
- model selection
- about / Cross-validation and model selection
- with cross-validation / Model selection using cross-validation
- Modified National Institute of Standards and Technology (MNIST) / Use case – digit recognizer
- moving average model (MA) / Moving average model
- moving average operator / Autoregressive integrated moving average
N
- Naive Bayes classifier
- working / How the Naive Bayes classifier works
- network initialization
- about / Network initialization
- zero initialization / Network initialization
- random initialization / Network initialization
- He-et-al initialization / Network initialization
- neural networks
- about / Neural networks
- working / How a neural network works
- model initialization / Model initialization
- loss function / Loss function
- optimization / Optimization
- computation / Computation in neural networks
- backward propagation / Backward propagation
- activation function / Activation function
- overfitting, preventing / Prevention of overfitting in NNs
- non-negative matrix factorization
- used, for topic modeling / Topic modeling
- norm / Magnitude of the vector
- notation / Deep neural network notation
- null hypothesis / Detection of stationarity
O
- optimization / Optimization
- ordinary least squares (OLS) / Ridge regression (L2)
- out-of-bag (OOB) sample / Bootstrapping
- outlier / Anomaly detection
- overall accuracy / Receiver operating characteristic curve
- overfitting
- about / Bias-variance trade off, Overfitting
- preventing, in NN / Prevention of overfitting in NNs
P
- parameters / Parameters and hyperparameters
- parameters, tree splitting
- Max_depth / Parameters of tree splitting
- min_samples_split / Parameters of tree splitting
- min_samples_leaf / Parameters of tree splitting
- max_features / Parameters of tree splitting
- parameters optimization
- through grid search / SVM example and parameter optimization through grid search
- about / Optimization of parameters
- of AR model / AR model
- of ARIMA model / ARIMA model
- partial autocorrelation function (PACF) plot / Autoregressive integrated moving average
- point anomalies / Anomaly detection
- polynomial kernel / Polynomial kernel
- posterior / Bayes rule
- principal component analysis (PCA) / Introduction
- prior / Bayes rule
- problems, supervised learning
- regression problem / Machine learning
- classification / Machine learning
- progress curve / Learning curve
- pure set / Tree splitting
- pyfpgrowth / Validation
- Python
- TF-IDF, executing / Executing TF-IDF in Python
R
- 0.632 rule
- in bootstrapping / 0.632 rule in bootstrapping
- random forest algorithm
- about / Random forest algorithm
- case study / Case study
- random walk / Random walk
- recall / Confusion matrix
- receiver operating characteristic (ROC) curve / Receiver operating characteristic curve
- receiver operating characteristic (ROC) metric / Case study
- Rectified Linear Units (ReLU) / Types of activation functions
- recurrent neural networks (RNNs)
- about / Recurrent neural networks
- limitations / Limitations of RNNs
- use case / Use case
- regularization / Regularization
- reinforcement learning / Machine learning
- relative humidity (RH) / Wright's model
- residual / Residual
- ridge regression (L2) / Ridge regression (L2)
- Root Node / Decision tree
S
- sampling with replacement / Bootstrapping
- second differencing / Autoregressive integrated moving average
- self-organizing maps (SOM)
- about / Self-organizing maps
- reference / Self-organizing maps
- working / SOM
- sensitivity / Confusion matrix
- sentences / Sentences
- sentence tokenization / Sentences
- sentiment analysis / Sentiment analysis
- sentiment classification
- about / Sentiment classification
- TF-IDF feature extraction / TF-IDF feature extraction
- count vectorizer bag of words feature extraction / Count vectorizer bag of words feature extraction
- model building count vectorization / Model building count vectorization
- shallow network / Why do we need a deep learning model?
- sigmoid / Types of activation functions
- single layer neural network / Why do we need a deep learning model?
- snickometer / Anomaly detection
- specificity / Confusion matrix
- stages, bagging (Bootstrap aggregation) / Bagging
- startup function / Learning curve
- stationarity / Stationarity
- stationarity detection / Detection of stationarity
- statistical model / Statistical models
- statistical modeling – two cultures / Statistical modeling – the two cultures of Leo Breiman
- steps, bag of words model
- corpus, building / Bags of words
- vocabulary, building / Bags of words
- document vector creation / Bags of words
- text, cleansing / Bags of words
- count vector / Bags of words
- sub-tree / Decision tree
- subconscious intelligence / Learning curve
- supervised learning / Machine learning
- support vector / Support vector
- support vector machine (SVM) / SVM
- SVM example optimization
- through grid search / SVM example and parameter optimization through grid search
- symmetry / Network initialization
T
- temperature (T) / Wright's model
- term frequency (TF) / TF-IDF
- term frequency inverse-document frequency (TF-IDF)
- about / Bags of words, TF-IDF
- working / TF-IDF
- count vectorizer, executing / Executing the count vectorizer
- executing, in Python / Executing TF-IDF in Python
- test set / Training data development data – test data
- text classification
- Naive Bayes technique / The Naive Bayes technique in text classification
- text corpus
- about / Text corpus
- sentences / Sentences
- words / Words
- time series analysis / Introduction to time series analysis
- tools, white noise detection
- line plot / Detection of white noise in a series
- autocorrelation plot / Detection of white noise in a series
- summary / Detection of white noise in a series
- topic modeling
- about / Topic modeling
- with latent Dirichlet allocation (LDA) / Topic modeling
- with latent semantic analysis (LSA) / Topic modeling
- with non-negative matrix factorization / Topic modeling
- Naive Bayes technique, in text classification / The Naive Bayes technique in text classification
- training set / Training data development data – test data
- transaction ID / Association rules
- transforming autoencoder / Hinton's Capsule network
- tree splitting
- about / Tree splitting
- parameters / Parameters of tree splitting
- True Negative (TN) / Confusion matrix
- True Positive (TP) / Confusion matrix
- Type 1 error / Confusion matrix
- Type 2 error / Confusion matrix
- types, activation function
- sigmoid / Types of activation functions
- hyperbolic tangent function (tanh) / Types of activation functions
- Rectified Linear Units (ReLU) / Types of activation functions
- types, kernel
- linear / Linear kernel
- polynomial / Polynomial kernel
- Gaussian / Gaussian kernel
- types, regularization
- ridge regression (L2) / Ridge regression (L2)
- least absolute shrinkage and selection operator (LASSO) / Least absolute shrinkage and selection operator
U
- underfitting / Bias-variance trade off
- universal function approximators / Activation function
- unknown signal X / Our goal
- unsupervised learning / Machine learning
V
- vacuum (V) / Wright's model
- validation, frequent pattern growth (FP-growth)
- library, importing / Importing the library
- validation set / Training data development data – test data
- vanishing gradient
- about / Vanishing gradient
- overcoming / Overcoming vanishing gradient
- vanishing gradient problem / Vanishing gradient
- variance error / Bias-variance trade off
- vectors
- about / Introduction to vectors
- magnitude / Magnitude of the vector
- dot product / Dot product
W
- ways, dimensionality reduction
- feature elimination / Dimensionality reduction
- feature extraction / Dimensionality reduction
- white noise
- about / White noise
- detecting, in series / Detection of white noise in a series
- words / Words
- words, text corpus
- bag of words model / Bags of words
- word tokenization / Words
- Wright's model / Wright's model