Index
A
- accuracy (ACC) / Optimizing the precision and recall of a classification model
- activation functions, for feedforward neural networks
- selecting / Choosing activation functions for feedforward neural networks
- logistic function recap / Logistic function recap
- probabilities, estimating in multi-class classification via softmax function / Estimating probabilities in multi-class classification via the softmax function
- output spectrum, broadening with hyperbolic tangent / Broadening the output spectrum by using a hyperbolic tangent
- AdaBoost / Leveraging weak learners via adaptive boosting
- Adaline
- about / First steps with scikit-learn
- adaptive boosting
- weak learners, leveraging via / Leveraging weak learners via adaptive boosting
- ADAptive LInear NEuron (Adaline) / Adaptive linear neurons and the convergence of learning, Solving regression for regression parameters with gradient descent
- adaptive linear neurons
- about / Adaptive linear neurons and the convergence of learning
- cost functions, minimizing with gradient descent / Minimizing cost functions with gradient descent
- implementing, in Python / Implementing an Adaptive Linear Neuron in Python
- large-scale machine learning / Large scale machine learning and stochastic gradient descent
- stochastic gradient descent / Large scale machine learning and stochastic gradient descent
- agglomerative and divisive hierarchical clustering
- agglomerative clustering
- applying, via scikit-learn / Applying agglomerative clustering via scikit-learn
- algorithms
- debugging, with learning and validation curves / Debugging algorithms with learning and validation curves
- algorithm selection
- with nested cross-validation / Algorithm selection with nested cross-validation
- area under the curve (AUC) / Plotting a receiver operating characteristic
- artificial neural network
- training / Training an artificial neural network
- logistic cost function, computing / Computing the logistic cost function
- neural networks, training via backpropagation / Training neural networks via backpropagation
- artificial neurons
- arXiv
- average linkage
B
- backpropagation / Training neural networks via backpropagation
- intuition, developing / Developing your intuition for backpropagation
- backpropagation algorithm
- bag-of-words model
- defining / Introducing the bag-of-words model
- vocabulary, creating / Introducing the bag-of-words model
- words, transforming into feature vectors / Transforming words into feature vectors
- word relevancy, assessing via term frequency-inverse document frequency / Assessing word relevancy via term frequency-inverse document frequency
- text data, cleaning / Cleaning text data
- documents, processing into tokens / Processing documents into tokens
- bagging
- basic terminology
- boosting / Leveraging weak learners via adaptive boosting
- bootstrap aggregating
- border point
- Breast Cancer Wisconsin dataset
C
- Cascading Style Sheets (CSS)
- about / Form validation and rendering
- categorical data
- handling / Handling categorical data
- ordinal features, mapping / Mapping ordinal features
- class labels, encoding / Encoding class labels
- one-hot encoding, performing on nominal features / Performing one-hot encoding on nominal features
- classification algorithm
- selecting / Choosing a classification algorithm
- classification error / Maximizing information gain – getting the most bang for the buck
- class probabilities, modeling via logistic regression
- about / Modeling class probabilities via logistic regression
- logistic regression intuition and conditional probabilities / Logistic regression intuition and conditional probabilities
- weights, of logistic cost function / Learning the weights of the logistic cost function
- logistic regression model, training with scikit-learn / Training a logistic regression model with scikit-learn
- overfitting, tackling via regularization / Tackling overfitting via regularization
- cluster inertia
- clusters
- organizing, as hierarchical tree / Organizing clusters as a hierarchical tree
- complete linkage
- complex functions, modeling with artificial neural networks
- about / Modeling complex functions with artificial neural networks
- single-layer neural network recap / Single-layer neural network recap
- multi-layer neural network architecture / Introducing the multi-layer neural network architecture
- neural network, activating via forward propagation / Activating a neural network via forward propagation
- Computing Research Repository (CoRR)
- confusion matrix
- reading / Reading a confusion matrix
- convergence, in neural networks
- about / Convergence in neural networks
- convolution / Convolutional Neural Networks
- convolutional layer / Convolutional Neural Networks
- Convolutional Neural Networks (CNNs or ConvNets)
- about / Convolutional Neural Networks
- core point
- CSV (comma-separated values)
- about / Dealing with missing data
- curse of dimensionality
D
- dataset
- partitioning, in training and test sets / Partitioning a dataset in training and test sets
- data storage
- SQLite database, setting up for / Setting up a SQLite database for data storage
- DBSCAN
- high density regions, locating via / Locating regions of high density via DBSCAN
- disadvantages / Locating regions of high density via DBSCAN
- decision regions / Training a perceptron via scikit-learn
- decision tree learning
- about / Decision tree learning
- information gain, maximizing / Maximizing information gain – getting the most bang for the buck
- decision tree, building / Building a decision tree
- weak to strong learners, combining via random forests / Combining weak to strong learners via random forests
- decision tree regression
- about / Decision tree regression
- decision trees
- decision trees classifiers
- about / Decision tree learning
- dendrograms
- about / Organizing clusters as a hierarchical tree
- attaching, to heat map / Attaching dendrograms to a heat map
- depth parameter / Fine-tuning machine learning models via grid search
- dimensionality reduction
- distance matrix
- hierarchical clustering, performing on / Performing hierarchical clustering on a distance matrix
- document classification
- logistic regression model, training for / Training a logistic regression model for document classification
- dummy feature / Performing one-hot encoding on nominal features
E
- Elastic Net / Using regularized methods for regression
- elbow method
- about / Grouping objects by similarity using k-means, Using the elbow method to find the optimal number of clusters
- used, for finding optimal number of clusters / Using the elbow method to find the optimal number of clusters
- emoticon characters
- about / Cleaning text data
- ensemble classifier
- ensemble methods / Learning with ensembles
- ensemble of classifiers
- building, from bootstrap samples / Bagging – building an ensemble of classifiers from bootstrap samples
- ensembles
- learning with / Learning with ensembles
- entropy / Maximizing information gain – getting the most bang for the buck
- epoch
- error (ERR) / Optimizing the precision and recall of a classification model
- Exploratory Data Analysis (EDA)
F
- false positive rate (FPR) / Optimizing the precision and recall of a classification model
- feature detectors
- feature extraction
- feature importance
- assessing, with random forests / Assessing feature importance with random forests
- feature map / Convolutional Neural Networks
- feature scaling
- about / Bringing features onto the same scale
- illustrating / Bringing features onto the same scale
- feature selection
- about / Selecting meaningful features, Sequential feature selection algorithms
- sparse solutions, with L1 regularization / Sparse solutions with L1 regularization
- fitted scikit-learn estimators
- serializing / Serializing fitted scikit-learn estimators
- Flask
- web application, developing with / Developing a web application with Flask
- Flask documentation
- Flask web application
- defining / Our first Flask web application
- form validation / Form validation and rendering
- rendering / Form validation and rendering
- flower dataset
- forward propagation
- neural network, activating via / Activating a neural network via forward propagation
- fuzzifier
- about / Hard versus soft clustering
- fuzziness
- about / Hard versus soft clustering
- fuzziness coefficient
- about / Hard versus soft clustering
- fuzzy C-means (FCM) algorithm
- about / Hard versus soft clustering
- fuzzy clustering
- about / Hard versus soft clustering
- fuzzy k-means
- about / Hard versus soft clustering
G
- 1-gram
- Gaussian kernel
- Gini index / Maximizing information gain – getting the most bang for the buck
- Global Interpreter Lock (GIL) / Building, compiling, and running expressions with Theano
- Google Developers portal
- URL / Cleaning text data
- gradient checking
- neural networks, debugging with / Debugging neural networks with gradient checking
- about / Debugging neural networks with gradient checking
- Gradient Descent (GD) / Solving regression for regression parameters with gradient descent
- gradient descent example / Training a perceptron via scikit-learn
- gradient descent optimization algorithm
- Graphical Processing Units (GPUs) / A few last words about neural network implementation
- GraphViz
- URL / Building a decision tree
- grid search
- machine learning models, fine-tuning via / Fine-tuning machine learning models via grid search
- about / Fine-tuning machine learning models via grid search
- hyperparameters, tuning via / Tuning hyperparameters via grid search
H
- handwritten digits
- classifying / Classifying handwritten digits
- hard clustering
- about / Hard versus soft clustering
- versus soft clustering / Hard versus soft clustering
- heat map
- dendrograms, attaching to / Attaching dendrograms to a heat map
- about / Attaching dendrograms to a heat map
- hidden layer
- hierarchical and density-based clustering
- hierarchical clustering
- about / Organizing clusters as a hierarchical tree
- performing, on distance matrix / Performing hierarchical clustering on a distance matrix
- high density regions
- locating, via DBSCAN / Locating regions of high density via DBSCAN
- holdout cross-validation / Using k-fold cross-validation to assess model performance
- holdout method
- about / The holdout method
- disadvantage / The holdout method
- Housing Dataset
- about / Exploring the Housing Dataset
- exploring / Exploring the Housing Dataset
- URL / Exploring the Housing Dataset
- features / Exploring the Housing Dataset
- characteristics / Visualizing the important characteristics of a dataset
- HTML basics
- hyperbolic tangent
- hyperbolic tangent (sigmoid) kernel
- hyperbolic tangent (tanh)
- hyperparameters / The holdout method
- tuning, via grid search / Tuning hyperparameters via grid search
- about / Introducing the multi-layer neural network architecture
I
- IMDb movie review dataset
- obtaining / Obtaining the IMDb movie review dataset
- built-in pickle module
- information gain (IG) / Decision tree learning, Decision tree regression
- instance-based learning
- intelligent machines
- building, to transform data into knowledge / Building intelligent machines to transform data into knowledge
- Internet Movie Database (IMDb)
- inverse document frequency
- IPython notebooks
- Iris-Setosa / Training a perceptron via scikit-learn
- Iris-Versicolor / Training a perceptron via scikit-learn
- Iris-Virginica / Training a perceptron via scikit-learn
- Iris dataset
J
- Jinja2 syntax
- joblib
K
- k-fold cross-validation
- used, for assessing model performance / Using k-fold cross-validation to assess model performance
- about / Using k-fold cross-validation to assess model performance, K-fold cross-validation
- holdout method / The holdout method
- k-means
- used, for grouping objects by similarity / Grouping objects by similarity using k-means
- about / Grouping objects by similarity using k-means
- K-means++
- about / K-means++
- k-nearest neighbors (KNN) algorithm
- Keras
- URL / A few last words about neural network implementation, Training neural networks efficiently using Keras
- about / Training neural networks efficiently using Keras
- used, for training neural networks / Training neural networks efficiently using Keras
- kernel
- polynomial kernel / Kernel functions and the kernel trick
- hyperbolic tangent (sigmoid) kernel / Kernel functions and the kernel trick
- Radial Basis Function (RBF) / Kernel functions and the kernel trick
- kernel functions
- kernel principal component analysis
- using, for nonlinear mappings / Using kernel principal component analysis for nonlinear mappings
- implementing, in Python / Implementing a kernel principal component analysis in Python
- kernel principal component analysis, examples
- half-moon shapes, separating / Example 1 – separating half-moon shapes
- concentric circles, separating / Example 2 – separating concentric circles
- new data points, projecting / Projecting new data points
- kernel principal component analysis, scikit-learn
- kernel SVM
- kernel trick
- KNN algorithm
L
- L1 regularization
- sparse solutions / Sparse solutions with L1 regularization
- L2 regularization / Sparse solutions with L1 regularization
- Lancaster stemmer
- about / Processing documents into tokens
- Lasagne
- Latent Dirichlet allocation
- lazy learner
- LDA via scikit-learn
- about / LDA via scikit-learn
- learning curves
- about / Debugging algorithms with learning and validation curves
- bias and variance problems, diagnosing with / Diagnosing bias and variance problems with learning curves
- learning rate
- Least Absolute Shrinkage and Selection Operator (LASSO) / Using regularized methods for regression
- leave-one-out (LOO) cross-validation method / K-fold cross-validation
- lemmas
- about / Processing documents into tokens
- lemmatization
- about / Processing documents into tokens
- LIBLINEAR / Estimating the coefficient of a regression model via scikit-learn
- LIBSVM
- linear regression model
- performance, evaluating / Evaluating the performance of linear regression models
- turning, into curve / Turning a linear regression model into a curve – polynomial regression
- linkage matrix
- LISA lab
- logistic function
- logistic regression / Activating a neural network via forward propagation
- logistic regression model
- training, for document classification / Training a logistic regression model for document classification
- logit function
- Long Short Term Memory (LSTM) / Recurrent Neural Networks
M
- machine learning
- supervised learning / The three different types of machine learning
- unsupervised learning / The three different types of machine learning
- reinforcement learning / The three different types of machine learning
- Python, using for / Using Python for machine learning
- history / Artificial neurons – a brief glimpse into the early history of machine learning
- machine learning models
- fine-tuning, via grid search / Fine-tuning machine learning models via grid search
- macro averaging method / The scoring metrics for multiclass classification
- majority vote / Combining weak to strong learners via random forests
- majority voting principle / Learning with ensembles
- margin
- margin classification
- maximum margin intuition / Maximum margin intuition
- nonlinearly separable case, dealing with / Dealing with the nonlinearly separable case using slack variables
- alternative implementations, in scikit-learn / Alternative implementations in scikit-learn
- Matplotlib
- McCulloch-Pitt neuron model
- mean imputation / Imputing missing values
- Mean Squared Error (MSE) / Evaluating the performance of linear regression models
- Median Absolute Deviation (MAD) / Fitting a robust regression model using RANSAC
- metric parameter
- reference / K-nearest neighbors – a lazy learning algorithm
- micro averaging method / The scoring metrics for multiclass classification
- missing data, dealing with
- about / Dealing with missing data
- samples, eliminating / Eliminating samples or features with missing values
- features, eliminating / Eliminating samples or features with missing values
- missing values, inputing / Imputing missing values
- scikit-learn estimator API / Understanding the scikit-learn estimator API
- MNIST dataset
- about / Classifying handwritten digits, Obtaining the MNIST dataset
- obtaining / Obtaining the MNIST dataset
- URL / Obtaining the MNIST dataset, Training neural networks efficiently using Keras
- set images, training / Obtaining the MNIST dataset
- set labels, training / Obtaining the MNIST dataset
- set images, testing / Obtaining the MNIST dataset
- set labels, testing / Obtaining the MNIST dataset
- multi-layer perceptron, implementing / Implementing a multi-layer perceptron
- model performance
- assessing, k-fold cross-validation used / Using k-fold cross-validation to assess model performance
- model persistence
- model selection / The holdout method
- movie classifier
- turning, into web application / Turning the movie classifier into a web application
- movie review classifier
- updating / Updating the movie review classifier
- movie review dataset
- multi-layer feedforward neural network
- multi-layer perceptron (MLP)
- multiple linear regression
- MurmurHash3 function
N
- n-gram
- nested cross-validation
- used, for algorithm selection / Algorithm selection with nested cross-validation
- neural network architectures
- about / Other neural network architectures
- Convolutional Neural Networks (CNNs or ConvNets) / Convolutional Neural Networks
- Recurrent Neural Networks (RNNs) / Recurrent Neural Networks
- neural network implementation
- neural networks
- developing, with gradient checking / Debugging neural networks with gradient checking
- convergence / Convergence in neural networks
- training, Keras used / Training neural networks efficiently using Keras
- NLTK
- NLTK package
- noise points
- nominal features
- about / Handling categorical data
- non-empty classes / Maximizing information gain – getting the most bang for the buck
- nonlinear mappings
- kernel principal component analysis, using for / Using kernel principal component analysis for nonlinear mappings
- nonlinear problems, solving with kernel SVM
- about / Solving nonlinear problems using a kernel SVM
- kernel trick, using for finding separating hyperplanes / Using the kernel trick to find separating hyperplanes in higher dimensional space
- nonlinear relationships
- modeling, in Housing Dataset / Modeling nonlinear relationships in the Housing Dataset
- dealing with, random forests used / Dealing with nonlinear relationships using random forests
- nonparametric models
- normal equation / Estimating the coefficient of a regression model via scikit-learn
- normalization
- notations
- NumPy
O
- objects by similarity
- grouping, k-means used / Grouping objects by similarity using k-means
- odds ratio
- offsets
- one-hot encoding / Performing one-hot encoding on nominal features
- one-hot representation
- One-vs-All (OvA) / Training a perceptron model on the Iris dataset, The scoring metrics for multiclass classification
- One-vs-Rest (OvR) / Training a perceptron via scikit-learn, Training a perceptron model on the Iris dataset, Sparse solutions with L1 regularization
- online algorithms
- opinion mining
- ordinal features
- about / Handling categorical data
- Ordinary Least Squares (OLS) method / Implementing an ordinary least squares linear regression model
- Ordinary Least Squares (OLS) regression / Wrapping things up – a linear regression example
- ordinary least squares linear regression model
- about / Implementing an ordinary least squares linear regression model
- implementing / Implementing an ordinary least squares linear regression model
- regression, solving for regression parameters with gradient descent / Solving regression for regression parameters with gradient descent
- coefficient, estimating via scikit-learn / Estimating the coefficient of a regression model via scikit-learn
- out-of-core learning
- overfitting / Training a perceptron via scikit-learn
P
- Pandas
- parametric models
- Pearson product-moment correlation coefficients / Visualizing the important characteristics of a dataset
- perceptron
- about / First steps with scikit-learn
- perceptron learning algorithm
- implementing, in Python / Implementing a perceptron learning algorithm in Python
- perceptron model
- training, on Iris dataset / Training a perceptron model on the Iris dataset
- performance evaluation metrics
- about / Looking at different performance evaluation metrics
- confusion matrix, reading / Reading a confusion matrix
- precision and recall of classification model, optimizing / Optimizing the precision and recall of a classification model
- receiver operator characteristic (ROC) graphs, plotting / Plotting a receiver operating characteristic
- metrics, scoring for multiclass classification / The scoring metrics for multiclass classification
- petal length
- petal width
- pipeline
- pipelines
- workflows, streamlining with / Streamlining workflows with pipelines
- transformers and estimators, combining in / Combining transformers and estimators in a pipeline
- plurality voting / Learning with ensembles
- polynomial kernel
- Polynomial regression
- pooling layer / Convolutional Neural Networks
- Porter stemmer algorithm
- about / Processing documents into tokens
- precision (PRE) / Optimizing the precision and recall of a classification model
- precision-recall curves / Plotting a receiver operating characteristic
- Principal Component Analysis (PCA) / Visualizing the important characteristics of a dataset
- principal component analysis, scikit-learn
- prototype-based clustering
- public server
- web application, deploying to / Deploying the web application to a public server
- Pylearn2
- PyPrind
- Python
- about / Using Python for machine learning
- using, for machine learning / Using Python for machine learning
- packages, installing / Installing Python packages
- references / Installing Python packages
- kernel principal component analysis, implementing in / Implementing a kernel principal component analysis in Python
- PythonAnywhere account
Q
- quality of clustering
- quantifying, via silhouette plots / Quantifying the quality of clustering via silhouette plots
R
- Radial Basis Function (RBF)
- about / Kernel functions and the kernel trick
- implementing / Kernel functions and the kernel trick
- Radial Basis Function (RBF) kernel / Using the kernel trick to find separating hyperplanes in higher dimensional space
- random forest regression
- random forests / Combining weak to strong learners via random forests
- RANdom SAmple Consensus (RANSAC) algorithm / Fitting a robust regression model using RANSAC
- raw term frequencies
- recall (REC) / Optimizing the precision and recall of a classification model
- receptive fields / Convolutional Neural Networks
- Recurrent Neural Networks (RNNs) / Recurrent Neural Networks
- regression line
- regular expression (regex)
- about / Cleaning text data
- regularization / Computing the logistic cost function
- regularization parameter / Fine-tuning machine learning models via grid search
- regularized methods
- using, for regression / Using regularized methods for regression
- reinforcement learning
- about / Solving interactive problems with reinforcement learning
- interactive problems, solving with / Solving interactive problems with reinforcement learning
- residual plots / Evaluating the performance of linear regression models
- residuals
- Ridge Regression / Using regularized methods for regression
- roadmap, for machine-learning systems
- about / A roadmap for building machine learning systems
- preprocessing / Preprocessing – getting data into shape
- predictive model, training / Training and selecting a predictive model
- predictive model, selecting / Training and selecting a predictive model
- models, evaluating / Evaluating models and predicting unseen data instances
- unseen data instances, predicting / Evaluating models and predicting unseen data instances
- robust regression model
- fitting, RANSAC used / Fitting a robust regression model using RANSAC
- ROC area under the curve (ROC AUC)
S
- S-shaped (sigmoidal) curve
- scatterplot matrix / Visualizing the important characteristics of a dataset
- scenarios, distance values
- incorrect approach / Performing hierarchical clustering on a distance matrix
- correct approach / Performing hierarchical clustering on a distance matrix
- scikit-learn
- about / First steps with scikit-learn
- perceptron, training via / Training a perceptron via scikit-learn
- reference link / Kernel principal component analysis in scikit-learn
- agglomerative clustering, applying via / Applying agglomerative clustering via scikit-learn
- scikit-learn estimator API
- scikit-learn online documentation
- sepal width
- sepal width feature / Decision tree learning
- Sequential Backward Selection (SBS)
- sequential feature selection algorithms
- sigmoid (logistic) activation function / Activating a neural network via forward propagation
- sigmoid function
- silhouette analysis
- silhouette coefficient
- silhouette plots
- about / Grouping objects by similarity using k-means
- quality of clustering, quantifying via / Quantifying the quality of clustering via silhouette plots
- simple linear regression
- simple linear regression model
- simple majority vote classifier
- implementing / Implementing a simple majority vote classifier
- different algorithms, combining with majority vote / Combining different algorithms for classification with majority vote
- single linkage
- Snowball stemmer
- about / Processing documents into tokens
- soft clustering
- versus hard clustering / Hard versus soft clustering
- about / Hard versus soft clustering
- soft k-means
- about / Hard versus soft clustering
- softmax function
- sparse
- spectral clustering algorithms
- SQLite
- sqlite3
- SQLite database
- setting up, for data storage / Setting up a SQLite database for data storage
- SQLite Manager
- squared Euclidean distance
- stacking / Evaluating and tuning the ensemble classifier
- standardization
- stochastic gradient descent
- Stochastic Gradient Descent (SGD) / Solving regression for regression parameters with gradient descent
- stop-word removal
- about / Processing documents into tokens
- strong learner / Combining weak to strong learners via random forests
- sub-sampling / Convolutional Neural Networks
- Sum of Squared Errors (SSE) / Solving regression for regression parameters with gradient descent, Wrapping things up – a linear regression example, Sparse solutions with L1 regularization
- supervised data compression, via linear discriminant analysis
- about / Supervised data compression via linear discriminant analysis
- scatter matrices, computing / Computing the scatter matrices
- linear discriminants, selecting for new feature subspace / Selecting linear discriminants for the new feature subspace
- samples, projecting onto new feature space / Projecting samples onto the new feature space
- supervised learning
- about / Making predictions about the future with supervised learning
- predictions, making with / Making predictions about the future with supervised learning
- classification, for predicting class labels / Classification for predicting class labels
- regression, for predicting continuous outcomes / Regression for predicting continuous outcomes
- support vector machine (SVM) / Tuning hyperparameters via grid search
- support vectors
- SymPy
- URL / What is Theano?
- about / What is Theano?
T
- term frequency
- term frequency-inverse document frequency (tf-idf)
- Theano
- about / What is Theano?
- reference / What is Theano?
- working with / First steps with Theano
- configuring / Configuring Theano
- array structures, working with / Working with array structures
- linear regression example / Wrapping things up – a linear regression example
- threshold function
- transformer classes / Understanding the scikit-learn estimator API
- transformers and estimators
- combining, in pipeline / Combining transformers and estimators in a pipeline
- true positive rate (TPR) / Optimizing the precision and recall of a classification model
U
- underfitting
- unigram model
- unsupervised dimensionality reduction, via principal component analysis
- about / Unsupervised dimensionality reduction via principal component analysis
- total variance / Total and explained variance
- explained variance / Total and explained variance
- feature transformation / Feature transformation
- unsupervised learning
- about / Discovering hidden structures with unsupervised learning
- hidden structures, discovering with / Discovering hidden structures with unsupervised learning
- subgroups, finding with clustering / Finding subgroups with clustering
- dimensionality reduction, for data compression / Dimensionality reduction for data compression
V
- validation curves
- about / Debugging algorithms with learning and validation curves
- overfitting and underfitting, addressing with / Addressing overfitting and underfitting with validation curves
- validation dataset
- vectorization
W
- Ward's linkage
- weak learners / Combining weak to strong learners via random forests
- leveraging, via adaptive boosting / Leveraging weak learners via adaptive boosting
- about / Leveraging weak learners via adaptive boosting
- web application
- developing, with Flask / Developing a web application with Flask
- movie classifier, turning into / Turning the movie classifier into a web application
- implementation, URL / Turning the movie classifier into a web application
- deploying, to public server / Deploying the web application to a public server
- movie review classifier, updating / Updating the movie review classifier
- Wine dataset
- about / Partitioning a dataset in training and test sets, Bagging – building an ensemble of classifiers from bootstrap samples
- URL / Partitioning a dataset in training and test sets
- features / Partitioning a dataset in training and test sets
- Hue class / Bagging – building an ensemble of classifiers from bootstrap samples
- Alcohol class / Bagging – building an ensemble of classifiers from bootstrap samples
- within-cluster sum of squared errors
- about / Hard versus soft clustering
- word2vec
- word stemming
- about / Processing documents into tokens
- workflows
- streamlining, with pipelines / Streamlining workflows with pipelines
- WTForms library
X
- 5x2 cross-validation / Algorithm selection with nested cross-validation
Z
- 7-Zip