Index
A
- accumulators write-only variables
- sharing, across cluster nodes / Accumulators write-only variables
- AdaBoost
- about / CART and boosting
- Adam
- adaptive gradient (ADAGRAD)
- about / The neural network architecture
- additive expansion
- about / Gradient Boosting Machines
- AlexNet example
- Anaconda
- about / Scientific distributions
- URL / Scientific distributions
- URL, for packages / Scientific distributions
- architecture, neural network
- input layer / The input layer
- hidden layer / The hidden layer
- output layer / The output layer
- area under the curve (AUC) / Describing the target
- autoencoders
- unsupervised learning / Autoencoders and unsupervised learning
- about / Autoencoders
- deep learning, with stacked denoising autoencoders / Autoencoders
- Averaged Stochastic Gradient Descent (ASGD) / Achieving SVM at scale with SGD
B
- backpropagation
- about / The neural network architecture
- common problems / The neural network architecture
- with mini-batch / The neural network architecture
- batch normalization function
- URL / GPU Computing
- bike-sharing dataset
- about / The bike-sharing dataset
- URL / The bike-sharing dataset
- BLAS
- URL / GPU computing
- Boltzmann machines
- boosting
- about / CART and boosting
- bootstrap aggregation (bagging)
- about / Bootstrap aggregation
- Boston dataset
- broadcast and accumulator variables
- sharing, across cluster nodes / Broadcast and accumulators together – an example
- broadcast read-only variables
- sharing, across cluster nodes / Broadcast read-only variables
C
- CART
- about / CART and boosting
- with H2O / Out-of-core CART with H2O
- cells / Introducing Jupyter/IPython
- Classification and Regression Trees (CART)
- about / GPU computing
- click-through rate (CTR)
- about / Making large scale examples
- climate
- clustering
- K-means / Clustering – K-means
- cluster nodes
- variables, sharing across / Sharing variables across cluster nodes
- broadcast read-only variables, sharing across / Broadcast read-only variables
- accumulators write-only variables, sharing across / Accumulators write-only variables
- broadcast and accumulator variables, sharing across / Broadcast and accumulators together – an example
- completeness
- about / Selection of the best K
- conda
- about / Scientific distributions
- ConvNets
- convolutional neural networks (CNN)
- in TensorFlow, through Keras / Convolutional Neural Networks in TensorFlow through Keras
- convolution layer / The convolution layer
- pooling layer / The pooling layer
- fully connected layer / The fully connected layer
- applying / CNN's with an incremental approach
- computing, with GPU / GPU Computing
- convolution layer
- about / The convolution layer
- covertype dataset
- about / The covertype dataset
- URL / The covertype dataset
- CUDA
- about / Scale up with Python, Theano
- reference link / Theano
- CUDA Toolkit
- URL / Theano
- Cygwin openssh
- URL / Using the VM
D
- data
- streaming, from sources / Streaming data from sources
- data, streaming from sources
- about / Streaming data from sources
- datasets, experimenting with / Datasets to try the real thing yourself
- bike-sharing dataset, streaming / The first example – streaming the bike-sharing dataset
- pandas I/O tools, using / Using pandas I/O tools
- databases, working with / Working with databases
- ordering of instances, warning / Paying attention to the ordering of instances
- data preprocessing, in Spark
- about / Data preprocessing in Spark
- JSON files, importing / JSON files and Spark DataFrames
- Spark DataFrames / JSON files and Spark DataFrames
- dealing, with missing data / Dealing with missing data
- tables in-memory, creating / Grouping and creating tables in-memory
- tables in-memory, grouping / Grouping and creating tables in-memory
- preprocessed DataFrame, writing to disk / Writing the preprocessed DataFrame or RDD to disk
- RDD, writing to disk / Writing the preprocessed DataFrame or RDD to disk
- datasets
- reference link / Datasets to try the real thing yourself
- Buzz in social media dataset, reference link / Datasets to try the real thing yourself
- Census-Income (KDD) dataset, reference link / Datasets to try the real thing yourself
- KDD Cup 1999 dataset, reference link / Datasets to try the real thing yourself
- Bike-sharing dataset, reference link / Datasets to try the real thing yourself
- BlogFeedback dataset, reference link / Datasets to try the real thing yourself
- Covertype dataset, reference link / Datasets to try the real thing yourself
- using / Datasets to experiment with on your own
- bike-sharing dataset / The bike-sharing dataset
- covertype dataset / The covertype dataset
- data streams
- used, for feature management / Feature management with data streams
- decision boundaries
- Deep Belief Networks (DBN)
- deep learning
- scaling, with H2O / Deep learning at scale with H2O
- unsupervised pretraining / Deep learning and unsupervised pretraining
- denoising autoencoders
- Directed Acyclic Graph (DAG) / Working with Spark DataFrames
- about / pySpark
- distributed filesystem (dfs)
- about / HDFS
- distributed framework
- need for / Why do we need a distributed framework?
E
- Elastic Compute Cloud (EC2)
- about / Out-of-core learning
- Elbow method
- about / Selection of the best K
- error-correcting tournament (ECT)
- expansion
- about / The hidden layer
- expectation (E)
- about / Clustering – K-means
- expectation-maximization (EM) algorithm
- about / Clustering – K-means
- explicit high-dimensional mappings
- Exploratory Data Analysis (EDA)
- about / Unsupervised methods
- extreme gradient boosting (XGBoost)
- about / CART and boosting, XGBoost
- reference link / XGBoost
- regression / XGBoost regression
- variable importance, plotting / XGBoost and variable importance
- large datasets, streaming / XGBoost streaming large datasets
- model persistence / XGBoost model persistence
- extreme gradient boosting (XGBoost), parameters
- extremely randomized forest
- extremely randomized trees
- for randomized search / Extremely randomized trees and large datasets
F
- fast parameter optimization
- with randomized search / Fast parameter optimization with randomized search
- feature decomposition
- principal component analysis (PCA) / Feature decomposition – PCA
- feature management, with data streams
- about / Feature management with data streams
- target, describing / Describing the target
- hashing trick / The hashing trick
- basic transformations / Other basic transformations
- testing / Testing and validation in a stream
- validation / Testing and validation in a stream
- SGD, using / Trying SGD in action
- feature selection
- by regularization / Feature selection by regularization
- feedforward neural network
- about / The neural network architecture
- forward propagation
- about / The neural network architecture
- fully connected layer
- about / The fully connected layer
G
- Gensim
- about / Gensim, Scaling LDA – memory, CPUs, and machines
- URL / Gensim
- Gensim package
- URL / LDA
- get-pip.py script
- URL / The installation of packages
- URL, for setup tool / The installation of packages
- Git for Windows
- Gource
- GPU
- neural network, with theanets / Deep learning with theanets
- computing / GPU computing
- using, for convolutional neural networks (CNN) / GPU Computing
- reference link, for computing / GPU computing
- parallel computing, with Theano / Theano – parallel computing on the GPU
- gradient boosting machine (GBM)
- max_depth / max_depth
- learning_rate / learning_rate
- subsample / Subsample
- warm_start / Faster GBM with warm_start
- speeding up, with warm_start / Speeding up GBM with warm_start
- GBM models, training / Training and storing GBM models
- GBM models, storing / Training and storing GBM models
- graphical user interface (GUI)
- about / VirtualBox
- gridsearch
- guests
- about / VirtualBox
H
- H2O
- about / Scale up with Python, H2O
- URL / H2O
- deep learning, scaling / Deep learning at scale with H2O
- large scale deep learning / Large scale deep learning with H2O
- gridsearch / Gridsearch on H2O, Random forest and gridsearch on H2O, Stochastic gradient boosting and gridsearch on H2O
- CART / Out-of-core CART with H2O
- random forest / Random forest and gridsearch on H2O
- stochastic gradient boosting / Stochastic gradient boosting and gridsearch on H2O
- principal component analysis (PCA) / PCA with H2O
- K-means / K-means with H2O
- Hadoop
- ecosystem / The Hadoop ecosystem
- architecture / Architecture
- Distributed File System (HDFS) / HDFS
- MapReduce / MapReduce
- Yet Another Resource Negotiator (YARN) / YARN
- Hadoop Distributed File System (HDFS)
- about / Explaining scalability in detail, HDFS
- hard-margin classifiers
- about / Support Vector Machines
- Hierarchical Dirichlet Process (HDP)
- hinge loss
- about / Hinge loss and its variants
- variants / Hinge loss and its variants
- homogeneity
- about / Selection of the best K
- host
- about / VirtualBox
- hyperparameter optimization
- hyperparameters
- tuning / Hyperparameter tuning
I
- identity function
- about / Autoencoders
- ImportError
- about / The installation of packages
- incremental learning
- incremental PCA
- about / Incremental PCA
- independent and identically distributed (i.i.d) / Stochastic gradient descent
- input validator
- IPython
- about / Introducing Jupyter/IPython
- URL / Introducing Jupyter/IPython
- Iris dataset
J
- Java Development Kit (JDK)
- about / H2O
- Jupyter
- about / Introducing Jupyter/IPython
- URL, for example / Introducing Jupyter/IPython
- URL, for installing / Introducing Jupyter/IPython
- URL / Introducing Jupyter/IPython
- Jupyter Notebook Viewer
- Just-in-Time (JIT) compiler
- about / Scale up with Python
K
- K-means
- about / Clustering – K-means
- initialization methods / Initialization methods
- assumptions / K-means assumptions
- best K, selecting / Selection of the best K
- scaling / Scaling K-means – mini-batch
- with H2O / K-means with H2O
- Kaggle
- URL / XGBoost
- KDD99 challenge
- reference / Spark on the KDD99 dataset
- Keras
- about / Keras
- URL / Keras, Keras and TensorFlow installation
- installing / Keras and TensorFlow installation
- convolutional neural networks (CNN), in TensorFlow / Convolutional Neural Networks in TensorFlow through Keras
- KSVM
L
- large scale cloud services
- large scale deep learning
- with H2O / Large scale deep learning with H2O
- large scale machine learning
- LaSVM
- Latent Dirichlet Allocation (LDA) / Nonlinear and faster with Vowpal Wabbit
- about / Gensim, LDA
- scaling / Scaling LDA – memory, CPUs, and machines
- Latent Semantic Analysis (LSA)
- about / Gensim
- liblinear-cdblock library
- Library for Support Vector Machines (LIBSVM)
- about / Scale up with Python
- linear regression
- with SGD / Linear regression with SGD
- Linux precompiled binaries
- URL / Installing VW
M
- machine learning
- on TensorFlow, with SkFlow / Machine learning on TensorFlow with SkFlow
- incremental learning / Deep learning with large files – incremental learning
- machine learning, with Spark
- about / Machine learning with Spark
- dataset, reading / Reading the dataset
- feature engineering / Feature engineering
- learner, training / Training a learner
- learner's performance, evaluating / Evaluating a learner's performance
- ML pipeline, power / The power of the ML pipeline
- manual tuning / Manual tuning
- cross-validation / Cross-validation, Final cleanup
- MapReduce
- matplotlib package
- about / The matplotlib package
- URL / The matplotlib package
- maximization (M)
- about / Clustering – K-means
- memory profiler
- mini-batches
- about / The neural network architecture
- Minimalist GNU for Windows (MinGW) compiler
- MLlib / Machine learning with Spark
- momentum
- training / The neural network architecture
- MrJob
- about / MapReduce
- music recommendation engine
- references / GPU Computing
N
- Nesterov momentum
- about / The neural network architecture
- neural network
- architecture / The neural network architecture
- softmax, for classification / The neural network architecture
- forward propagation / The neural network architecture
- backpropagation / The neural network architecture
- backpropagation, common problems / The neural network architecture
- backpropagation, with mini-batch / The neural network architecture
- momentum, training / The neural network architecture
- Nesterov momentum / The neural network architecture
- adaptive gradient (ADAGRAD) / The neural network architecture
- resilient backpropagation (RPROP) / The neural network architecture
- RMSPROP / The neural network architecture
- architecture, selecting / What and how neural networks learn, Choosing the right architecture
- implementing / Neural networks in action
- sknn, parallelizing / Parallelization for sknn
- hyperparameter, optimizing / Neural networks and hyperparameter optimization
- decision boundaries / Neural networks and decision boundaries
- on GPU, with theanets / Deep learning with theanets
- performing, in TensorFlow / A neural network from scratch in TensorFlow
- regularization / Neural networks and regularization
- Neural Network Toolbox (NNT)
- NeuroLab
- nonlinear SVMs
- pursuing, by subsampling / Pursuing nonlinear SVMs by subsampling
- NumPy
- about / Scale up with Python, NumPy
- URL / NumPy
- NVIDIA
- URL / GPU computing
- NVIDIA CUDA Toolkit
- URL / GPU computing
O
- one-against-all (OAA)
- one-hot encoder
- reference link / The hashing trick
- one-vs-all (OVA) / Achieving SVM at scale with SGD, The Scikit-learn SGD implementation
- OpenCL project
- URL / GPU computing
- OpenSSH
- URL, for Windows / Using the VM
- ordinary least squares (OLS) / The Scikit-learn SGD implementation
- out-of-core learning
- about / Out-of-core learning
- subsampling, as viable option / Subsampling as a viable option
- instances, optimizing / Optimizing one instance at a time
- building / Building an out-of-core learning system
- overshooting
- about / The neural network architecture
P
- pandas
- about / Scale up with Python, Pandas
- URL / Pandas
- pandas documentation
- reference link / Using pandas I/O tools
- pip
- about / The installation of packages
- URL / The installation of packages
- pooling layer
- about / The pooling layer
- pretraining
- principal component analysis (PCA)
- about / Machine learning on TensorFlow with SkFlow, Feature decomposition – PCA, Autoencoders
- reference link / Machine learning on TensorFlow with SkFlow
- randomized PCA / Randomized PCA
- incremental PCA / Incremental PCA
- sparse PCA / Sparse PCA
- with H2O / PCA with H2O
- PuTTY
- URL / Using the VM
- PyPy
- about / Scale up with Python
- URL / Scale up with Python
- pySpark
- about / pySpark
- pySpark, actions
- pySpark, methods
- Python
- about / Introducing Python
- advantages / Introducing Python
- URL / Introducing Python, Step-by-step installation
- scaling up, with / Scale up with Python
- scaling out, with / Scale out with Python
- large scale machine learning / Python for large scale machine learning
- 2 and Python 3, selecting between / Choosing between Python 2 and Python 3
- installing / Installing Python, Step-by-step installation
- packages, installing / The installation of packages
- packages, upgrading / Package upgrades
- scientific distribution / Scientific distributions
- Jupyter / Introducing Jupyter/IPython
- IPython / Introducing Jupyter/IPython
- reference link / Describing the target
- integrating, with Vowpal Wabbit (VW) / Python integration
- Python-Future
- Python 2
- and Python 3, selecting between / Choosing between Python 2 and Python 3
- Python 2-3 compatible code
- Python 3
- and Python 2, selecting between / Choosing between Python 2 and Python 3
- URL, for compatibility / Choosing between Python 2 and Python 3
- Python Package Index (PyPI)
- about / The installation of packages
- URL / The installation of packages
- Python packages
- about / Python packages
- NumPy / NumPy
- SciPy / SciPy
- pandas / Pandas
- Scikit-learn / Scikit-learn
- pyvw / Python integration
Q
- quadratic programming
- URL / Support Vector Machines
R
- radial basis functions (RBF) / Support Vector Machines
- random forest
- random forest, parameters for bagging
- n_estimators / Random forest and extremely randomized forest
- max_features / Random forest and extremely randomized forest
- min_samples_leaf / Random forest and extremely randomized forest
- max_depth / Random forest and extremely randomized forest
- criterion / Random forest and extremely randomized forest
- min_samples_split / Random forest and extremely randomized forest
- randomized PCA
- about / Randomized PCA
- randomized search
- about / Neural networks and hyperparameter optimization, Fast parameter optimization with randomized search
- fast parameter, optimizing / Fast parameter optimization with randomized search
- extremely randomized trees / Extremely randomized trees and large datasets
- large datasets / Extremely randomized trees and large datasets
- random subspace method
- Read-Eval-Print Loop (REPL)
- about / Step-by-step installation
- receiver operating characteristic (ROC) / Describing the target
- reconstruction error
- about / Autoencoders
- rectified linear unit (ReLU)
- about / The neural network architecture
- regularization
- feature selection / Feature selection by regularization
- about / Neural networks and regularization
- residual network (ResNet)
- about / GPU Computing
- resilient backpropagation (RPROP)
- about / The neural network architecture
- Resilient Distributed Dataset (RDD)
- about / pySpark
- RMSPROP
- about / The neural network architecture
S
- scalability
- about / Explaining scalability in detail
- large scale examples, creating / Making large scale examples
- Python / Introducing Python
- Python, scaling up with / Scale up with Python
- Python, scaling out with / Scale out with Python
- scientific distribution
- about / Scientific distributions
- Scikit-learn
- about / The installation of packages, Scikit-learn, Understanding the Scikit-learn SVM implementation
- URL / Scikit-learn
- matplotlib package / The matplotlib package
- Gensim / Gensim
- H2O / H2O
- XGBoost / XGBoost
- Theano / Theano
- TensorFlow / TensorFlow
- sknn library / The sknn library
- theanets / Theanets
- Keras / Keras
- useful packages, installing / Other useful packages to install on your system
- other alternatives / Other alternatives for SVM fast learning
- Vowpal Wabbit (VW) / Nonlinear and faster with Vowpal Wabbit
- Scikit-learn documentation
- reference link / Describing the target
- scikit-neuralnetwork
- about / The sknn library
- SciPy
- SGD
- used, for achieving SVM / Achieving SVM at scale with SGD
- non-linearity, including / Including non-linearity in SGD
- explicit high-dimensional mappings / Trying explicit high-dimensional mappings
- linear regression / Linear regression with SGD
- SGD function, parameters
- lr / Keras and TensorFlow installation
- decay / Keras and TensorFlow installation
- momentum / Keras and TensorFlow installation
- nesterov / Keras and TensorFlow installation
- optimizer / Keras and TensorFlow installation
- SGD Scikit-learn implementation
- reference link / Defining SGD learning parameters
- sigmoid
- about / The neural network architecture
- Silhouette
- about / Selection of the best K
- Singular Value Decomposition (SVD)
- about / Feature decomposition – PCA
- SkFlow
- machine learning, on TensorFlow / Machine learning on TensorFlow with SkFlow
- sklearn.svm module, parameters
- C / Understanding the Scikit-learn SVM implementation
- kernel / Understanding the Scikit-learn SVM implementation
- degree / Understanding the Scikit-learn SVM implementation
- gamma / Understanding the Scikit-learn SVM implementation
- nu / Understanding the Scikit-learn SVM implementation
- epsilon / Understanding the Scikit-learn SVM implementation
- penalty / Understanding the Scikit-learn SVM implementation
- loss / Understanding the Scikit-learn SVM implementation
- dual / Understanding the Scikit-learn SVM implementation
- sklearn module
- about / The installation of packages
- sknn
- parallelizing / Parallelization for sknn
- sknn library
- about / The sknn library
- URL / The sknn library
- sknn package
- SofiaML
- softmax
- for classification / The neural network architecture
- Spark
- about / Spark, Machine learning with Spark
- pySpark / pySpark
- on KDD99 dataset / Spark on the KDD99 dataset
- Spark, methods
- map(function) / pySpark
- flatMap(function) / pySpark
- filter(function) / pySpark
- sample(withReplacement, fraction, seed) / pySpark
- distinct() / pySpark
- coalesce(numPartitions) / pySpark
- repartition(numPartitions) / pySpark
- groupByKey() / pySpark
- reduceByKey(function) / pySpark
- sortByKey(ascending) / pySpark
- union(otherRDD) / pySpark
- intersection(otherRDD) / pySpark
- join(otherRDD) / pySpark
- Spark DataFrames
- working with / Working with Spark DataFrames
- sparse autoencoder
- URL / Autoencoders
- sparse PCA
- about / Sparse PCA
- sparsity parameters
- about / Autoencoders
- Spyder
- about / Scientific distributions
- SQLite
- reference link / Working with databases
- stacked denoising autoencoders
- deep learning / Autoencoders
- standalone machine
- big data, handling / From a standalone machine to a bunch of nodes
- distributed framework, need for / Why do we need a distributed framework?
- steepest descent
- about / Gradient Boosting Machines
- stochastic gradient boosting
- stochastic gradient descent (SGD) / Stochastic gradient descent
- stochastic learning
- about / Stochastic learning
- batch gradient descent / Batch gradient descent
- stochastic gradient descent (SGD) / Stochastic gradient descent
- Scikit-learn SGD implementation / The Scikit-learn SGD implementation
- SGD learning parameters, defining / Defining SGD learning parameters
- stream handling
- reference link / Datasets to try the real thing yourself
- stride
- about / The convolution layer
- subsample
- subsampling
- nonlinear SVMs, pursuing / Pursuing nonlinear SVMs by subsampling
- subsampling layer
- about / The pooling layer
- Support Vector Machines (SVMs) / The Scikit-learn SGD implementation
- about / Support Vector Machines
- hinge loss / Hinge loss and its variants
- Scikit-learn / Understanding the Scikit-learn SVM implementation
- nonlinear SVMs, pursuing / Pursuing nonlinear SVMs by subsampling
- achieving, with SGD / Achieving SVM at scale with SGD
T
- tanh
- about / The neural network architecture
- TDM-GCC x64
- URL / Theano
- TensorFlow
- about / TensorFlow
- URL / TensorFlow
- references / TensorFlow
- installing / TensorFlow installation, Keras and TensorFlow installation
- operations / TensorFlow operations
- machine learning, with SkFlow / Machine learning on TensorFlow with SkFlow
- convolutional neural networks (CNN), through Keras / Convolutional Neural Networks in TensorFlow through Keras
- TensorFlow, operations
- GPU, computing / GPU computing
- linear regression, with SGD / Linear regression with SGD
- neural network, performing / A neural network from scratch in TensorFlow
- theanets
- about / Theanets, Deep learning with theanets
- URL / Theanets, Deep learning with theanets
- neural network, on GPU / Deep learning with theanets
- Theano
- about / Theano
- URL / Theano, Theano – parallel computing on the GPU, Installing Theano
- URL, for installing / Theano
- used, for parallel computing on GPU / Theano – parallel computing on the GPU
- installing / Installing Theano
- training_classification_error (.089)
U
- UCI Machine Learning Repository / Datasets to try the real thing yourself
- Uniform Resource Identifier (URI)
- about / HDFS
- universal approximation theorem
- University of California, Irvine (UCI) / Datasets to try the real thing yourself
- unsupervised learning
- autoencoders / Autoencoders and unsupervised learning
- unsupervised methods
- about / Unsupervised methods
- unsupervised pretraining
- US Census dataset
V
- Vagrant
- validation_classification_error (.0954)
- vanishing gradient problem
- about / The neural network architecture
- variables
- sharing, across cluster nodes / Sharing variables across cluster nodes
- VirtualBox
- about / VirtualBox
- URL / VirtualBox
- virtual machine (VM)
- setting up / Setting up the VM for this chapter, Setting up the VM
- VirtualBox / VirtualBox
- Vagrant / Vagrant
- using / Using the VM
- Vowpal Wabbit (VW)
- about / Scale up with Python, Nonlinear and faster with Vowpal Wabbit
- installing / Installing VW
- URL, for compiling / Installing VW
- data format / Understanding the VW data format
- URL, for dataset / Understanding the VW data format
- Python, integration / Python integration
- examples / A few examples using reductions for SVM and neural nets
- URL, for neural networks / A few examples using reductions for SVM and neural nets
- faster bike-sharing example / Faster bike-sharing
- covertype dataset / The covertype dataset crunched by VW
- vowpal_porpoise / Python integration
W
- Wabbit Wappa / Python integration
- Whitening
- wide networks
- about / The hidden layer
- WinPython
- about / Scientific distributions
- URL / Scientific distributions
- word2vec
- about / Gensim
X
- XGBoost
- about / Scale up with Python, XGBoost
- URL / XGBoost
- URL, for installing / XGBoost
- URL, for code / XGBoost
Y
- Yet Another Resource Negotiator (YARN)
- about / Scale out with Python, YARN
Z
- zero-padding
- about / The convolution layer