Index
A
- AAA principle of Semantic Computing / Semantic Web technologies
- action
- about / The context of Reinforcement Learning
- action-value methods
- about / Action-value methods
- actor-critic methods (on-policy)
- about / Actor-critic methods (on-policy)
- AdaBoost
- about / AdaBoost
- advantages, MapReduce programming framework
- parallel execution / What makes MapReduce cater to the needs of large datasets?
- fault tolerance / What makes MapReduce cater to the needs of large datasets?
- scalability / What makes MapReduce cater to the needs of large datasets?
- data locality / What makes MapReduce cater to the needs of large datasets?
- agent
- about / The context of Reinforcement Learning
- Agglomerative clustering algorithm / Hierarchical clustering
- algorithms
- about / Algorithms and Concurrency
- Amazon EC2
- about / MapReduce programming paradigm
- Ambari
- URL / Hadoop ecosystem components
- about / Hadoop ecosystem components
- Analytics layer
- about / The Analytics layer
- ANNs
- implementing / Implementing ANNs and Deep learning methods
- implementing, Mahout used / Using Mahout
- implementing, R used / Using R
- implementing, Spark used / Using Spark
- implementing, Python (scikit-learn) used / Using Python (Scikit-learn)
- implementing, Julia used / Using Julia
- anomaly detection / Anomaly detection
- ANOVA
- about / ANOVA and F Statistics
- Apache Spark
- about / Apache Spark
- core modules / Apache Spark
- appropriate minsup
- rules, for defining / Rules for defining appropriate minsup
- Apriori, and FP-growth
- implementing, Mahout used / Using Mahout
- implementing, R used / Using R
- implementing, Spark used / Using Spark
- implementing, Python (scikit-learn) used / Using Python (Scikit-learn)
- implementing, Julia used / Using Julia
- Apriori algorithm
- about / Apriori algorithm
- rule generation strategy / Rule generation strategy
- downside / Apriori – the downside
- versus FP-growth algorithm / Apriori versus FP-growth
- ArangoDB / Vendors
- Artificial intelligence (AI)
- versus Machine learning / Artificial intelligence (AI)
- artificial neural network (ANN) / Synapses
- Artificial neural networks (ANN)
- Learning vector quantization (LVQ) / Artificial neural networks (ANN)
- Perceptron / Artificial neural networks (ANN)
- Self-organizing maps (SOM) / Artificial neural networks (ANN)
- Hopfield network / Artificial neural networks (ANN)
- Backpropagation / Dimensionality reduction
- artificial neurons
- about / Artificial neurons or perceptrons
- linear neurons / Linear neurons
- rectified linear neurons / Rectified linear neurons / linear threshold neurons
- linear threshold neurons / Rectified linear neurons / linear threshold neurons
- binary threshold neurons / Binary threshold neurons
- Sigmoid neurons / Sigmoid neurons
- stochastic binary neurons / Stochastic binary neurons
- assignments, Julia / Using variables and assignments
- association rule
- about / Association rule – a definition
- Support criteria / Association rule – a definition
- Confidence criteria / Association rule – a definition
- Association rule based algorithms
- about / Association rule based learning algorithms
- Apriori algorithm / Association rule based learning algorithms
- Eclat algorithm / Association rule based learning algorithms
- autoassociators (AAs) / Autoencoders
- autoencoders / Autoencoders
- Avro
- about / Hadoop ecosystem components
- URL / Hadoop ecosystem components
B
- Backpropagation algorithm
- about / Backpropagation algorithm
- bagging
- about / Bagging
- training step / Bagging
- testing step / Bagging
- under-fitting cases / Bagging
- over-fitting cases / Bagging
- example / Bagging
- basic RL model
- about / Basic RL model – agent-environment interface
- Bayesian learning
- about / Bayesian learning
- Bayesian method based algorithms
- about / Bayesian method based algorithms
- Naive Bayes / Bayesian method based algorithms
- Averaged one-dependence estimators (AODE) / Bayesian method based algorithms
- Bayesian belief network (BBN) / Bayesian method based algorithms
- Bayes' theorem
- about / Bayes' theorem
- bell curve
- about / Normal distribution
- Bellman Equation
- about / The policy
- Bernoulli distribution
- about / Bernoulli distribution
- Bernoulli Naïve Bayes classifier
- about / The Bernoulli Naïve Bayes classifier
- bias / Solving the errors: bias and variance
- big data
- about / Big data and the context of large-scale Machine learning
- characteristics / Big data and the context of large-scale Machine learning
- binary threshold neurons
- about / Binary threshold neurons
- Binomial distribution
- about / Binomial distribution
- Poisson probability distribution / Poisson probability distribution
- exponential distribution / Exponential distribution
- normal distribution / Normal distribution
- Blazegraph / Vendors
- boosting
- about / Boosting
- characteristics / Boosting
- Bootstrap Aggregation
- about / Bagging
- browser
- Julia, using via / Using Julia via the browser
- Business Intelligence (BI) / The Analytics layer
- Business Intelligence (BI) / Data science
C
- C4.5 / Information gain and Entropy
- about / C4.5
- CAL5 / C4.5
- CART
- about / CART
- case-based learning / Instance-based learning (IBL)
- case-based reasoning (CBR) / Case-based reasoning (CBR)
- CHAID / C4.5
- Checkpoint / Secondary Namenode and Checkpoint process
- Chukwa
- about / Hadoop ecosystem components
- URL / Hadoop ecosystem components
- Classification and Regression Tree (CART) / Decision tree based algorithms
- clustering
- examples / Clustering-based learning
- types / Types of clustering
- hierarchical clustering / Hierarchical clustering
- partitional clustering / Partitional clustering
- clustering-based learning
- about / Clustering-based learning
- clustering methods
- K-means / Clustering methods
- Expectation maximization (EM) / Clustering methods
- Gaussian mixture models (GMM) / Clustering methods
- clusters
- about / Clustering-based learning
- command line
- Julia code, running from / Running the Julia code from the command line
- command line version
- downloading, of Julia / Downloading and using the command line version of Julia
- using, of Julia / Downloading and using the command line version of Julia
- complexity measure, k-means clustering algorithm / Complexity measures
- complementing fields, Machine learning
- about / Some complementing fields of Machine learning
- data mining / Data mining
- Artificial intelligence (AI) / Artificial intelligence (AI)
- statistical learning / Statistical learning
- data science / Data science
- components, MapReduce / MapReduce execution flow and components
- concurrency
- about / Algorithms and Concurrency
- concurrent algorithms
- developing / Developing concurrent algorithms
- conditional probability
- about / Types of probability
- Configurable Logical Blocks (CLB)
- about / Field Programmable Gate Array (FPGA)
- confounder
- about / Confounding
- confounding
- about / ANOVA and F Statistics, Confounding
- considerations, for constructing Decision trees
- about / Considerations for constructing Decision trees
- appropriate attributes, selecting / Choosing the appropriate attribute(s), Information gain and Entropy, Gini index, Gain ratio, Termination Criteria / Pruning Decision trees
- information gain / Information gain and Entropy
- Entropy / Information gain and Entropy
- Gini index / Gini index
- Gain ratio / Gain ratio
- Termination Criteria / Termination Criteria / Pruning Decision trees
- Pruning Decision trees / Termination Criteria / Pruning Decision trees
- Consumption layer
- about / The Consumption layer
- continuous quantity
- about / Bernoulli distribution
- convolutional neural networks (CNN/ConvNets)
- about / Convolutional neural networks (CNN/ConvNets)
- convolutional layer (CONV) / Convolutional layer (CONV)
- pooling layer (POOL) / Pooling layer (POOL)
- fully connected layer (FC) / Fully connected layer (FC)
- core components framework, Hadoop
- about / Hadoop core components framework
- core elements, Hadoop / Hadoop and its core elements
- correlation
- about / Revisiting statistics
- correlation coefficient
- about / Revisiting statistics
- covariance
- about / Revisiting statistics
- properties / Properties of covariance
- real-world example / Example
- CRAN
- URL / Installing and setting up R
- Credit Assignment Path (CAP) / Stochastic binary neurons
D
- DAG (Directed Acyclic Graph) / Hadoop ecosystem components
- data
- exploring, with Visualizations / Explaining and exploring data with Visualizations
- data architectures
- evolution / Evolution of data architectures
- data inconsistencies, Machine learning
- under-fitting case / Under-fitting
- over-fitting case / Over-fitting
- data instability / Data instability
- unpredictable data formats / Unpredictable data formats
- data mining
- versus Machine learning / Data mining
- data parallelization
- about / Distributed and parallel computing strategies
- data science
- versus Machine learning / Data science
- datasets
- manipulating, with LINQ / Manipulating datasets with LINQ
- Data Source layer / The Data Source layer
- data structures, Julia / Data structures
- Data Warehouses (DW) / Data science
- deciles
- about / Revisiting statistics
- Decision tree algorithms
- about / Inducing Decision trees – Decision tree algorithms
- CART / CART
- C4.5 / C4.5
- Decision tree based algorithms
- about / Decision tree based algorithms
- Random forest / Decision tree based algorithms
- Classification and Regression Tree (CART) / Decision tree based algorithms
- C4.5 and C5.0 / Decision tree based algorithms
- Chi-square / Decision tree based algorithms
- Gradient boosting machines (GBM) / Decision tree based algorithms
- Automatic Interaction Detection (CHAID) / Decision tree based algorithms
- decision stump / Decision tree based algorithms
- Multivariate adaptive regression splines (MARS) / Decision tree based algorithms
- Decision trees
- about / Decision trees
- characteristics / Terminology
- purpose / Purpose and uses
- uses / Purpose and uses
- constructing / Constructing a Decision tree
- missing values, handling / Handling missing values
- constructing, considerations / Considerations for constructing Decision trees
- in graphical representation / Decision trees in a graphical representation
- inducing / Inducing Decision trees – Decision tree algorithms
- benefits / Benefits of Decision trees
- decision trees
- implementing / Implementing Decision trees
- implementing, Mahout used / Using Mahout
- implementing, R used / Using R
- implementing, Spark used / Using Spark
- implementing, Python (scikit-learn) used / Using Python (scikit-learn)
- implementing, Julia used / Using Julia
- Deep Belief Network (DBN) / The human brain
- Deep Boltzmann Machines (DBMs) / Deep Boltzmann Machines (DBMs)
- Deep learning
- about / The human brain
- URL / The human brain
- Deep learning, techniques
- Convolutional Networks / Deep learning
- Restricted Boltzmann Machine (RBM) / Deep learning
- Deep Belief Networks (DBN) / Deep learning
- Stacked Autoencoders / Deep learning
- Deep learning methods
- implementing / Implementing ANNs and Deep learning methods
- implementing, Mahout used / Using Mahout
- implementing, R used / Using R
- implementing, Spark used / Using Spark
- implementing, Python (scikit-learn) used / Using Python (Scikit-learn)
- implementing, Julia used / Using Julia
- Deep Learning taxonomy
- about / Deep learning taxonomy
- convolutional neural networks (CNN/ConvNets) / Convolutional neural networks (CNN/ConvNets)
- Recurrent Neural Networks (RNNs) / Recurrent Neural Networks (RNNs)
- Restricted Boltzmann Machines (RBMs) / Restricted Boltzmann Machines (RBMs)
- Deep Boltzmann Machines (DBMs) / Deep Boltzmann Machines (DBMs)
- autoencoders / Autoencoders
- default block placement policy / Block loading to the cluster and replication
- dendrogram
- about / Hierarchical clustering
- denoising autoencoder (DA) / Autoencoders
- dense vectors / Implementing vectors in Mahout
- dependent events
- about / Dependent events
- Descriptive analytics / Emerging perspectives
- Diagnostic analytics / Emerging perspectives
- dimensionality reduction methods
- about / Dimensionality reduction
- Multidimensional scaling (MDS) / Dimensionality reduction
- Principal component analysis (PCA) / Dimensionality reduction
- Projection pursuit (PP) / Dimensionality reduction
- Partial least squares (PLS) regression / Dimensionality reduction
- Sammon mapping / Dimensionality reduction
- discrete quantity
- about / Bernoulli distribution
- disjoint events
- about / Mutually exclusive or disjoint events
- disk
- k-means clustering algorithm, implementing on / K-means clustering on disk
- distance measures, in KNN
- about / Distance measures in KNN
- Euclidean distance / Euclidean distance
- Hamming distance / Hamming distance
- Minkowski distance / Minkowski distance
- distance measures methods, k-means clustering algorithm
- single link / Distance measures
- complete link / Distance measures
- average link / Distance measures
- centroids / Distance measures
- distributed processing
- about / Distributed and parallel computing strategies
- distribution
- about / Distribution
- Bernoulli distribution / Bernoulli distribution
- Binomial distribution / Binomial distribution
- relationship between / Relationship between the distributions
- Divisive clustering algorithm / Hierarchical clustering
- Dynamic Learning Vector Quantization (DLVQ) networks / Dynamic Learning Vector Quantization (DLVQ) networks
- Dynamic Programming (DP)
- about / Dynamic Programming (DP)
E
- Eclipse
- URL / Setting-up Apache Mahout using Eclipse IDE
- Eclipse IDE
- used, for setting up Mahout / Setting-up Apache Mahout using Eclipse IDE
- ecosystem components, Hadoop / Hadoop ecosystem components
- effect modification
- about / ANOVA and F Statistics, Effect modification
- Elman networks / Elman networks
- ensemble learning methods
- about / Ensemble learning methods
- Wisdom of Crowds / The wisdom of the crowd
- key use cases / Key use cases
- Ensemble method algorithms
- about / Ensemble methods
- Random forest / Ensemble methods
- AdaBoost / Ensemble methods
- Bagging / Ensemble methods
- Bootstrapped Aggregation (Bagging) / Ensemble methods
- Stacked generalization (blending) / Ensemble methods
- Gradient boosting machines (GBM) / Ensemble methods
- ensemble methods
- about / Ensemble methods
- supervised ensemble methods / Supervised ensemble methods
- implementing / Implementing ensemble methods
- implementing, Mahout used / Using Mahout
- implementing, R used / Using R
- implementing, Spark used / Using Spark
- implementing, Python (scikit-learn) used / Using Python (Scikit-learn)
- implementing, Julia used / Using Julia
- Enterprise Service Bus (ESB) / Distributed and parallel computing strategies
- environment
- about / The context of Reinforcement Learning
- error measures, performance measures
- accuracy / Is the solution good?
- recall / Is the solution good?
- precision / Is the solution good?
- ETL (Extract-Transform-Load) / Hadoop ecosystem components
- Euclidean distance / Euclidean distance
- Euclidean distance measure / Nearest Neighbors
- evaluative feedback, Reinforcement Learning (RL)
- n-Armed Bandit problem / n-Armed Bandit problem
- action-value methods / Action-value methods
- Reinforcement Comparison methods / Reinforcement comparison methods
- events
- types / Types of events
- mutually exclusive / Mutually exclusive or disjoint events
- disjoint / Mutually exclusive or disjoint events
- independent / Independent events
- dependent / Dependent events
- evolutionary trees
- about / Evolutionary trees
- examples, Reinforcement Learning (RL)
- chess game / Examples of Reinforcement Learning
- elevator scheduling / Examples of Reinforcement Learning
- network packet routing / Examples of Reinforcement Learning
- mobile robot behavior / Examples of Reinforcement Learning
- execution flow, MapReduce / MapReduce execution flow and components
- expectation
- properties / Properties of expectation, variance, and covariance
- exponential distribution / Exponential distribution
- Extract, Load, and Transform (ELT)
- overview / Emerging perspectives
- highlights / Emerging perspectives
- benefits / Emerging perspectives
- risks / Emerging perspectives
- Extract, Transform, and Load (ETL) / The Ingestion layer
- overview / Emerging perspectives
- highlights / Emerging perspectives
- benefits / Emerging perspectives
- risks / Emerging perspectives
- Extract, Transform, Load, and Transform (ETLT)
- overview / Emerging perspectives
- highlights / Emerging perspectives
- benefits / Emerging perspectives
- risks / Emerging perspectives
F
- FACT / C4.5
- Flume
- about / Hadoop ecosystem components
- URL / Hadoop ecosystem components
- FoundationDB / Vendors
- FP-growth algorithm
- about / Apriori – the downside, FP-growth algorithm
- versus Apriori algorithm / Apriori versus FP-growth
- FPGA
- about / Field Programmable Gate Array (FPGA)
- frequent pattern tree (FP-tree) / Apriori – the downside
- FS Shell / HDFS command line
- F Statistics
- about / ANOVA and F Statistics
- functional, versus structural
- about / Functional versus Structural – A methodological mismatch
- information, commoditizing / Commoditizing information
- theoretical limitations, of RDBMS / Theoretical limitations of RDBMS
- functions, MapReduce
- Mapper / MapReduce architecture
- Reducer / MapReduce architecture
G
- Generalized Linear Models (GLM)
- about / Generalized Linear Models (GLM)
- GFS (Google File System) / Hadoop Distributed File System (HDFS)
- Google File System (GFS) / Evolution of Hadoop (the platform of choice)
- GPI
- about / Generalized Policy Iteration (GPI)
- on-policy / Monte Carlo methods
- off-policy / Monte Carlo methods
- Gradient boosted regression trees (GBRT) / Gradient boosting machines (GBM)
- gradient boosting machines (GBM)
- about / Gradient boosting machines (GBM)
- Gradient boosting machines (GBM) / Decision tree based algorithms
- Gradient descent method / Gradient descent method
- graphs, Julia
- about / Graphics and plotting
- GraphX / Apache Spark
- Greedy Decision trees
- about / Greedy Decision trees
- ε-greedy method
- about / Action-value methods
H
- Hadoop
- about / Theoretical limitations of RDBMS, Introduction to Apache Hadoop
- URL / Introduction to Apache Hadoop
- evolution / Evolution of Hadoop (the platform of choice)
- core elements / Hadoop and its core elements
- core components framework / Hadoop core components framework
- ecosystem components / Hadoop ecosystem components
- starting / Starting Hadoop
- distributions / Hadoop distributions and vendors
- vendors / Hadoop distributions and vendors
- Hadoop (Physical) Infrastructure layer
- about / The Hadoop (Physical) Infrastructure layer – supporting appliance
- Hadoop 2.6.0
- installing, steps / Steps for installing Hadoop 2.6.0
- Hadoop 2.x
- about / Hadoop 2.x
- Hadoop Distributed File System (HDFS) / The Hadoop Storage layer
- about / Hadoop Distributed File System (HDFS)
- Secondary Namenode / Secondary Namenode and Checkpoint process
- Checkpoint process / Secondary Namenode and Checkpoint process
- large data files, splitting / Splitting large data files
- block loading, to cluster and replication / Block loading to the cluster and replication
- Hadoop setup
- about / Hadoop installation and setup
- standalone operation / Hadoop installation and setup
- Pseudo-Distributed Operation / Hadoop installation and setup
- Fully-Distributed Operation / Hadoop installation and setup
- Hadoop Storage layer
- about / The Hadoop Storage layer
- Hamming distance / Hamming distance
- HBase / Hadoop platform / Processing layer
- about / Hadoop ecosystem components
- URL / Hadoop ecosystem components
- HCatalog
- about / Hadoop ecosystem components
- URL / Hadoop ecosystem components
- HDFS
- file, writing to / Writing to and reading from HDFS
- file, reading from / Writing to and reading from HDFS
- HDFS (Hadoop Distributed File System)
- URL / Hadoop ecosystem components
- HDFS command line / HDFS command line
- Hellinger trees
- about / Hellinger trees
- hierarchical clustering
- about / Hierarchical clustering
- HIHO
- about / Hadoop ecosystem components
- URL / Hadoop ecosystem components
- HIHO (Hadoop-in Hadoop-out) / The Hadoop Storage layer
- Hive / Hadoop platform / Processing layer
- about / Hadoop ecosystem components
- URL / Hadoop ecosystem components
- homoscedasticity
- about / Regression methods
- Hopfield networks / Hopfield networks
- HPC
- about / High Performance Computing (HPC) with Message Passing Interface (MPI)
- human brain
- about / The human brain
I
- ID3 (Iterative Dichotomiser 3) / C4.5
- implementation options, for scaling-up Machine learning
- about / Technology and implementation options for scaling-up Machine learning
- MapReduce programming paradigm / MapReduce programming paradigm
- HPC, with MPI / High Performance Computing (HPC) with Message Passing Interface (MPI)
- Language Integrated Queries (LINQ) framework / Language Integrated Queries (LINQ) framework
- datasets, manipulating with LINQ / Manipulating datasets with LINQ
- Graphics Processing Unit (GPU) / Graphics Processing Unit (GPU)
- FPGA / Field Programmable Gate Array (FPGA)
- multiprocessor systems / Multicore or multiprocessor systems
- multicore processors / Multicore or multiprocessor systems
- independent events
- about / Independent events
- Independent Variables (IVs)
- about / Regression methods
- induction / Optimization
- Ingestion layer
- about / The Ingestion layer
- Partitioning pattern / The Ingestion layer
- Pipeline design patterns / The Ingestion layer
- Transformation patterns / The Ingestion layer
- Storage Design / The Ingestion layer
- Data Load pattern / The Ingestion layer
- InputFormat class / InputFormat
- Instance-based learning (IBL)
- about / Instance-based learning (IBL)
- Nearest Neighbors / Nearest Neighbors
- KNN, implementing / Implementing KNN
- instance based learning algorithms
- about / Instance based learning algorithms
- k-Nearest Neighbour (k-NN) / Instance based learning algorithms
- Self-Organizing / Instance based learning algorithms
- Learning vector quantization (LVQ) / Instance based learning algorithms
- Self-organizing maps (SOM) / Instance based learning algorithms
- integrating out
- about / Types of probability
- integration aspects, Julia
- about / Interoperability
- C / Integrating with C
- Python / Integrating with Python
- MATLAB / Integrating with MATLAB
J
- Jena / Vendors
- JobTracker / MapReduce architecture
- joint probability
- about / Types of probability
- Jordan networks / Jordan networks
- Julia
- about / Julia
- characteristics / Julia
- installing / Installing and setting up Julia
- setting up / Installing and setting up Julia
- command line version, downloading of / Downloading and using the command line version of Julia
- command line version, using of / Downloading and using the command line version of Julia
- using, via browser / Using Julia via the browser
- benefits / Benefits of adopting Julia
- used, for implementing decision trees / Using Julia
- used, for implementing KNN / Using Julia
- used, for implementing Support Vector Machines (SVM) / Using Julia
- used, for implementing Apriori and FP-growth / Using Julia
- used, for implementing k-means clustering / Using Julia
- used, for implementing Naïve Bayes algorithm / Using Julia
- used, for implementing logistic regression / Using Julia
- used, for implementing linear regression / Using Julia
- used, for implementing Deep learning methods / Using Julia
- used, for implementing ANNs / Using Julia
- used, for implementing ensemble methods / Using Julia
- Julia, and Hadoop
- integrating / Integrating Julia and Hadoop
- Julia code
- running, from command line / Running the Julia code from the command line
- Julia environment
- reference link / Installing and setting up Julia
- Juno IDE
- using, for running Julia / Using Juno IDE for running Julia
- URL / Using Juno IDE for running Julia
- just-in-time (JIT) compilers / Running the Julia code from the command line
- JVM (Java Virtual Machine) / Scala
K
- k-means algorithm
- advantages / Advantages of the k-means approach
- disadvantages / Disadvantages of the k-means algorithm
- k-means clustering
- implementing / Implementing k-means clustering
- implementing, Mahout used / Using Mahout
- implementing, R used / Using R
- implementing, Spark used / Using Spark
- implementing, Python (scikit-learn) used / Using Python (scikit-learn)
- implementing, Julia used / Using Julia
- k-means clustering algorithm
- about / The k-means clustering algorithm
- convergence criteria / Convergence or stopping criteria for the k-means clustering
- implementing, on disk / K-means clustering on disk
- distance measures / Distance measures
- complexity measure / Complexity measures
- Karush-Kuhn-Tucker (KKT) / Support Vector Machines (SVM)
- kernel functions
- about / Kernel functions
- kernel method based algorithms
- about / Kernel method based algorithms
- support vector machines (SVM) / Kernel method based algorithms
- Linear discriminant analysis (LDA) / Kernel method based algorithms
- kernel methods-based learning
- about / Kernel methods-based learning
- key assumptions, regression methods
- sample cases size / Regression methods
- data accuracy / Regression methods
- outliers / Regression methods
- missing data / Regression methods
- normal distribution / Regression methods
- linear behavior / Regression methods
- homoscedasticity / Regression methods
- key use cases, ensemble learning methods
- about / Key use cases
- recommendation systems / Recommendation systems
- anomaly detection / Anomaly detection
- transfer learning / Transfer learning
- stream mining / Stream mining or classification
- classification / Stream mining or classification
- KNN
- implementing, Mahout used / Using Mahout
- implementing, R used / Using R
- implementing, Spark used / Using Spark
- implementing, Python (scikit-learn) used / Using Python (scikit-learn)
- implementing, Julia used / Using Julia
L
- labeled datasets
- about / Reinforcement Learning (RL)
- Lambda Architecture (LA)
- about / Lambda Architecture (LA)
- Data layer / Lambda Architecture (LA)
- Batch layer / Lambda Architecture (LA)
- Speed layer / Lambda Architecture (LA)
- Serving layer / Lambda Architecture (LA)
- Query function / Lambda Architecture (LA)
- vendors / Vendors
- Lambda Architectures (LA) / Big data and the context of large-scale Machine learning
- large-scale Machine learning
- about / Big data and the context of large-scale Machine learning
- potential issues / Potential issues in large-scale Machine learning
- lazy learners / Instance-based learning (IBL)
- learning
- about / What is learning?
- least squares method / Polynomial (non-linear) regression
- linear neurons / Linear neurons
- linear regression
- implementing / Implementing linear and logistic regression
- implementing, Mahout used / Using Mahout
- implementing, R used / Using R
- implementing, Spark used / Using Spark
- implementing, scikit-learn used / Using scikit-learn
- implementing, Julia used / Using Julia
- linear threshold neurons / Rectified linear neurons / linear threshold neurons
- LINQ
- datasets, manipulating with / Manipulating datasets with LINQ
- LINQ framework
- about / Language Integrated Queries (LINQ) framework
- LMDT / C4.5
- locally weighted regression (LWR) / Locally weighted regression (LWR)
- logistic regression
- odds ratio / Odds ratio in logistic regression
- implementing / Implementing linear and logistic regression
- implementing, Mahout used / Using Mahout
- implementing, R used / Using R
- implementing, Spark used / Using Spark
- implementing, scikit-learn used / Using scikit-learn
- implementing, Julia used / Using Julia
- logistic regression (logit link)
- about / Logistic regression (logit link)
- long-term potentiation (LTP) / Synapses
- Low-Level Virtual Machine (LLVM) / Running the Julia code from the command line
M
- Machine learning
- about / Machine learning
- defining / Definition
- core concepts / Core Concepts and Terminology
- terminology / Core Concepts and Terminology
- phases / What is learning?
- data / Data
- feature / Data
- attribute / Data
- field / Data
- variable / Data
- instance / Data
- feature vector or tuple / Data
- dimension / Data
- dataset / Data
- data types / Data
- coverage / Data
- labeled data / Labeled and unlabeled data
- unlabeled data / Labeled and unlabeled data
- tasks / Tasks
- algorithms / Algorithms, Machine learning algorithms
- model / Models
- data inconsistencies / Data and inconsistencies in Machine learning
- practical examples / Practical Machine learning examples
- problem, types / Types of learning problems
- complementing fields / Some complementing fields of Machine learning
- versus data mining / Data mining
- versus Artificial intelligence (AI) / Artificial intelligence (AI)
- versus statistical learning / Statistical learning
- versus data science / Data science
- process lifecycle / Machine learning process lifecycle and solution architecture
- solution architecture / Machine learning process lifecycle and solution architecture
- tools / Machine learning tools and frameworks
- frameworks / Machine learning tools and frameworks
- machine learning
- scalability / Machine learning: Scalability and Performance
- performance / Machine learning: Scalability and Performance
- too many data points / Too many data points or instances
- too many instances / Too many data points or instances
- too many attributes / Too many attributes or features
- too many features / Too many attributes or features
- response time windows, shrinking / Shrinking response time windows – need for real-time responses
- highly complex algorithm / Highly complex algorithm
- feed forward, iterative prediction cycles / Feed forward, iterative prediction cycles
- Machine learning algorithms
- about / Machine learning algorithms
- decision tree based algorithms / Decision tree based algorithms
- Bayesian method based algorithms / Bayesian method based algorithms
- Kernel method based algorithms / Kernel method based algorithms
- clustering methods / Clustering methods
- Artificial neural networks (ANN) / Artificial neural networks (ANN)
- Dimensionality Reduction / Dimensionality reduction
- ensemble methods / Ensemble methods
- instance based learning algorithms / Instance based learning algorithms
- regression analysis based algorithms / Regression analysis based algorithms
- association rule based learning algorithms / Association rule based learning algorithms
- machine learning solution architecture, for big data
- about / Machine learning solution architecture for big data (employing Hadoop)
- Data Source layer / The Data Source layer
- Ingestion layer / The Ingestion layer
- Hadoop Storage layer / The Hadoop Storage layer
- Hadoop (Physical) Infrastructure layer / The Hadoop (Physical) Infrastructure layer – supporting appliance
- Hadoop platform / Processing layer / Hadoop platform / Processing layer
- Analytics layer / The Analytics layer
- Consumption layer / The Consumption layer
- Security and Monitoring layer / Security and Monitoring layer
- Machine learning tasks, Mahout
- Collaborative Filtering / Recommendation / How does Mahout work?
- Clustering / How does Mahout work?
- Classification / How does Mahout work?
- frequent itemset mining / How does Mahout work?
- Machine learning tools
- about / Machine learning tools – A landscape
- Mahout
- about / Hadoop ecosystem components, Apache Mahout
- URL / Hadoop ecosystem components
- working / How does Mahout work?
- installing / Installing and setting up Apache Mahout
- setting up / Installing and setting up Apache Mahout
- setting up, Eclipse IDE used / Setting-up Apache Mahout using Eclipse IDE
- setting up, without Eclipse / Setting up Apache Mahout without Eclipse
- vectors, implementing in / Implementing vectors in Mahout
- used, for implementing decision trees / Using Mahout
- used, for implementing KNN / Using Mahout
- used, for implementing Support Vector Machines (SVM) / Using Mahout
- used, for implementing Apriori and FP-growth / Using Mahout
- used, for implementing k-means clustering / Using Mahout
- used, for implementing Naïve Bayes algorithm / Using Mahout
- used, for implementing logistic regression / Using Mahout
- used, for implementing linear regression / Using Mahout
- used, for implementing ANNs / Using Mahout
- used, for implementing Deep learning methods / Using Mahout
- used, for implementing ensemble methods / Using Mahout
- Mahout Packages
- about / Mahout Packages
- Mapper job / MapReduce architecture
- MapReduce / Hadoop platform / Processing layer, Multi-model database architecture / polyglot persistence
- about / Theoretical limitations of RDBMS, MapReduce programming paradigm, MapReduce
- architecture / MapReduce architecture
- functions / MapReduce architecture
- execution flow / MapReduce execution flow and components
- components / MapReduce execution flow and components
- URL / Hadoop ecosystem components
- MapReduce components
- developing / Developing MapReduce components
- InputFormat class / InputFormat
- OutputFormat API / OutputFormat
- Mapper implementation / Mapper implementation
- MapReduce programming framework
- advantages / What makes MapReduce cater to the needs of large datasets?
- marginal probability
- about / Types of probability
- MarkLogic 8
- about / Vendors
- Markov Decision Process (MDP)
- about / Markov Decision Process (MDP)
- Markov property
- about / Markov Decision Process (MDP)
- MARS / C4.5
- Master/Workers Model
- about / Distributed and parallel computing strategies
- Maven
- setting up / Setting up Maven
- mdp-toolkit / Implementation of Python (using examples)
- mean
- about / Important terms and definitions
- Mean absolute error (MAE) / Mean absolute error (MAE)
- Mean squared error (MSE) / Mean squared error (MSE)
- median
- about / Important terms and definitions
- methods, for determining probability
- classical method / Probability
- empirical method / Probability
- subjective method / Probability
- Minkowski distance / Minkowski distance
- MLlib / Apache Spark
- mlpy / Implementation of Python (using examples)
- mode
- about / Important terms and definitions
- model, Machine learning
- about / Models
- logical models / Logical models
- geometric models / Geometric models
- probabilistic models / Probabilistic models
- model selection process
- about / Model selection process
- modern data architectures, for Machine learning
- about / Modern data architectures for Machine learning
- semantic data architecture / Semantic data architecture
- multi-model database architecture / polyglot persistence / Multi-model database architecture / polyglot persistence
- Monte Carlo methods
- about / Monte Carlo methods
- MPI
- about / High Performance Computing (HPC) with Message Passing Interface (MPI)
- Multi-Layer Perceptrons (MLP) / Synapses
- multi-model database architecture / polyglot persistence
- about / Multi-model database architecture / polyglot persistence
- challenges / Multi-model database architecture / polyglot persistence
- vendors / Vendors
- multicollinearity
- about / Regression methods
- multicore processors
- about / Multicore or multiprocessor systems
- multilayer fully connected feedforward networks / Multilayer fully connected feedforward networks or Multilayer Perceptrons (MLP)
- Multilayer Perceptrons (MLP) / Multilayer fully connected feedforward networks or Multilayer Perceptrons (MLP)
- Multinomial Naïve Bayes classifier
- about / Multinomial Naïve Bayes classifier
- Multiple Instruction Single Data (MISD) / Distributed and parallel computing strategies
- Multiple Instruction Multiple Data (MIMD) / Distributed and parallel computing strategies
- multiple regression
- about / Multiple regression
- multiprocessor systems
- about / Multicore or multiprocessor systems
- Multivariate adaptive regression splines (MARS) / Regression analysis based algorithms
- mutually exclusive events
- about / Mutually exclusive or disjoint events
N
- n-Armed Bandit problem
- about / n-Armed Bandit problem
- Natural language processing (NLP) / Too many attributes or features, Implementation of Python (using examples)
- Naïve Bayes algorithm
- implementing / Implementing Naïve Bayes algorithm
- implementing, Mahout used / Using Mahout
- implementing, R used / Using R
- implementing, Spark used / Using Spark
- implementing, scikit-learn used / Using scikit-learn
- implementing, Julia used / Using Julia
- Naïve Bayes classifier
- about / Naïve Bayes classifier
- Multinomial Naïve Bayes classifier / Multinomial Naïve Bayes classifier
- Bernoulli Naïve Bayes classifier / The Bernoulli Naïve Bayes classifier
- Nearest Neighbors
- about / Nearest Neighbors
- value of k, in KNN / Value of k in KNN
- distance measures, in KNN / Distance measures in KNN
- neighbors / Instance-based learning (IBL)
- neural networks
- about / Neural networks
- neuron / Neuron
- synapses / Synapses
- Neural Network size
- about / Neural Network size
- example / An example
- Neural Network types
- about / Neural network types
- Multilayer fully connected feedforward networks / Multilayer fully connected feedforward networks or Multilayer Perceptrons (MLP)
- Multilayer Perceptrons (MLP) / Multilayer fully connected feedforward networks or Multilayer Perceptrons (MLP)
- Jordan networks / Jordan networks
- Elman networks / Elman networks
- Radial Basis Function (RBF) networks / Radial Basis Function (RBF) networks
- Hopfield networks / Hopfield networks
- Dynamic Learning Vector Quantization (DLVQ) networks / Dynamic Learning Vector Quantization (DLVQ) networks
- Gradient descent method / Gradient descent method
- neuron / Neuron
- new age data architectures
- perspectives, emerging for / Emerging perspectives
- drivers, emerging for / Emerging perspectives
- NLTK / Implementation of Python (using examples)
- normal distribution
- about / Normal distribution
- Normalized MAE (NMAE) / Normalized MSE and MAE (NMSE and NMAE)
- Normalized MSE (NMSE) / Normalized MSE and MAE (NMSE and NMAE)
- null hypothesis
- about / ANOVA and F Statistics
- numeric primitives, Julia / Numeric primitives
- NumPy
- about / Implementation of Python (using examples)
O
- oblique trees
- about / Oblique trees
- ODBC.jl
- reference link / Integrating Julia and Hadoop
- odds ratio, logistic regression
- about / Odds ratio in logistic regression
- model / Model
- OLAP (Online Analytic Processing) / Evolution of data architectures
- OLAP databases
- versus OLTP databases / Evolution of data architectures
- OLTP (Online Transaction Processing) / Evolution of data architectures
- OLTP databases
- versus OLAP databases / Evolution of data architectures
- Oozie
- about / Hadoop ecosystem components
- URL / Hadoop ecosystem components
- optimization, Apriori implementation
- hash-based itemset counting / Apriori – the downside
- transaction elimination / counting / Apriori – the downside
- partitioning / Apriori – the downside
- sampling / Apriori – the downside
- dynamic itemset counting / Apriori – the downside
- Oryx / Vendors
- OutputFormat API / OutputFormat
P
- packages, Julia
- about / Packages
- reference link / Packages
- parallel computing strategies
- about / Distributed and parallel computing strategies
- parallel processor architectures
- about / Distributed and parallel computing strategies
- partitional clustering
- about / Partitional clustering
- pattern recognition / Definition
- pattern search / Definition
- percentiles
- about / Revisiting statistics
- performance measures
- using / Performance measures
- solution / Is the solution good?
- Mean squared error (MSE) / Mean squared error (MSE)
- Mean absolute error (MAE) / Mean absolute error (MAE)
- Normalized MSE (NMSE) / Normalized MSE and MAE (NMSE and NMAE)
- Normalized MAE (NMAE) / Normalized MSE and MAE (NMSE and NMAE)
- variance / Solving the errors: bias and variance
- bias / Solving the errors: bias and variance
- phases, Machine learning
- training phase / What is learning?
- validation and test phase / What is learning?
- application phase / What is learning?
- Pig / Hadoop platform / Processing layer
- about / Hadoop ecosystem components
- URL / Hadoop ecosystem components
- plots, Julia
- about / Graphics and plotting
- plyrmr package / Approach 3 – Using RHadoop
- Poisson probability distribution / Poisson probability distribution
- Poisson regression
- about / Poisson regression
- policy
- about / The context of Reinforcement Learning
- polyglot / Multi-model database architecture / polyglot persistence
- polynomial (non-linear) regression
- about / Polynomial (non-linear) regression
- population
- about / Important terms and definitions
- posterior probability
- about / Types of probability
- potential issues, large-scale Machine learning
- parallel execution / Potential issues in large-scale Machine learning
- load balancing / Potential issues in large-scale Machine learning
- skews, managing / Potential issues in large-scale Machine learning
- monitoring / Potential issues in large-scale Machine learning
- fault tolerance / Potential issues in large-scale Machine learning
- auto scaling / Potential issues in large-scale Machine learning
- job scheduling / Potential issues in large-scale Machine learning
- Workflow Management / Potential issues in large-scale Machine learning
- practical implementation aspects
- spam detection / Practical Machine learning examples
- credit card fraud detection / Practical Machine learning examples
- digit recognition / Practical Machine learning examples
- speech recognition / Practical Machine learning examples
- face detection / Practical Machine learning examples
- product recommendation / Practical Machine learning examples
- customer segmentation / Practical Machine learning examples
- stock trading / Practical Machine learning examples
- sentiment analysis / Practical Machine learning examples
- Predictive analytics / Emerging perspectives
- prior probability
- about / Types of probability
- probability
- about / Probability
- methods, for determining / Probability
- types / Types of probability
- posterior probability / Types of probability
- prior probability / Types of probability
- conditional probability / Types of probability
- joint probability / Types of probability
- marginal probability / Types of probability
- Probably Approximately Correct (PAC)
- about / Performance measures
- Approximate / Performance measures
- Probability / Performance measures
- problem types, Machine learning
- about / Types of learning problems
- classification / Classification
- clustering / Clustering
- forecasting / Forecasting, prediction or regression
- prediction / Forecasting, prediction or regression
- regression / Forecasting, prediction or regression
- simulation / Simulation
- optimization / Optimization
- supervised learning / Supervised learning
- unsupervised learning / Unsupervised learning
- semi-supervised learning / Semi-supervised learning
- reinforcement learning / Reinforcement learning
- deep learning / Deep learning
- process lifecycle, Machine learning / Machine learning process lifecycle and solution architecture
- Producer/Consumer Model
- about / Distributed and parallel computing strategies
- Protocol Buffer
- URL / Approach 2 – Using the Rhipe package of R
- PyBrain / Implementation of Python (using examples)
- Pydoop
- about / Implementation of Python (using examples)
- PyML / Implementation of Python (using examples)
- Python
- about / Python
- toolkit options / Toolkit options in Python
- implementing / Implementation of Python (using examples)
- installing / Installing Python and setting up scikit-learn
- Python (scikit-learn)
- used, for implementing decision trees / Using Python (scikit-learn)
- used, for implementing KNN / Using Python (scikit-learn)
- used, for implementing Support Vector Machines (SVM) / Using Python (scikit-learn)
- used, for implementing Apriori and FP-growth / Using Python (scikit-learn)
- used, for implementing k-means clustering / Using Python (scikit-learn)
- used, for implementing Deep learning methods / Using Python (scikit-learn)
- used, for implementing ANNs / Using Python (scikit-learn)
- used, for implementing ensemble methods / Using Python (scikit-learn)
Q
- Q-Learning technique
- about / Q-Learning – off-Policy TD
- Quadratic Discriminant Analysis (QDA) / C4.5
- quartiles
- about / Revisiting statistics
- QUEST / C4.5
R
- R
- about / R
- capabilities / R
- installing / Installing and setting up R
- setting up / Installing and setting up R
- used, for implementing decision trees / Using R
- used, for implementing KNN / Using R
- used, for implementing Support Vector Machines (SVM) / Using R
- used, for implementing Apriori and FP-growth / Using R
- used, for implementing k-means clustering / Using R
- used, for implementing Naïve Bayes algorithm / Using R
- used, for implementing logistic regression / Using R
- used, for implementing linear regression / Using R
- used, for implementing Deep learning methods / Using R
- used, for implementing ANNs / Using R
- used, for implementing ensemble methods / Using R
- R, integrating with Apache Hadoop
- about / Integrating R with Apache Hadoop
- R and Streaming APIs, using in Hadoop / Approach 1 – Using R and Streaming APIs in Hadoop
- Rhipe package, using of R / Approach 2 – Using the Rhipe package of R
- RHadoop, using / Approach 3 – Using RHadoop
- R / Hadoop integration approaches
- pros / Summary of R/Hadoop integration approaches
- cons / Summary of R/Hadoop integration approaches
- Radial Basis Function (RBF) networks / Radial Basis Function (RBF) networks
- random access sparse vectors / Implementing vectors in Mahout
- random forests
- about / Random forests, Random forests
- randomness
- about / Important terms and definitions
- range
- about / Revisiting statistics
- R Data Frames
- about / R Data Frames
- RDBMS
- theoretical limitations / Theoretical limitations of RDBMS
- rhdfs package / Approach 3 – Using RHadoop
- recommendation systems / Recommendation systems
- rectified linear neurons / Rectified linear neurons / linear threshold neurons
- Recurrent Neural Networks (RNNs) / Recurrent Neural Networks (RNNs), Restricted Boltzmann Machines (RBMs)
- Reducer job / MapReduce architecture
- reference reward
- about / Reinforcement comparison methods
- regression analysis
- about / Regression analysis
- statistics, revisiting / Revisiting statistics
- regression analysis based algorithms
- about / Regression analysis based algorithms
- regression methods
- about / Regression methods
- key assumptions / Regression methods
- simple regression / Simple regression or simple linear regression
- simple linear regression / Simple regression or simple linear regression
- multiple regression / Multiple regression
- polynomial (non-linear) regression / Polynomial (non-linear) regression
- Generalized Linear Models (GLM) / Generalized Linear Models (GLM)
- logistic regression (logit link) / Logistic regression (logit link)
- Poisson regression / Poisson regression
- Reinforcement Comparison methods
- about / Reinforcement comparison methods
- Reinforcement Learning (RL)
- about / Reinforcement Learning (RL)
- context / The context of Reinforcement Learning
- terms / The context of Reinforcement Learning
- examples / Examples of Reinforcement Learning
- evaluative feedback / Evaluative Feedback
- Markov Decision Process (MDP) / Markov Decision Process (MDP)
- Delayed Rewards / Delayed rewards
- optimal policy / The policy
- key features / Reinforcement Learning – key features
- solution methods / Reinforcement learning solution methods
- Reinforcement Learning (RL) problem
- world grid example / The Reinforcement Learning problem – the world grid example
- Remote Procedure Calls (RPC) / Hadoop ecosystem components
- Resilient Distributed Dataset (RDD) / Apache Spark
- Resilient Distributed Datasets (RDD)
- programming with / Programming with Resilient Distributed Datasets (RDD)
- RESTful HDFS / RESTful HDFS
- reward
- about / The context of Reinforcement Learning
- R Expressions
- about / R Expressions
- assignments / Assignments
- functions / Functions
- R Factors
- about / R Factors
- rhbase package / Approach 3 – Using RHadoop
- R Learning (Off-policy)
- about / R Learning (Off-policy)
- R Matrices
- about / R Matrices
- rmr package / Approach 3 – Using RHadoop
- root mean square error (RMSE) / Mean squared error (MSE)
- Rote Learner / Instance-based learning (IBL)
- R Statistical frameworks
- about / R Statistical frameworks
- rule extraction / Forecasting, prediction or regression
- R Vectors
- about / R Vectors
- assigning / Assigning, accessing, and manipulating vectors
- accessing / Assigning, accessing, and manipulating vectors
- manipulating / Assigning, accessing, and manipulating vectors
S
- 4Store / Vendors
- sample
- about / Important terms and definitions
- stratified sampling / Important terms and definitions
- sample size
- about / Important terms and definitions
- sample space probability
- about / Probability
- Sampling Bias
- about / Important terms and definitions
- Sarsa
- about / Sarsa - on-Policy TD
- Scala
- about / Scala
- examples / Scala
- scaling-out storage
- versus scaling-up storage / Scaling-up versus Scaling-out storage
- scikit-learn
- about / Implementation of Python (using examples)
- setting up / Installing Python and setting up scikit-learn
- used, for implementing Naïve Bayes algorithm / Using scikit-learn
- used, for implementing logistic regression / Using scikit-learn
- used, for implementing linear regression / Using scikit-learn
- SciPy
- about / Implementation of Python (using examples)
- semantic data architecture
- about / Semantic data architecture
- business data lake / The business data lake
- central data integration / Semantic Web technologies
- peer-to-peer / Semantic Web technologies
- features / Ontology and data integration
- vendors / Vendors
- Semantic Web technologies
- about / Semantic Web technologies
- ontology and data integration / Ontology and data integration
- semi-supervised learning
- about / Reinforcement Learning (RL)
- sequence files
- about / Implementing vectors in Mahout
- sequential access sparse vectors / Implementing vectors in Mahout
- Sesame / Vendors
- shallow learning algorithm / Background
- Shared Nothing Architecture (SNA) / The Hadoop (Physical) Infrastructure layer – supporting appliance
- Sigmoid neurons / Sigmoid neurons
- simple linear regression
- about / Simple regression or simple linear regression
- simple regression
- about / Simple regression or simple linear regression
- Single Instruction Multiple Data (SIMD) / Distributed and parallel computing strategies
- Single Instruction Single Data (SISD) / Distributed and parallel computing strategies
- singularity / Regression methods
- skewed data
- about / Revisiting statistics
- smart data / The Ingestion layer
- Softmax regression technique
- about / Softmax regression technique
- solution architecture, Machine learning / Machine learning process lifecycle and solution architecture
- solution methods, Reinforcement Learning (RL)
- about / Reinforcement learning solution methods
- Dynamic Programming (DP) / Dynamic Programming (DP)
- Monte Carlo methods / Monte Carlo methods
- temporal difference (TD) learning / Temporal difference (TD) learning
- Q-Learning technique / Q-Learning – off-Policy TD
- actor-critic methods (on-policy) / Actor-critic methods (on-policy)
- R Learning (Off-policy) / R Learning (Off-policy)
- Spark
- used, for implementing decision trees / Using Spark
- used, for implementing KNN / Using Spark
- used, for implementing Support Vector Machines (SVM) / Using Spark
- used, for implementing Apriori and FP-growth / Using Spark
- used, for implementing k-means clustering / Using Spark
- used, for implementing Naïve Bayes algorithm / Using Spark
- used, for implementing logistic regression / Using Spark
- used, for implementing linear regression / Using Spark
- used, for implementing Deep learning methods / Using Spark
- used, for implementing ANNs / Using Spark
- used, for implementing ensemble methods / Using Spark
- Spark SQL / Apache Spark
- Spark Streaming / Apache Spark
- sparse vectors
- about / Implementing vectors in Mahout
- random access sparse vectors / Implementing vectors in Mahout
- sequential access sparse vectors / Implementing vectors in Mahout
- specialized trees
- about / Specialized trees
- oblique trees / Oblique trees
- random forests / Random forests
- evolutionary trees / Evolutionary trees
- Hellinger trees / Hellinger trees
- Spring XD / The Analytics layer, Vendors
- about / Spring XD
- features / Spring XD
- Spring XD architecture, layers
- about / Spring XD
- Speed Layer / Spring XD
- Batch Layer / Spring XD
- Serving Layer / Spring XD
- Sqoop / Hadoop platform / Processing layer
- about / Hadoop ecosystem components
- URL / Hadoop ecosystem components
- SSE (Sum Squared Error) / Simple regression or simple linear regression
- SSL (Secure Sockets Layer) / Security and Monitoring layer
- standard deviation
- about / Important terms and definitions
- Stardog / Vendors
- state
- about / The context of Reinforcement Learning
- statistical learning
- versus Machine learning / Statistical learning
- statisticians
- objective / Statistician's thinking
- stochastic binary neurons / Stochastic binary neurons
- stratified sampling
- about / Important terms and definitions
- stream mining / Stream mining or classification
- String manipulations, Julia
- working with / Working with Strings and String manipulations
- Strings, Julia
- working with / Working with Strings and String manipulations
- sum of squared error of prediction (SSE) / Convergence or stopping criteria for the k-means clustering
- supervised ensemble methods
- about / Supervised ensemble methods
- boosting / Boosting
- bagging / Bagging
- wagging / Wagging
- supervised learning
- about / Reinforcement Learning (RL)
- Support Vector Machine (SVM) / Implementation of Python (using examples)
- Support Vector Machines (SVM)
- about / Support Vector Machines (SVM)
- Inseparable Data / Inseparable Data
- implementing / Implementing SVM
- implementing, Mahout used / Using Mahout
- implementing, R used / Using R
- implementing, Spark used / Using Spark
- implementing, Python (scikit-learn) used / Using Python (Scikit-learn)
- implementing, Julia used / Using Julia
- support vector machines (SVM) / Kernel method based algorithms
- symmetric distribution
- about / Revisiting statistics
- synapses / Synapses
T
- Tableau / Apache Spark
- Tajo
- about / Hadoop ecosystem components
- URL / Hadoop ecosystem components
- task dependency graph / Developing concurrent algorithms
- task parallelization
- about / Distributed and parallel computing strategies
- TaskTracker / MapReduce architecture
- Temporal Credit Assignment
- about / Delayed rewards
- temporal difference (TD) learning
- about / Temporal difference (TD) learning
- Sarsa / Sarsa - on-Policy TD
- terms, Reinforcement Learning (RL)
- agent / The context of Reinforcement Learning
- environment / The context of Reinforcement Learning
- state / The context of Reinforcement Learning
- action / The context of Reinforcement Learning
- policy / The context of Reinforcement Learning
- reward / The context of Reinforcement Learning
- value / The context of Reinforcement Learning
- top-K recommendation / Instance-based learning (IBL)
- Total Cost of Ownership (TCO) / Commoditizing information
- Total Lifetime Value (TLV) / Classification
- Total overall cost of ownership (TCO) / Emerging perspectives
- traditional ETL architecture
- limitations / Evolution of data architectures
- transfer learning / Transfer learning
- tree Induction method
- ID3 / C4.5
- CHAID / C4.5
- QUEST / C4.5
- CAL5 / C4.5
- FACT / C4.5
- LMDT / C4.5
- MARS / C4.5
U
- Ubuntu-based Hadoop Installation
- prerequisites / Hadoop installation and setup
- JDK 1.7, installing / Installing JDK 1.7
- system user, creating for Hadoop / Creating a system user for Hadoop (dedicated)
- IPv6, disabling / Disable IPv6
- uncertainty
- sources / Probability
- Unique Transaction Identifier (UTI) / Association rule – a definition
- unlabelled data set
- about / Reinforcement Learning (RL)
- unsupervised ensemble methods
- about / Unsupervised ensemble methods
V
- value
- about / The context of Reinforcement Learning
- variable
- about / Important terms and definitions
- variables, Julia / Using variables and assignments
- variance / Solving the errors: bias and variance
- about / Revisiting statistics
- properties / Properties of variance
- vectors
- implementing, in Mahout / Implementing vectors in Mahout
- Visualizations
- about / The Consumption layer
- data, exploring with / Explaining and exploring data with Visualizations
- Voronoi cell / Nearest Neighbors
W
- wagging
- about / Wagging
- WebHDFS REST API
- URL / RESTful HDFS
- Wisdom of Crowds
- about / The wisdom of the crowd
- aggregation / The wisdom of the crowd
- independence / The wisdom of the crowd
- decentralization / The wisdom of the crowd
- diversity of opinion / The wisdom of the crowd
- usage of combiner / The wisdom of the crowd
- dependency between classifiers / The wisdom of the crowd
- diversity, generating / The wisdom of the crowd
- size of ensemble / The wisdom of the crowd
- cross inducers / The wisdom of the crowd
Y
- YARN
- about / Hadoop ecosystem components
Z
- ZooKeeper / Hadoop platform / Processing layer
- about / Hadoop ecosystem components
- URL / Hadoop ecosystem components