Practical Machine Learning

By: Sunila Gollapudi

Overview of this book

This book explores an extensive range of machine learning techniques, uncovering hidden tricks and tips for several types of data using practical, real-world examples. While machine learning can be highly theoretical, this book offers a refreshing hands-on approach without losing sight of the underlying principles. Inside, a full exploration of the various algorithms gives you high-quality guidance so you can begin to see just how effective machine learning is at tackling the contemporary challenges of big data. This is the only book you need to implement a whole suite of open source tools, frameworks, and languages in machine learning. We will cover the leading data science languages, Python and R, and the underrated but powerful Julia, as well as a range of big data platforms including Spark, Hadoop, and Mahout. Practical Machine Learning is an essential resource for modern data scientists who want to get to grips with the real-world application of machine learning. With this book, you will not only learn the fundamentals of machine learning but also dive deep into the complexities of real-world data before moving on to using Hadoop and its wider ecosystem of tools to process and manage your structured and unstructured data. You will explore different machine learning techniques for both supervised and unsupervised learning; from decision trees to Naïve Bayes classifiers and linear and clustering methods, you will learn strategies for a truly advanced approach to the statistical analysis of data. The book also explores the cutting-edge advancements in machine learning, with worked examples and guidance on deep learning and reinforcement learning, providing you with practical demonstrations and samples that help take the theory, and the mystery, out of even the most advanced machine learning methodologies.
Table of Contents (23 chapters)
Practical Machine Learning
Credits
Foreword
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Index

Index

A

  • AAA principle of Semantic Computing / Semantic Web technologies
  • action
    • about / The context of Reinforcement Learning
  • action-value methods
    • about / Action-value methods
  • actor-critic methods (on-policy)
    • about / Actor-critic methods (on-policy)
  • AdaBoost
    • about / AdaBoost
  • advantages, MapReduce programming framework
    • parallel execution / What makes MapReduce cater to the needs of large datasets?
    • fault tolerance / What makes MapReduce cater to the needs of large datasets?
    • scalability / What makes MapReduce cater to the needs of large datasets?
    • data locality / What makes MapReduce cater to the needs of large datasets?
  • agent
    • about / The context of Reinforcement Learning
  • Agglomerative clustering algorithm / Hierarchical clustering
  • algorithms
    • about / Algorithms and Concurrency
  • Amazon EC2
    • about / MapReduce programming paradigm
  • Ambari
    • URL / Hadoop ecosystem components
    • about / Hadoop ecosystem components
  • Analytics layer
    • about / The Analytics layer
  • ANNs
    • implementing / Implementing ANNs and Deep learning methods
    • implementing, Mahout used / Using Mahout
    • implementing, R used / Using R
    • implementing, Spark used / Using Spark
    • implementing, Python (scikit-learn) used / Using Python (Scikit-learn)
    • implementing, Julia used / Using Julia
  • anomaly detection / Anomaly detection
  • ANOVA
    • about / ANOVA and F Statistics
  • Apache Spark / Vendors
    • about / Apache Spark
    • core modules / Apache Spark
  • appropriate minsup
    • rules, for defining / Rules for defining appropriate minsup
  • Apriori, and FP-growth
    • implementing, Mahout used / Using Mahout
    • implementing, R used / Using R
    • implementing, Spark used / Using Spark
    • implementing, Python (scikit-learn) used / Using Python (Scikit-learn)
    • implementing, Julia used / Using Julia
  • Apriori algorithm
    • about / Apriori algorithm
    • rule generation strategy / Rule generation strategy
    • downside / Apriori – the downside
    • versus FP-growth algorithm / Apriori versus FP-growth
  • ArangoDB / Vendors
  • Artificial intelligence (AI)
    • versus Machine learning / Artificial intelligence (AI)
  • artificial neural network (ANN) / Synapses
  • Artificial neural networks (ANN)
    • Learning vector quantization (LVQ) / Artificial neural networks (ANN)
    • Perceptron / Artificial neural networks (ANN)
    • Self-organizing maps (SOM) / Artificial neural networks (ANN)
    • Hopfield network / Artificial neural networks (ANN)
    • Backpropagation / Dimensionality reduction
  • artificial neurons
    • about / Artificial neurons or perceptrons
    • linear neurons / Linear neurons
    • rectified linear neurons / Rectified linear neurons / linear threshold neurons
    • linear threshold neurons / Rectified linear neurons / linear threshold neurons
    • binary threshold neurons / Binary threshold neurons
    • Sigmoid neurons / Sigmoid neurons
    • stochastic binary neurons / Stochastic binary neurons
  • assignments, Julia / Using variables and assignments
  • association rule
    • about / Association rule – a definition
    • Support criteria / Association rule – a definition
    • Confidence criteria / Association rule – a definition
  • Association rule based algorithms
    • about / Association rule based learning algorithms
    • Apriori algorithm / Association rule based learning algorithms
    • Eclat algorithm / Association rule based learning algorithms
  • autoassociators (AAs) / Autoencoders
  • autoencoders / Autoencoders
  • Avro
    • about / Hadoop ecosystem components
    • URL / Hadoop ecosystem components

B

  • Backpropagation algorithm
    • about / Backpropagation algorithm
  • bagging
    • about / Bagging
    • training step / Bagging
    • testing step / Bagging
    • under-fitting cases / Bagging
    • over-fitting cases / Bagging
    • example / Bagging
  • basic RL model
    • about / Basic RL model – agent-environment interface
  • Bayesian learning
    • about / Bayesian learning
  • Bayesian method based algorithms
    • about / Bayesian method based algorithms
    • Naive Bayes / Bayesian method based algorithms
    • Averaged one-dependence estimators (AODE) / Bayesian method based algorithms
    • Bayesian belief network (BBN) / Bayesian method based algorithms
  • Bayes theorem
    • about / Bayes' theorem
  • bell curve
    • about / Normal distribution
  • Bellman Equation
    • about / The policy
  • Bernoulli distribution
    • about / Bernoulli distribution
  • Bernoulli Naïve Bayes classifier
    • about / The Bernoulli Naïve Bayes classifier
  • bias / Solving the errors: bias and variance
  • big data
    • about / Big data and the context of large-scale Machine learning
    • characteristics / Big data and the context of large-scale Machine learning
  • binary threshold neurons
    • about / Binary threshold neurons
  • Binomial distribution
    • about / Binomial distribution
    • Poisson probability distribution / Poisson probability distribution
    • exponential distribution / Exponential distribution
    • normal distribution / Normal distribution
  • Blazegraph / Vendors
  • boosting
    • about / Boosting
    • characteristics / Boosting
  • Bootstrap Aggregation
    • about / Bagging
  • browser
    • Julia, using via / Using Julia via the browser
  • Business Intelligence (BI) / The Analytics layer
  • Business Intelligence (BI) / Data science

C

  • C4.5 / Information gain and Entropy
    • about / C4.5
  • CAL5 / C4.5
  • CART
    • about / CART
  • case-based learning / Instance-based learning (IBL)
  • case-based reasoning (CBR) / Case-based reasoning (CBR)
  • CHAID / C4.5
  • Checkpoint / Secondary Namenode and Checkpoint process
  • Chukwa
    • about / Hadoop ecosystem components
    • URL / Hadoop ecosystem components
  • Classification and Regression Tree (CART) / Decision tree based algorithms
  • clustering
    • examples / Clustering-based learning
    • types / Types of clustering
    • hierarchical clustering / Hierarchical clustering
    • partitional clustering / Partitional clustering
  • clustering-based learning
    • about / Clustering-based learning
  • clustering methods
    • K-means / Clustering methods
    • Expectation maximization (EM) / Clustering methods
    • Gaussian mixture models (GMM) / Clustering methods
  • clusters
    • about / Clustering-based learning
  • command line
    • Julia code, running from / Running the Julia code from the command line
  • command line version
    • downloading, of Julia / Downloading and using the command line version of Julia
    • using, of Julia / Downloading and using the command line version of Julia
  • complexity measure, k-means clustering algorithm / Complexity measures
  • complementing fields, Machine learning
    • about / Some complementing fields of Machine learning
    • data mining / Data mining
    • Artificial intelligence (AI) / Artificial intelligence (AI)
    • statistical learning / Statistical learning
    • data science / Data science
  • components, MapReduce / MapReduce execution flow and components
  • concurrency
    • about / Algorithms and Concurrency
  • concurrent algorithms
    • developing / Developing concurrent algorithms
  • conditional probability
    • about / Types of probability
  • Configurable Logical Blocks (CLB)
    • about / Field Programmable Gate Array (FPGA)
  • confounder
    • about / Confounding
  • confounding
    • about / ANOVA and F Statistics, Confounding
  • considerations, for constructing Decision trees
    • about / Considerations for constructing Decision trees
    • appropriate attributes, selecting / Choosing the appropriate attribute(s), Information gain and Entropy, Gini index, Gain ratio, Termination Criteria / Pruning Decision trees
    • information gain / Information gain and Entropy
    • Entropy / Information gain and Entropy
    • Gini index / Gini index
    • Gain ratio / Gain ratio
    • Termination Criteria / Termination Criteria / Pruning Decision trees
    • Pruning Decision trees / Termination Criteria / Pruning Decision trees
  • Consumption layer
    • about / The Consumption layer
  • continuous quantity
    • about / Bernoulli distribution
  • convolutional neural networks (CNN/ConvNets)
    • about / Convolutional neural networks (CNN/ConvNets)
    • convolutional layer (CONV) / Convolutional layer (CONV)
    • pooling layer (POOL) / Pooling layer (POOL)
    • fully connected layer (FC) / Fully connected layer (FC)
  • core components framework, Hadoop
    • about / Hadoop core components framework
  • core elements, Hadoop / Hadoop and its core elements
  • correlation
    • about / Revisiting statistics
  • correlation coefficient
    • about / Revisiting statistics
  • covariance
    • about / Revisiting statistics
    • properties / Properties of covariance
    • real-world example / Example
  • CRAN
    • URL / Installing and setting up R
  • Credit Assignment Path (CAP) / Stochastic binary neurons

D

  • DAG (Directed Acyclic Graph) / Hadoop ecosystem components
  • data
    • exploring, with Visualizations / Explaining and exploring data with Visualizations
  • data architectures
    • evolution / Evolution of data architectures
  • data inconsistencies, Machine learning
    • under-fitting case / Under-fitting
    • over-fitting case / Over-fitting
    • data instability / Data instability
    • unpredictable data formats / Unpredictable data formats
  • data mining
    • versus Machine learning / Data mining
  • data parallelization
    • about / Distributed and parallel computing strategies
  • data science
    • versus Machine learning / Data science
  • datasets
    • manipulating, with LINQ / Manipulating datasets with LINQ
  • Data Source layer / The Data Source layer
  • data structures, Julia / Data structures
  • Data Warehouses (DW) / Data science
  • deciles
    • about / Revisiting statistics
  • Decision tree algorithms
    • about / Inducing Decision trees – Decision tree algorithms
    • CART / CART
    • C4.5 / C4.5
  • Decision tree based algorithms
    • about / Decision tree based algorithms
    • Random forest / Decision tree based algorithms
    • Classification and Regression Tree (CART) / Decision tree based algorithms
    • C4.5 and C5.0 / Decision tree based algorithms
    • Gradient boosting machines (GBM) / Decision tree based algorithms
    • Chi-square Automatic Interaction Detection (CHAID) / Decision tree based algorithms
    • decision stump / Decision tree based algorithms
    • Multivariate adaptive regression splines (MARS) / Decision tree based algorithms
  • Decision trees
    • about / Decision trees
    • characteristics / Terminology
    • purpose / Purpose and uses
    • uses / Purpose and uses
    • constructing / Constructing a Decision tree
    • missing values, handling / Handling missing values
    • constructing, considerations / Considerations for constructing Decision trees
    • in graphical representation / Decision trees in a graphical representation
    • inducing / Inducing Decision trees – Decision tree algorithms
    • benefits / Benefits of Decision trees
  • decision trees
    • implementing / Implementing Decision trees
    • implementing, Mahout used / Using Mahout
    • implementing, R used / Using R
    • implementing, Spark used / Using Spark
    • implementing, Python (scikit-learn) used / Using Python (scikit-learn)
    • implementing, Julia used / Using Julia
  • Deep Belief Network (DBN) / The human brain
  • Deep Boltzmann Machines (DBMs) / Deep Boltzmann Machines (DBMs)
  • Deep learning
    • about / The human brain
    • URL / The human brain
  • Deep learning, techniques
    • Convolutional Networks / Deep learning
    • Restricted Boltzmann Machine (RBM) / Deep learning
    • Deep Belief Networks (DBN) / Deep learning
    • Stacked Autoencoders / Deep learning
  • Deep learning methods
    • implementing / Implementing ANNs and Deep learning methods
    • implementing, Mahout used / Using Mahout
    • implementing, R used / Using R
    • implementing, Spark used / Using Spark
    • implementing, Python (scikit-learn) used / Using Python (Scikit-learn)
    • implementing, Julia used / Using Julia
  • Deep Learning taxonomy
    • about / Deep learning taxonomy
    • convolutional neural networks (CNN/ConvNets) / Convolutional neural networks (CNN/ConvNets)
    • Recurrent Neural Networks (RNNs) / Recurrent Neural Networks (RNNs)
    • Restricted Boltzmann Machines (RBMs) / Restricted Boltzmann Machines (RBMs)
    • Deep Boltzmann Machines (DBMs) / Deep Boltzmann Machines (DBMs)
    • autoencoders / Autoencoders
  • default block placement policy / Block loading to the cluster and replication
  • dendrogram
    • about / Hierarchical clustering
  • denoising autoencoder (DA) / Autoencoders
  • dense vectors / Implementing vectors in Mahout
  • dependent events
    • about / Dependent events
  • Descriptive analytics / Emerging perspectives
  • Diagnostic analytics / Emerging perspectives
  • dimensionality reduction methods
    • about / Dimensionality reduction
    • Multidimensional scaling (MDS) / Dimensionality reduction
    • Principal component analysis (PCA) / Dimensionality reduction
    • Projection pursuit (PP) / Dimensionality reduction
    • Partial least squares (PLS) regression / Dimensionality reduction
    • Sammon mapping / Dimensionality reduction
  • discrete quantity
    • about / Bernoulli distribution
  • disjoint events
    • about / Mutually exclusive or disjoint events
  • disk
    • k-means clustering algorithm, implementing on / K-means clustering on disk
  • distance measures, in KNN
    • about / Distance measures in KNN
    • Euclidean distance / Euclidean distance
    • Hamming distance / Hamming distance
    • Minkowski distance / Minkowski distance
  • distance measures methods, k-means clustering algorithm
    • single link / Distance measures
    • complete link / Distance measures
    • average link / Distance measures
    • centroids / Distance measures
  • distributed processing
    • about / Distributed and parallel computing strategies
  • distribution
    • about / Distribution
    • Bernoulli distribution / Bernoulli distribution
    • Binomial distribution / Binomial distribution
    • relationship between / Relationship between the distributions
  • Divisive clustering algorithm / Hierarchical clustering
  • Dynamic Learning Vector Quantization (DLVQ) networks / Dynamic Learning Vector Quantization (DLVQ) networks
  • Dynamic Programming (DP)
    • about / Dynamic Programming (DP)

E

  • Eclipse
    • URL / Setting-up Apache Mahout using Eclipse IDE
  • Eclipse IDE
    • used, for setting up Mahout / Setting-up Apache Mahout using Eclipse IDE
  • ecosystem components, Hadoop / Hadoop ecosystem components
  • effect modification
    • about / ANOVA and F Statistics, Effect modification
  • Elman networks / Elman networks
  • ensemble learning methods
    • about / Ensemble learning methods
    • Wisdom of Crowds / The wisdom of the crowd
    • key use cases / Key use cases
  • Ensemble method algorithms
    • about / Ensemble methods
    • Random forest / Ensemble methods
    • AdaBoost / Ensemble methods
    • Bagging / Ensemble methods
    • Bootstrapped Aggregation (Boosting) / Ensemble methods
    • Stacked generalization (blending) / Ensemble methods
    • Gradient boosting machines (GBM) / Ensemble methods
  • ensemble methods
    • about / Ensemble methods
    • supervised ensemble methods / Supervised ensemble methods
    • implementing / Implementing ensemble methods
    • implementing, Mahout used / Using Mahout
    • implementing, R used / Using R
    • implementing, Spark used / Using Spark
    • implementing, Python (scikit-learn) used / Using Python (Scikit-learn)
    • implementing, Julia used / Using Julia
  • Enterprise Service Bus (ESB) / Distributed and parallel computing strategies
  • environment
    • about / The context of Reinforcement Learning
  • error measures, performance measures
    • accuracy / Is the solution good?
    • recall / Is the solution good?
    • precision / Is the solution good?
  • ETL (Extract-Transform-Load) / Hadoop ecosystem components
  • Euclidean distance / Euclidean distance
  • Euclidean distance measure / Nearest Neighbors
  • evaluative feedback, Reinforcement Learning (RL)
    • n-Armed Bandit problem / n-Armed Bandit problem
    • action-value methods / Action-value methods
    • Reinforcement Comparison methods / Reinforcement comparison methods
  • events
    • types / Types of events
    • mutually exclusive / Mutually exclusive or disjoint events
    • disjoint / Mutually exclusive or disjoint events
    • independent / Independent events
    • dependent / Dependent events
  • evolutionary trees
    • about / Evolutionary trees
  • examples, Reinforcement Learning (RL)
    • chess game / Examples of Reinforcement Learning
    • elevator scheduling / Examples of Reinforcement Learning
    • network packet routing / Examples of Reinforcement Learning
    • mobile robot behavior / Examples of Reinforcement Learning
  • execution flow, MapReduce / MapReduce execution flow and components
  • expectation
    • properties / Properties of expectation, variance, and covariance
  • exponential distribution / Exponential distribution
  • Extract, Load, and Transform (ELT)
    • overview / Emerging perspectives
    • highlights / Emerging perspectives
    • benefits / Emerging perspectives
    • risks / Emerging perspectives
  • Extract, Transform, and Load (ETL) / The Ingestion layer
    • overview / Emerging perspectives
    • highlights / Emerging perspectives
    • benefits / Emerging perspectives
    • risks / Emerging perspectives
  • Extract, Transform, Load, and Transform (ETLT)
    • overview / Emerging perspectives
    • highlights / Emerging perspectives
    • benefits / Emerging perspectives
    • risks / Emerging perspectives

F

  • FACT / C4.5
  • Flume
    • about / Hadoop ecosystem components
    • URL / Hadoop ecosystem components
  • FoundationDB / Vendors
  • FP-growth algorithm
    • about / Apriori – the downside, FP-growth algorithm
    • versus Apriori algorithm / Apriori versus FP-growth
  • FPGA
    • about / Field Programmable Gate Array (FPGA)
  • frequent pattern tree (FP-tree) / Apriori – the downside
  • FS Shell / HDFS command line
  • F Statistics
    • about / ANOVA and F Statistics
  • functional, versus structural
    • about / Functional versus Structural – A methodological mismatch
    • information, commoditizing / Commoditizing information
    • theoretical limitations, of RDBMS / Theoretical limitations of RDBMS
  • functions, MapReduce
    • Mapper / MapReduce architecture
    • Reducer / MapReduce architecture

G

  • Generalized Linear Models (GLM)
    • about / Generalized Linear Models (GLM)
  • GFS (Google File System) / Hadoop Distributed File System (HDFS)
  • Google File System (GFS) / Evolution of Hadoop (the platform of choice)
  • GPI
    • about / Generalized Policy Iteration (GPI)
    • on-policy / Monte Carlo methods
    • off-policy / Monte Carlo methods
  • Gradient boosted regression trees (GBRT) / Gradient boosting machines (GBM)
  • gradient boosting machines (GBM)
    • about / Gradient boosting machines (GBM)
  • Gradient boosting machines (GBM) / Decision tree based algorithms
  • Gradient descent method / Gradient descent method
  • graphs, Julia
    • about / Graphics and plotting
  • GraphX / Apache Spark
  • Greedy Decision trees
    • about / Greedy Decision trees
  • ε-greedy method
    • about / Action-value methods

H

  • Hadoop / Multi-model database architecture / polyglot persistence
    • about / Theoretical limitations of RDBMS, Introduction to Apache Hadoop
    • URL / Introduction to Apache Hadoop
    • evolution / Evolution of Hadoop (the platform of choice)
    • core elements / Hadoop and its core elements
    • core components framework / Hadoop core components framework
    • ecosystem components / Hadoop ecosystem components
    • starting / Starting Hadoop
    • distributions / Hadoop distributions and vendors
    • vendors / Hadoop distributions and vendors
  • Hadoop (Physical) Infrastructure layer
    • about / The Hadoop (Physical) Infrastructure layer – supporting appliance
  • Hadoop 2.6.0
    • installing, steps / Steps for installing Hadoop 2.6.0
  • Hadoop 2.x
    • about / Hadoop 2.x
  • Hadoop Distributed File System (HDFS) / The Hadoop Storage layer
    • about / Hadoop Distributed File System (HDFS)
    • Secondary Namenode / Secondary Namenode and Checkpoint process
    • Checkpoint process / Secondary Namenode and Checkpoint process
    • large data files, splitting / Splitting large data files
    • block loading, to cluster and replication / Block loading to the cluster and replication
  • Hadoop setup
    • about / Hadoop installation and setup
    • standalone operation / Hadoop installation and setup
    • Pseudo-Distributed Operation / Hadoop installation and setup
    • Fully-Distributed Operation / Hadoop installation and setup
  • Hadoop Storage layer
    • about / The Hadoop Storage layer
  • Hamming distance / Hamming distance
  • HBase / Hadoop platform / Processing layer
    • about / Hadoop ecosystem components
    • URL / Hadoop ecosystem components
  • HCatalog
    • about / Hadoop ecosystem components
    • URL / Hadoop ecosystem components
  • HDFS
    • file, writing to / Writing to and reading from HDFS
    • file, reading from / Writing to and reading from HDFS
  • HDFS (Hadoop Distributed File System)
    • URL / Hadoop ecosystem components
  • HDFS command line / HDFS command line
  • Hellinger trees
    • about / Hellinger trees
  • hierarchical clustering
    • about / Hierarchical clustering
  • HIHO
    • about / Hadoop ecosystem components
    • URL / Hadoop ecosystem components
  • HIHO (Hadoop-in Hadoop-out) / The Hadoop Storage layer
  • Hive / Hadoop platform / Processing layer
    • about / Hadoop ecosystem components
    • URL / Hadoop ecosystem components
  • homoscedasticity
    • about / Regression methods
  • Hopfield networks / Hopfield networks
  • HPC
    • about / High Performance Computing (HPC) with Message Passing Interface (MPI)
  • human brain
    • about / The human brain

I

  • ID3 (Iterative Dichotomiser 3) / C4.5
  • implementation options, for scaling-up Machine learning
    • about / Technology and implementation options for scaling-up Machine learning
    • MapReduce programming paradigm / MapReduce programming paradigm
    • HPC, with MPI / High Performance Computing (HPC) with Message Passing Interface (MPI)
    • Language Integrated Queries (LINQ) framework / Language Integrated Queries (LINQ) framework
    • datasets, manipulating with LINQ / Manipulating datasets with LINQ
    • Graphics Processing Unit (GPU) / Graphics Processing Unit (GPU)
    • FPGA / Field Programmable Gate Array (FPGA)
    • multiprocessor systems / Multicore or multiprocessor systems
    • multicore processors / Multicore or multiprocessor systems
  • independent events
    • about / Independent events
  • Independent Variables (IVs)
    • about / Regression methods
  • induction / Optimization
  • Ingestion layer
    • about / The Ingestion layer
    • Partitioning pattern / The Ingestion layer
    • Pipeline design patterns / The Ingestion layer
    • Transformation patterns / The Ingestion layer
    • Storage Design / The Ingestion layer
    • Data Load pattern / The Ingestion layer
  • InputFormat class / InputFormat
  • Instance-based learning (IBL)
    • about / Instance-based learning (IBL)
    • Nearest Neighbors / Nearest Neighbors
    • KNN, implementing / Implementing KNN
  • instance based learning algorithms
    • about / Instance based learning algorithms
    • k-Nearest Neighbour (k-NN) / Instance based learning algorithms
    • Self-Organizing / Instance based learning algorithms
    • Learning vector quantization (LVQ) / Instance based learning algorithms
    • Self-organizing maps (SOM) / Instance based learning algorithms
  • integrating out
    • about / Types of probability
  • integration aspects, Julia
    • about / Interoperability
    • C / Integrating with C
    • Python / Integrating with Python
    • MATLAB / Integrating with MATLAB

J

  • Jena / Vendors
  • JobTracker / MapReduce architecture
  • joint probability
    • about / Types of probability
  • Jordan networks / Jordan networks
  • Julia
    • about / Julia
    • characteristics / Julia
    • installing / Installing and setting up Julia
    • setting up / Installing and setting up Julia
    • command line version, downloading of / Downloading and using the command line version of Julia
    • command line version, using of / Downloading and using the command line version of Julia
    • using, via browser / Using Julia via the browser
    • benefits / Benefits of adopting Julia
    • used, for implementing decision trees / Using Julia
    • used, for implementing KNN / Using Julia
    • used, for implementing Support Vector Machines (SVM) / Using Julia
    • used, for implementing Apriori and FP-growth / Using Julia
    • used, for implementing k-means clustering / Using Julia
    • used, for implementing Naïve Bayes algorithm / Using Julia
    • used, for implementing logistic regression / Using Julia
    • used, for implementing linear regression / Using Julia
    • used, for implementing Deep learning methods / Using Julia
    • used, for implementing ANNs / Using Julia
    • used, for implementing ensemble methods / Using Julia
  • Julia, and Hadoop
    • integrating / Integrating Julia and Hadoop
  • Julia code
    • running, from command line / Running the Julia code from the command line
  • Julia environment
    • reference link / Installing and setting up Julia
  • Juno IDE
    • using, for running Julia / Using Juno IDE for running Julia
    • URL / Using Juno IDE for running Julia
  • just-in-time (JIT) compilers / Running the Julia code from the command line
  • JVM (Java Virtual Machine) / Scala

K

  • k-means algorithm
    • advantages / Advantages of the k-means approach
    • disadvantages / Disadvantages of the k-means algorithm
  • k-means clustering
    • implementing / Implementing k-means clustering
    • implementing, Mahout used / Using Mahout
    • implementing, R used / Using R
    • implementing, Spark used / Using Spark
    • implementing, Python (scikit-learn) used / Using Python (scikit-learn)
    • implementing, Julia used / Using Julia
  • k-means clustering algorithm
    • about / The k-means clustering algorithm
    • convergence criteria / Convergence or stopping criteria for the k-means clustering
    • implementing, on disk / K-means clustering on disk
    • distance measures / Distance measures
    • complexity measure / Complexity measures
  • Karush-Kuhn-Tucker (KKT) / Support Vector Machines (SVM)
  • kernel functions
    • about / Kernel functions
  • kernel method based algorithms
    • about / Kernel method based algorithms
    • support vector machines (SVM) / Kernel method based algorithms
    • Linear discriminant analysis (LDA) / Kernel method based algorithms
  • kernel methods-based learning
    • about / Kernel methods-based learning
  • key assumptions, regression methods
    • sample cases size / Regression methods
    • data accuracy / Regression methods
    • outliers / Regression methods
    • missing data / Regression methods
    • normal distribution / Regression methods
    • linear behavior / Regression methods
    • homoscedasticity / Regression methods
  • key use cases, ensemble learning methods
    • about / Key use cases
    • recommendation systems / Recommendation systems
    • anomaly detection / Anomaly detection
    • transfer learning / Transfer learning
    • stream mining / Stream mining or classification
    • classification / Stream mining or classification
  • KNN
    • implementing, Mahout used / Using Mahout
    • implementing, R used / Using R
    • implementing, Spark used / Using Spark
    • implementing, Python (scikit-learn) used / Using Python (scikit-learn)
    • implementing, Julia used / Using Julia

L

  • labeled datasets
    • about / Reinforcement Learning (RL)
  • Lambda Architecture (LA)
    • about / Lambda Architecture (LA)
    • Data layer / Lambda Architecture (LA)
    • Batch layer / Lambda Architecture (LA)
    • Speed layer / Lambda Architecture (LA)
    • Serving layer / Lambda Architecture (LA)
    • Query function / Lambda Architecture (LA)
    • vendors / Vendors
  • Lambda Architectures (LA) / Big data and the context of large-scale Machine learning
  • large-scale Machine learning
    • about / Big data and the context of large-scale Machine learning
    • potential issues / Potential issues in large-scale Machine learning
  • lazy learners / Instance-based learning (IBL)
  • learning
    • about / What is learning?
  • least squares method / Polynomial (non-linear) regression
  • linear neurons / Linear neurons
  • linear regression
    • implementing / Implementing linear and logistic regression
    • implementing, Mahout used / Using Mahout
    • implementing, R used / Using R
    • implementing, Spark used / Using Spark
    • implementing, scikit-learn used / Using scikit-learn
    • implementing, Julia used / Using Julia
  • linear threshold neurons / Rectified linear neurons / linear threshold neurons
  • LINQ
    • datasets, manipulating with / Manipulating datasets with LINQ
  • LINQ framework
    • about / Language Integrated Queries (LINQ) framework
  • LMDT / C4.5
  • locally weighed regression (LWR) / Locally weighed regression (LWR)
  • logistic regression
    • odds ratio / Odds ratio in logistic regression
    • implementing / Implementing linear and logistic regression
    • implementing, Mahout used / Using Mahout
    • implementing, R used / Using R
    • implementing, Spark used / Using Spark
    • implementing, scikit-learn used / Using scikit-learn
    • implementing, Julia used / Using Julia
  • logistic regression (logit link)
    • about / Logistic regression (logit link)
  • long-term potentiation (LTP) / Synapses
  • Low-Level Virtual Machine (LLVM) / Running the Julia code from the command line

M

  • Machine learning
    • about / Machine learning
    • defining / Definition
    • core concepts / Core Concepts and Terminology
    • terminology / Core Concepts and Terminology
    • phases / What is learning?
    • data / Data
    • feature / Data
    • attribute / Data
    • field / Data
    • variable / Data
    • instance / Data
    • feature vector or tuple / Data
    • dimension / Data
    • dataset / Data
    • data types / Data
    • coverage / Data
    • labeled data / Labeled and unlabeled data
    • unlabeled data / Labeled and unlabeled data
    • tasks / Tasks
    • algorithms / Algorithms, Machine learning algorithms
    • model / Models
    • data inconsistencies / Data and inconsistencies in Machine learning
    • practical examples / Practical Machine learning examples
    • problem, types / Types of learning problems
    • complementing fields / Some complementing fields of Machine learning
    • versus data mining / Data mining
    • versus Artificial intelligence (AI) / Artificial intelligence (AI)
    • versus statistical learning / Statistical learning
    • versus data science / Data science
    • process lifecycle / Machine learning process lifecycle and solution architecture
    • solution architecture / Machine learning process lifecycle and solution architecture
    • tools / Machine learning tools and frameworks
    • frameworks / Machine learning tools and frameworks
  • machine learning
    • scalability / Machine learning: Scalability and Performance
    • performance / Machine learning: Scalability and Performance
    • too many data points / Too many data points or instances
    • too many instances / Too many data points or instances
    • too many attributes / Too many attributes or features
    • too many features / Too many attributes or features
    • response time windows, shrinking / Shrinking response time windows – need for real-time responses
    • highly complex algorithm / Highly complex algorithm
    • feed forward, iterative prediction cycles / Feed forward, iterative prediction cycles
  • Machine learning algorithms
    • about / Machine learning algorithms
    • decision tree based algorithms / Decision tree based algorithms
    • Bayesian method based algorithms / Bayesian method based algorithms
    • Kernel method based algorithms / Kernel method based algorithms
    • clustering methods / Clustering methods
    • Artificial neural networks (ANN) / Artificial neural networks (ANN)
    • Dimensionality Reduction / Dimensionality reduction
    • ensemble methods / Ensemble methods
    • instance based learning algorithms / Instance based learning algorithms
    • regression analysis based algorithms / Regression analysis based algorithms
    • association rule based learning algorithms / Association rule based learning algorithms
  • machine learning solution architecture, for big data
    • about / Machine learning solution architecture for big data (employing Hadoop)
    • Data Source layer / The Data Source layer
    • Ingestion layer / The Ingestion layer
    • Hadoop Storage layer / The Hadoop Storage layer
    • Hadoop (Physical) Infrastructure layer / The Hadoop (Physical) Infrastructure layer – supporting appliance
    • Hadoop platform / Processing layer / Hadoop platform / Processing layer
    • Analytics layer / The Analytics layer
    • Consumption layer / The Consumption layer
    • Security and Monitoring layer / Security and Monitoring layer
  • Machine learning tasks, Mahout
    • Collaborative Filtering / Recommendation / How does Mahout work?
    • Clustering / How does Mahout work?
    • Classification / How does Mahout work?
    • frequent itemset mining / How does Mahout work?
  • Machine learning tools
    • about / Machine learning tools – A landscape
  • Mahout
    • about / Hadoop ecosystem components, Apache Mahout
    • URL / Hadoop ecosystem components
    • working / How does Mahout work?
    • installing / Installing and setting up Apache Mahout
    • setting up / Installing and setting up Apache Mahout
    • setting up, Eclipse IDE used / Setting-up Apache Mahout using Eclipse IDE
    • setting up, without Eclipse / Setting up Apache Mahout without Eclipse
    • vectors, implementing in / Implementing vectors in Mahout
    • used, for implementing decision trees / Using Mahout
    • used, for implementing KNN / Using Mahout
    • used, for implementing Support Vector Machines (SVM) / Using Mahout
    • used, for implementing Apriori and FP-growth / Using Mahout
    • used, for implementing k-means clustering / Using Mahout
    • used, for implementing Naïve Bayes algorithm / Using Mahout
    • used, for implementing logistic regression / Using Mahout
    • used, for implementing linear regression / Using Mahout
    • used, for implementing ANNs / Using Mahout
    • used, for implementing Deep learning methods / Using Mahout
    • used, for implementing ensemble methods / Using Mahout
  • Mahout Packages
    • about / Mahout Packages
  • Mapper job / MapReduce architecture
  • MapReduce / Hadoop platform / Processing layer, Multi-model database architecture / polyglot persistence
    • about / Theoretical limitations of RDBMS, MapReduce programming paradigm, MapReduce
    • architecture / MapReduce architecture
    • functions / MapReduce architecture
    • execution flow / MapReduce execution flow and components
    • components / MapReduce execution flow and components
    • URL / Hadoop ecosystem components
  • MapReduce components
    • developing / Developing MapReduce components
    • InputFormat class / InputFormat
    • OutputFormat API / OutputFormat
    • Mapper implementation / Mapper implementation
  • MapReduce programming framework
    • advantages / What makes MapReduce cater to the needs of large datasets?
  • marginal probability
    • about / Types of probability
  • MarkLogic 8
    • about / Vendors
  • Markov Decision Process (MDP)
    • about / Markov Decision Process (MDP)
  • Markov property
    • about / Markov Decision Process (MDP)
  • MARS / C4.5
  • Master/Workers Model
    • about / Distributed and parallel computing strategies
  • Maven
    • setting up / Setting up Maven
  • mdp-toolkit / Implementation of Python (using examples)
  • mean
    • about / Important terms and definitions
  • Mean absolute error (MAE) / Mean absolute error (MAE)
  • Mean squared error (MSE) / Mean squared error (MSE)
  • median
    • about / Important terms and definitions
  • methods, for determining probability
    • classical method / Probability
    • empirical method / Probability
    • subjective method / Probability
  • Minkowski distance / Minkowski distance
  • MLlib / Apache Spark
  • mlpy / Implementation of Python (using examples)
  • mode
    • about / Important terms and definitions
  • model, Machine learning
    • about / Models
    • logical models / Logical models
    • geometric models / Geometric models
    • probabilistic models / Probabilistic models
  • model selection process
    • about / Model selection process
  • modern data architectures, for Machine learning
    • about / Modern data architectures for Machine learning
    • semantic data architecture / Semantic data architecture
    • multi-model database architecture / polyglot persistence / Multi-model database architecture / polyglot persistence
  • Monte Carlo methods
    • about / Monte Carlo methods
  • MPI
    • about / High Performance Computing (HPC) with Message Passing Interface (MPI)
  • Multi-Layer Perceptrons (MLP) / Synapses
  • multi-model database architecture / polyglot persistence
    • about / Multi-model database architecture / polyglot persistence
    • challenges / Multi-model database architecture / polyglot persistence
    • vendors / Vendors
  • multicollinearity
    • about / Regression methods
  • multicore processors
    • about / Multicore or multiprocessor systems
  • multilayer fully connected feedforward networks / Multilayer fully connected feedforward networks or Multilayer Perceptrons (MLP)
  • Multilayer Perceptrons (MLP) / Multilayer fully connected feedforward networks or Multilayer Perceptrons (MLP)
  • Multinomial Naïve Bayes classifier
    • about / Multinomial Naïve Bayes classifier
  • Multiple Instruction Single Data (MISD) / Distributed and parallel computing strategies
  • Multiple Instructions Multiple Data (MIMD) / Distributed and parallel computing strategies
  • multiple regression
    • about / Multiple regression
  • multiprocessor systems
    • about / Multicore or multiprocessor systems
  • Multivariate adaptive regression splines (MARS) / Regression analysis based algorithms
  • mutually exclusive events
    • about / Mutually exclusive or disjoint events

N

  • n-Armed Bandit problem
    • about / n-Armed Bandit problem
  • Natural language processing (NLP) / Too many attributes or features, Implementation of Python (using examples)
  • Naïve Bayes algorithm
    • implementing / Implementing Naïve Bayes algorithm
    • implementing, Mahout used / Using Mahout
    • implementing, R used / Using R
    • implementing, Spark used / Using Spark
    • implementing, scikit-learn used / Using scikit-learn
    • implementing, Julia used / Using Julia
  • Naïve Bayes classifier
    • about / Naïve Bayes classifier
    • Multinomial Naïve Bayes classifier / Multinomial Naïve Bayes classifier
    • Bernoulli Naïve Bayes classifier / The Bernoulli Naïve Bayes classifier
  • Nearest Neighbors
    • about / Nearest Neighbors
    • value of k, in KNN / Value of k in KNN
    • distance measures, in KNN / Distance measures in KNN
  • neighbors / Instance-based learning (IBL)
  • neural networks
    • about / Neural networks
    • neuron / Neuron
    • synapses / Synapses
  • Neural Network size
    • about / Neural Network size
    • example / An example
  • Neural Network types
    • about / Neural network types
    • Multilayer fully connected feedforward networks / Multilayer fully connected feedforward networks or Multilayer Perceptrons (MLP)
    • Multilayer Perceptrons (MLP) / Multilayer fully connected feedforward networks or Multilayer Perceptrons (MLP)
    • Jordan networks / Jordan networks
    • Elman networks / Elman networks
    • Radial Basis Function (RBF) networks / Radial Basis Function (RBF) networks
    • Hopfield networks / Hopfield networks
    • Dynamic Learning Vector Quantization (DLVQ) networks / Dynamic Learning Vector Quantization (DLVQ) networks
    • Gradient descent method / Gradient descent method
  • neuron / Neuron
  • new age data architectures
    • perspectives, emerging for / Emerging perspectives
    • drivers, emerging for / Emerging perspectives
  • NLTK / Implementation of Python (using examples)
  • normal distribution
    • about / Normal distribution
  • Normalized MAE (NMAE) / Normalized MSE and MAE (NMSE and NMAE)
  • Normalized MSE (NMSE) / Normalized MSE and MAE (NMSE and NMAE)
  • null hypothesis
    • about / ANOVA and F Statistics
  • numeric primitives, Julia / Numeric primitives
  • NumPy
    • about / Implementation of Python (using examples)

O

  • oblique trees
    • about / Oblique trees
  • ODBC.jl
    • reference link / Integrating Julia and Hadoop
  • odds ratio, logistic regression
    • about / Odds ratio in logistic regression
    • model / Model
  • OLAP (Online Analytic Processing) / Evolution of data architectures
  • OLAP databases
    • versus OLTP databases / Evolution of data architectures
  • OLTP (Online Transaction Processing) / Evolution of data architectures
  • OLTP databases
    • versus OLAP databases / Evolution of data architectures
  • Oozie
    • about / Hadoop ecosystem components
    • URL / Hadoop ecosystem components
  • optimization, Apriori implementation
    • hash-based itemset counting / Apriori – the downside
    • transaction elimination / counting / Apriori – the downside
    • partitioning / Apriori – the downside
    • sampling / Apriori – the downside
    • dynamic itemset counting / Apriori – the downside
  • Oryx / Vendors
  • OutputFormat API / OutputFormat

P

  • packages, Julia
    • about / Packages
    • reference link / Packages
  • parallel computing strategies
    • about / Distributed and parallel computing strategies
  • parallel processor architectures
    • about / Distributed and parallel computing strategies
  • partitional clustering
    • about / Partitional clustering
  • pattern recognition / Definition
  • pattern search / Definition
  • percentiles
    • about / Revisiting statistics
  • performance measures
    • using / Performance measures
    • solution / Is the solution good?
    • Mean squared error (MSE) / Mean squared error (MSE)
    • Mean absolute error (MAE) / Mean absolute error (MAE)
    • Normalized MSE (NMSE) / Normalized MSE and MAE (NMSE and NMAE)
    • Normalized MAE (NMAE) / Normalized MSE and MAE (NMSE and NMAE)
    • variance / Solving the errors: bias and variance
    • bias / Solving the errors: bias and variance
  • phases, Machine learning
    • training phase / What is learning?
    • validation and test phase / What is learning?
    • application phase / What is learning?
  • Pig / Hadoop platform / Processing layer
    • about / Hadoop ecosystem components
    • URL / Hadoop ecosystem components
  • plots, Julia
    • about / Graphics and plotting
  • plyrmr package / Approach 3 – Using RHadoop
  • Poisson probability distribution / Poisson probability distribution
  • Poisson regression
    • about / Poisson regression
  • policy
    • about / The context of Reinforcement Learning
  • polyglot / Multi-model database architecture / polyglot persistence
  • polynomial (non-linear) regression
    • about / Polynomial (non-linear) regression
  • population
    • about / Important terms and definitions
  • posterior probability
    • about / Types of probability
  • potential issues, large-scale Machine learning
    • parallel execution / Potential issues in large-scale Machine learning
    • load balancing / Potential issues in large-scale Machine learning
    • skews, managing / Potential issues in large-scale Machine learning
    • monitoring / Potential issues in large-scale Machine learning
    • fault tolerance / Potential issues in large-scale Machine learning
    • auto scaling / Potential issues in large-scale Machine learning
    • job scheduling / Potential issues in large-scale Machine learning
    • Workflow Management / Potential issues in large-scale Machine learning
  • practical implementation aspects
    • spam detection / Practical Machine learning examples
    • credit card fraud detection / Practical Machine learning examples
    • digit recognition / Practical Machine learning examples
    • speech recognition / Practical Machine learning examples
    • face detection / Practical Machine learning examples
    • product recommendation / Practical Machine learning examples
    • customer segmentation / Practical Machine learning examples
    • stock trading / Practical Machine learning examples
    • sentiment analysis / Practical Machine learning examples
  • Predictive analytics / Emerging perspectives
  • prior probability
    • about / Types of probability
  • probability
    • about / Probability
    • methods, for determining / Probability
    • types / Types of probability
    • posterior probability / Types of probability
    • prior probability / Types of probability
    • conditional probability / Types of probability
    • joint probability / Types of probability
    • marginal probability / Types of probability
  • Probably Approximately Correct (PAC)
    • about / Performance measures
    • Approximate / Performance measures
    • Probability / Performance measures
  • problem types, Machine learning
    • about / Types of learning problems
    • classification / Classification
    • clustering / Clustering
    • forecasting / Forecasting, prediction or regression
    • prediction / Forecasting, prediction or regression
    • regression / Forecasting, prediction or regression
    • simulation / Simulation
    • optimization / Optimization
    • supervised learning / Supervised learning
    • unsupervised learning / Unsupervised learning
    • semi-supervised learning / Semi-supervised learning
    • reinforcement learning / Reinforcement learning
    • deep learning / Deep learning
  • process lifecycle, Machine learning / Machine learning process lifecycle and solution architecture
  • Producer/Consumer Model
    • about / Distributed and parallel computing strategies
  • Protocol Buffer
    • URL / Approach 2 – Using the Rhipe package of R
  • PyBrain / Implementation of Python (using examples)
  • Pydoop
    • about / Implementation of Python (using examples)
  • PyML / Implementation of Python (using examples)
  • Python
    • about / Python
    • toolkit options / Toolkit options in Python
    • implementing / Implementation of Python (using examples)
    • installing / Installing Python and setting up scikit-learn
  • Python (scikit-learn)
    • used, for implementing decision trees / Using Python (scikit-learn)
    • used, for implementing KNN / Using Python (scikit-learn)
    • used, for implementing Support Vector Machines (SVM) / Using Python (Scikit-learn)
    • used, for implementing Apriori and FP-growth / Using Python (Scikit-learn)
    • used, for implementing k-means clustering / Using Python (scikit-learn)
    • used, for implementing Deep learning methods / Using Python (Scikit-learn)
    • used, for implementing ANNs / Using Python (Scikit-learn)
    • used, for implementing ensemble methods / Using Python (Scikit-learn)

Q

  • Q-Learning technique
    • about / Q-Learning – off-Policy TD
  • Quadratic Discriminant Analysis (QDA) / C4.5
  • quartiles
    • about / Revisiting statistics
  • QUEST / C4.5

R

  • R
    • about / R
    • capabilities / R
    • installing / Installing and setting up R
    • setting up / Installing and setting up R
    • used, for implementing decision trees / Using R
    • used, for implementing KNN / Using R
    • used, for implementing Support Vector Machines (SVM) / Using R
    • used, for implementing Apriori and FP-growth / Using R
    • used, for implementing k-means clustering / Using R
    • used, for implementing Naïve Bayes algorithm / Using R
    • used, for implementing logistic regression / Using R
    • used, for implementing linear regression / Using R
    • used, for implementing Deep learning methods / Using R
    • used, for implementing ANNs / Using R
    • used, for implementing ensemble methods / Using R
  • R, integrating with Apache Hadoop
    • about / Integrating R with Apache Hadoop
    • R and Streaming APIs, using in Hadoop / Approach 1 – Using R and Streaming APIs in Hadoop
    • Rhipe package, using of R / Approach 2 – Using the Rhipe package of R
    • RHadoop, using / Approach 3 – Using RHadoop
  • R / Hadoop integration approaches
    • pros / Summary of R/Hadoop integration approaches
    • cons / Summary of R/Hadoop integration approaches
  • Radial Basis Function (RBF) networks / Radial Basis Function (RBF) networks
  • random access sparse vectors / Implementing vectors in Mahout
  • random forests
    • about / Random forests, Random forests
  • randomness
    • about / Important terms and definitions
  • range
    • about / Revisiting statistics
  • R Data Frames
    • about / R Data Frames
  • RDBMS
    • theoretical limitations / Theoretical limitations of RDBMS
  • rhdfs package / Approach 3 – Using RHadoop
  • recommendation systems / Recommendation systems
  • rectified linear neurons / Rectified linear neurons / linear threshold neurons
  • Recurrent Neural Networks (RNNs) / Recurrent Neural Networks (RNNs), Restricted Boltzmann Machines (RBMs)
  • Reducer job / MapReduce architecture
  • reference reward
    • about / Reinforcement comparison methods
  • regression analysis
    • about / Regression analysis
    • statistics, revisiting / Revisiting statistics
  • regression analysis based algorithms
    • about / Regression analysis based algorithms
  • regression methods
    • about / Regression methods
    • key assumptions / Regression methods
    • simple regression / Simple regression or simple linear regression
    • simple linear regression / Simple regression or simple linear regression
    • multiple regression / Multiple regression
    • polynomial (non-linear) regression / Polynomial (non-linear) regression
    • Generalized Linear Models (GLM) / Generalized Linear Models (GLM)
    • logistic regression (logit link) / Logistic regression (logit link)
    • Poisson regression / Poisson regression
  • Reinforcement Comparison methods
    • about / Reinforcement comparison methods
  • Reinforcement Learning (RL)
    • about / Reinforcement Learning (RL)
    • context / The context of Reinforcement Learning
    • terms / The context of Reinforcement Learning
    • examples / Examples of Reinforcement Learning
    • evaluative feedback / Evaluative Feedback
    • Markov Decision Process (MDP) / Markov Decision Process (MDP)
    • Delayed Rewards / Delayed rewards
    • optimal policy / The policy
    • key features / Reinforcement Learning – key features
    • solution methods / Reinforcement learning solution methods
  • Reinforcement Learning (RL) problem
    • world grid example / The Reinforcement Learning problem – the world grid example
  • Remote Procedure Calls (RPC) / Hadoop ecosystem components
  • Resilient Distributed Dataset (RDD) / Apache Spark
  • Resilient Distributed Datasets (RDD)
    • programming with / Programming with Resilient Distributed Datasets (RDD)
  • RESTFul HDFS / RESTFul HDFS
  • reward
    • about / The context of Reinforcement Learning
  • R Expressions
    • about / R Expressions
    • assignments / Assignments
    • functions / Functions
  • R Factors
    • about / R Factors
  • rhbase package / Approach 3 – Using RHadoop
  • R Learning (Off-policy)
    • about / R Learning (Off-policy)
  • R Matrices
    • about / R Matrices
  • rmr package / Approach 3 – Using RHadoop
  • root mean square error (RMSE) / Mean squared error (MSE)
  • Rote Learner / Instance-based learning (IBL)
  • R Statistical frameworks
    • about / R Statistical frameworks
  • rule extraction / Forecasting, prediction or regression
  • R Vectors
    • about / R Vectors
    • assigning / Assigning, accessing, and manipulating vectors
    • accessing / Assigning, accessing, and manipulating vectors
    • manipulating / Assigning, accessing, and manipulating vectors

S

  • 4Store / Vendors
  • sample
    • about / Important terms and definitions
    • stratified sampling / Important terms and definitions
  • sample size
    • about / Important terms and definitions
  • sample space probability
    • about / Probability
  • Sampling Bias
    • about / Important terms and definitions
  • Sarsa
    • about / Sarsa - on-Policy TD
  • Scala
    • about / Scala
    • examples / Scala
  • scaling-out storage
    • versus scaling-up storage / Scaling-up versus Scaling-out storage
  • scikit-learn
    • about / Implementation of Python (using examples)
    • setting up / Installing Python and setting up scikit-learn
    • used, for implementing Naïve Bayes algorithm / Using scikit-learn
    • used, for implementing logistic regression / Using scikit-learn
    • used, for implementing linear regression / Using scikit-learn
  • SciPy
    • about / Implementation of Python (using examples)
  • semantic data architecture
    • about / Semantic data architecture
    • business data lake / The business data lake
    • central data integration / Semantic Web technologies
    • peer-to-peer / Semantic Web technologies
    • features / Ontology and data integration
    • vendors / Vendors
  • Semantic Web technologies
    • about / Semantic Web technologies
    • ontology and data integration / Ontology and data integration
  • semi-supervised learning
    • about / Reinforcement Learning (RL)
  • sequence files
    • about / Implementing vectors in Mahout
  • sequential access sparse vectors / Implementing vectors in Mahout
  • Sesame / Vendors
  • shallow learning algorithm / Background
  • Shared Nothing Architecture (SNA) / The Hadoop (Physical) Infrastructure layer – supporting appliance
  • Sigmoid neurons / Sigmoid neurons
  • simple linear regression
    • about / Simple regression or simple linear regression
  • simple regression
    • about / Simple regression or simple linear regression
  • Single Instruction Multiple Data (SIMD) / Distributed and parallel computing strategies
  • Single Instruction Single Data (SISD) / Distributed and parallel computing strategies
  • singularity / Regression methods
  • skewed data
    • about / Revisiting statistics
  • smart data / The Ingestion layer
  • Softmax regression technique
    • about / Softmax regression technique
  • solution architecture, Machine learning / Machine learning process lifecycle and solution architecture
  • solution methods, Reinforcement Learning (RL)
    • about / Reinforcement learning solution methods
    • Dynamic Programming (DP) / Dynamic Programming (DP)
    • Monte Carlo methods / Monte Carlo methods
    • temporal difference (TD) learning / Temporal difference (TD) learning
    • Q-Learning technique / Q-Learning – off-Policy TD
    • actor-critic methods (on-policy) / Actor-critic methods (on-policy)
    • R Learning (Off-policy) / R Learning (Off-policy)
  • Spark
    • used, for implementing decision trees / Using Spark
    • used, for implementing KNN / Using Spark
    • used, for implementing Support Vector Machines (SVM) / Using Spark
    • used, for implementing Apriori and FP-growth / Using Spark
    • used, for implementing k-means clustering / Using Spark
    • used, for implementing Naïve Bayes algorithm / Using Spark
    • used, for implementing logistic regression / Using Spark
    • used, for implementing linear regression / Using Spark
    • used, for implementing Deep learning methods / Using Spark
    • used, for implementing ANNs / Using Spark
    • used, for implementing ensemble methods / Using Spark
  • Spark SQL / Apache Spark
  • Spark Streaming / Apache Spark
  • sparse vectors
    • about / Implementing vectors in Mahout
    • random access sparse vectors / Implementing vectors in Mahout
    • sequential access sparse vectors / Implementing vectors in Mahout
  • specialized trees
    • about / Specialized trees
    • oblique trees / Oblique trees
    • random forests / Random forests
    • evolutionary trees / Evolutionary trees
    • Hellinger trees / Hellinger trees
  • Spring XD / The Analytics layer, Vendors
    • about / Spring XD
    • features / Spring XD
  • Spring XD architecture, layers
    • about / Spring XD
    • Speed Layer / Spring XD
    • Batch Layer / Spring XD
    • Serving Layer / Spring XD
  • Sqoop / Hadoop platform / Processing layer
    • about / Hadoop ecosystem components
    • URL / Hadoop ecosystem components
  • SSE (Sum Squared Error) / Simple regression or simple linear regression
  • SSL (Secure Socket Layer) / Security and Monitoring layer
  • standard deviation
    • about / Important terms and definitions
  • Stardog / Vendors
  • state
    • about / The context of Reinforcement Learning
  • statistical learning
    • versus Machine learning / Statistical learning
  • statisticians
    • objective / Statistician's thinking
  • stochastic binary neurons / Stochastic binary neurons
  • stratified sampling
    • about / Important terms and definitions
  • stream mining / Stream mining or classification
  • String manipulations, Julia
    • working with / Working with Strings and String manipulations
  • Strings, Julia
    • working with / Working with Strings and String manipulations
  • sum of squared error of prediction (SSE) / Convergence or stopping criteria for the k-means clustering
  • supervised ensemble methods
    • about / Supervised ensemble methods
    • boosting / Boosting
    • bagging / Bagging
    • wagging / Wagging
  • supervised learning
    • about / Reinforcement Learning (RL)
  • Support Vector Machine (SVM) / Implementation of Python (using examples)
  • Support Vector Machines (SVM)
    • about / Support Vector Machines (SVM)
    • Inseparable Data / Inseparable Data
    • implementing / Implementing SVM
    • implementing, Mahout used / Using Mahout
    • implementing, R used / Using R
    • implementing, Spark used / Using Spark
    • implementing, Python (scikit-learn) used / Using Python (Scikit-learn)
    • implementing, Julia used / Using Julia
  • support vector machines (SVM) / Kernel method based algorithms
  • symmetric distribution
    • about / Revisiting statistics
  • synapses / Synapses

T

  • Tableau / Apache Spark
  • Tajo
    • about / Hadoop ecosystem components
    • URL / Hadoop ecosystem components
  • task dependency graph / Developing concurrent algorithms
  • task parallelization
    • about / Distributed and parallel computing strategies
  • TaskTracker / MapReduce architecture
  • Temporal Credit Assignment
    • about / Delayed rewards
  • temporal difference (TD) learning
    • about / Temporal difference (TD) learning
    • Sarsa / Sarsa - on-Policy TD
  • terms, Reinforcement Learning (RL)
    • agent / The context of Reinforcement Learning
    • environment / The context of Reinforcement Learning
    • state / The context of Reinforcement Learning
    • action / The context of Reinforcement Learning
    • policy / The context of Reinforcement Learning
    • reward / The context of Reinforcement Learning
    • value / The context of Reinforcement Learning
  • top-K recommendation / Instance-based learning (IBL)
  • Total Cost of Ownership (TCO) / Commoditizing information
  • Total Lifetime Value (TLV) / Classification
  • Total overall cost of ownership (TCO) / Emerging perspectives
  • traditional ETL architecture
    • limitations / Evolution of data architectures
  • transfer learning / Transfer learning
  • tree Induction method
    • ID3 / C4.5
    • CHAID / C4.5
    • QUEST / C4.5
    • CAL5 / C4.5
    • FACT / C4.5
    • LMDT / C4.5
    • MARS / C4.5

U

  • Ubuntu-based Hadoop Installation
    • prerequisites / Hadoop installation and setup
    • JDK 1.7, installing / Installing JDK 1.7
    • system user, creating for Hadoop / Creating a system user for Hadoop (dedicated)
    • IPv6, disabling / Disable IPv6
  • uncertainty
    • sources / Probability
  • Unique Transaction Identifier (UTI) / Association rule – a definition
  • unlabelled data set
    • about / Reinforcement Learning (RL)
  • unsupervised ensemble methods
    • about / Unsupervised ensemble methods

V

  • value
    • about / The context of Reinforcement Learning
  • variable
    • about / Important terms and definitions
  • variables, Julia / Using variables and assignments
  • variance / Solving the errors: bias and variance
    • about / Revisiting statistics
    • properties / Properties of variance
  • vectors
    • implementing, in Mahout / Implementing vectors in Mahout
  • Visualizations
    • about / The Consumption layer
    • data, exploring with / Explaining and exploring data with Visualizations
  • Voronoi cell / Nearest Neighbors

W

  • wagging
    • about / Wagging
  • WebHDFS REST API
    • URL / RESTFul HDFS
  • Wisdom of Crowds
    • about / The wisdom of the crowd
    • aggregation / The wisdom of the crowd
    • independence / The wisdom of the crowd
    • decentralization / The wisdom of the crowd
    • diversity of opinion / The wisdom of the crowd
    • usage of combiner / The wisdom of the crowd
    • dependency between classifiers / The wisdom of the crowd
    • diversity, generating / The wisdom of the crowd
    • size of ensemble / The wisdom of the crowd
    • cross inducers / The wisdom of the crowd

Y

  • YARN
    • about / Hadoop ecosystem components

Z

  • ZooKeeper / Hadoop platform / Processing layer
    • about / Hadoop ecosystem components
    • URL / Hadoop ecosystem components