Index
A
- abstraction / Abstraction
- activation function / From biological to artificial neurons
- about / Activation functions
- threshold activation function / Activation functions
- unit step activation function / Activation functions
- sigmoid activation function / Activation functions
- AdaBoost
- about / Boosting
- AdaBoost.M1 algorithm / Boosting
- adaptive boosting
- allocation function / Understanding ensembles
- Apache Hadoop
- Application Programming Interfaces (APIs)
- about / Parsing JSON from web APIs
- Apriori
- Apriori algorithm
- for association rule learning / The Apriori algorithm for association rule learning
- strengths / The Apriori algorithm for association rule learning
- Apriori principle
- used, for building set of rules / Building a set of rules with the Apriori principle
- Artificial Neural Network (ANN)
- about / Understanding neural networks
- association rules
- about / Understanding association rules
- potential applications / Understanding association rules
- rule interest, measuring / Measuring rule interest – support and confidence
- set of rules, building with Apriori principle / Building a set of rules with the Apriori principle
- frequently purchased groceries, identifying with / Example – identifying frequently purchased groceries with association rules
- automated parameter tuning
- caret package used for / Using caret for automated parameter tuning
- requisites / Using caret for automated parameter tuning
- axon
B
- backpropagation
- neural networks, training with / Training neural networks with backpropagation
- about / Training neural networks with backpropagation
- bag-of-words / Step 2 – exploring and preparing the data
- bagging
- about / Bagging
- bank loans example, with C5.0 decision trees
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- random training datasets, creating / Data preparation – creating random training and test datasets
- test datasets, creating / Data preparation – creating random training and test datasets
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- Bayesian methods
- basic concepts / Basic concepts of Bayesian methods
- Bayesian methods, basic concepts
- joint probability / Understanding joint probability
- conditional probability / Computing conditional probability with Bayes' theorem
- probability / Understanding probability
- Beowulf cluster
- betweenness centrality
- bias / The case of linearly separable data
- bias-variance tradeoff / Choosing an appropriate k
- biglm package
- regression models, building / Building bigger regression models with biglm
- bigmemory package
- massive matrices, using with / Using massive matrices with bigmemory
- URL / Using massive matrices with bigmemory
- bigrf package
- random forests, building / Growing bigger and faster random forests with bigrf
- URL / Growing bigger and faster random forests with bigrf
- bimodal / Measuring the central tendency – the mode
- binning
- bins
- Bioconductor
- bioinformatics
- about / Analyzing bioinformatics data
- bioinformatics data
- analyzing / Analyzing bioinformatics data
- bivariate relationships
- blind tasting experience example / The k-NN algorithm
- blowby / Simple linear regression
- body mass index (BMI) / Step 1 – collecting data
- boosting
- about / Boosting
- bootstrap aggregating
- about / Bagging
- bootstrap sampling / Bootstrap sampling
- box-and-whiskers plot / Visualizing numeric variables – boxplots
- branches
- about / Understanding decision trees
- breast cancer
- diagnosing, with k-NN algorithm / Example – diagnosing breast cancer with the k-NN algorithm
- breast cancer example
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
C
- C5.0 algorithm
- about / The C5.0 decision tree algorithm
- split, selecting / Choosing the best split
- decision tree, pruning / Pruning the decision tree
- caret package
- using, for automated parameter tuning / Using caret for automated parameter tuning
- URL / Using caret for automated parameter tuning, Training and evaluating models in parallel with caret
- used, for evaluating models in parallel / Training and evaluating models in parallel with caret
- categorical / Types of input data
- categorical variables
- about / Exploring categorical variables
- central tendency, measuring / Measuring the central tendency – the mode
- cell body / From biological to artificial neurons
- centroid / Using distance to assign and update clusters
- characteristics, neural networks
- activation function / From biological to artificial neurons
- network topology / From biological to artificial neurons
- training algorithm / From biological to artificial neurons
- classification / Types of machine learning algorithms
- classification and regression training (caret package) / Beyond accuracy – other measures of performance
- Classification and Regression Tree (CART) algorithm / Understanding regression trees and model trees
- classification performance
- measuring / Measuring performance for classification
- classification prediction data
- working with / Working with classification prediction data in R
- classification rules
- about / Understanding classification rules
- separate and conquer / Separate and conquer
- 1R algorithm / The 1R algorithm
- RIPPER algorithm / The RIPPER algorithm
- obtaining, from decision trees / Rules from decision trees
- class imbalance problem / Measuring performance for classification
- clustering / Types of machine learning algorithms
- about / Understanding clustering
- as machine learning task / Clustering as a machine learning task
- clustering, k-means clustering algorithm
- about / The k-means clustering algorithm
- distance, used for assigning cluster / Using distance to assign and update clusters
- distance, used for updating cluster / Using distance to assign and update clusters
- appropriate number of clusters, selecting / Choosing the appropriate number of clusters
- column-major order / Matrixes and arrays
- combination function / Understanding ensembles
- Compute Unified Device Architecture (CUDA)
- about / GPU computing
- Comprehensive R Archive Network (CRAN)
- about / Machine learning with R
- URL / Machine learning with R
- concrete strength, modeling with ANNs
- about / Example – Modeling the strength of concrete with ANNs
- data, collecting / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- conditional probability
- confusion matrix
- about / A closer look at confusion matrices
- used, for measuring performance / Using confusion matrices to measure performance
- control object / Customizing the tuning process
- convex hull / The case of linearly separable data
- corpus / Data preparation – cleaning and standardizing text data
- correlation
- about / Correlations
- CRAN
- CRAN task view
- CRAN Web Technologies
- cross-validation / Cross-validation
- CSV (Comma-Separated Values) file
- CSV files
- data, importing from / Importing and saving data from CSV files
- curl utility
- cut points
D
- data
- managing, with R / Managing data with R
- importing, from CSV files / Importing and saving data from CSV files
- data.table package
- Database Management Systems (DBMSs)
- about / Querying data in SQL databases
- databases
- about / Working with proprietary files and databases
- data, querying in SQL databases / Querying data in SQL databases
- data dictionary
- about / Exploring the structure of data
- data exploration
- about / Exploring and understanding data
- data frame
- about / Data frames
- data mining
- about / The origins of machine learning
- data munging
- data preparation, breast cancer example
- training datasets, creating / Data preparation – creating training and test datasets
- test datasets, creating / Data preparation – creating training and test datasets
- Data Source Name (DSN)
- about / Querying data in SQL databases
- data storage / Data storage
- data structures, R
- about / R data structures
- vector / Vectors
- factor / Factors
- lists / Lists
- data frame / Data frames
- matrix / Matrixes and arrays
- array / Matrixes and arrays
- saving / Saving, loading, and removing R data structures
- loading / Saving, loading, and removing R data structures
- removing / Saving, loading, and removing R data structures
- exploring / Exploring the structure of data
- data table
- data wrangling
- decision nodes
- about / Understanding decision trees
- decision tree
- potential uses / Understanding decision trees
- about / Understanding decision trees, Example – identifying risky bank loans using C5.0 decision trees
- divide and conquer / Divide and conquer
- pruning / Pruning the decision tree
- used, for identifying risky bank loans / Example – identifying risky bank loans using C5.0 decision trees
- accuracy, boosting / Boosting the accuracy of decision trees
- decision tree forests
- about / Random forests
- decision trees
- classification rules, obtaining from / Rules from decision trees
- deep learning
- Deep Neural Network (DNN)
- delimiter
- dendrites
- dependent events / Understanding joint probability
- dependent variable
- about / Understanding regression
- descriptive model / Types of machine learning algorithms
- disk-based data frames
- creating, with ff package / Creating disk-based data frames with ff
- divide and conquer
- about / Divide and conquer
- domain-specific data
- working with / Working with domain-specific data
- bioinformatics data, analyzing / Analyzing bioinformatics data
- network data, analyzing / Analyzing and visualizing network data
- network data, visualizing / Analyzing and visualizing network data
- doParallel package
- dplyr package
- used, for generalizing tabular data structures / Generalizing tabular data structures with dplyr
- URL / Generalizing tabular data structures with dplyr
- dummy coding / Preparing data for use with k-NN, Step 3 – training a model on the data
- dummy variable / Examining relationships – two-way cross-tabulations, Step 3 – training a model on the data
E
- early stopping
- about / Pruning the decision tree
- edgelist
- elements
- about / Vectors
- embarrassingly parallel problems
- ensemble methods
- bagging / Bagging
- boosting / Boosting
- random forests / Random forests
- ensembles
- about / Understanding ensembles
- advantages / Understanding ensembles
- entropy
- about / Choosing the best split
- epoch
- about / Training neural networks with backpropagation
- forward phase / Training neural networks with backpropagation
- backward phase / Training neural networks with backpropagation
- erosion / Simple linear regression
- Euclidean norm / The case of linearly separable data
- evaluation / Evaluation
F
- 10-fold cross-validation (10-fold CV) / Cross-validation
- F-measure / The F-measure
- F-score / The F-measure
- F1 score / The F-measure
- factor
- about / Factors
- feedforward networks
- ffbase project
- ff package
- used, for creating disk-based data frames / Creating disk-based data frames with ff
- URL / Creating disk-based data frames with ff
- five-number summary / Measuring spread – quartiles and the five-number summary
- foreach package
- frequently purchased groceries
- identifying, with association rules / Example – identifying frequently purchased groceries with association rules
- future performance
- estimating / Estimating future performance
- future performance estimation
- holdout method / The holdout method
- cross-validation / Cross-validation
- bootstrap sampling / Bootstrap sampling
G
- Gaussian RBF kernel / Using kernels for non-linear spaces
- generalization / Generalization
- Generalized Linear Models (GLM) / Understanding regression
- glyph / Step 1 – collecting data
- GPU
- about / GPU computing
- computing / GPU computing
- URL / GPU computing
- gradient descent / Training neural networks with backpropagation
- Graph Modeling Language (GML)
- greedy learners / What makes trees and rules greedy?
- grid
H
- Hadoop
- harmonic mean / The F-measure
- header line
- histograms / Visualizing numeric variables – histograms
- holdout method / The holdout method, Cross-validation
- httr package
- hyperplane / Understanding Support Vector Machines
- Hypertext Markup Language (HTML)
I
- igraph package
- imputation / Data preparation – imputing the missing values
- Incremental Reduced Error Pruning (IREP) algorithm / The RIPPER algorithm
- independent events / Understanding joint probability
- independent variables
- about / Understanding regression
- information gain / Choosing the best split
- input data
- types / Types of input data
- matching, to algorithms / Matching input data to algorithms
- input nodes / The number of layers
- instance-based learning
- about / Why is the k-NN algorithm lazy?
- intercept
- about / Understanding regression
- Interquartile Range (IQR) / Measuring spread – quartiles and the five-number summary
- itemset
- about / Understanding association rules
- Iterative Dichotomiser 3 (ID3) / The C5.0 decision tree algorithm
J
- joint probability / Understanding joint probability
- JSON
- parsing, from web APIs / Parsing JSON from web APIs
- about / Parsing JSON from web APIs
- URL / Parsing JSON from web APIs
- jsonlite package
K
- k-fold cross-validation (or k-fold CV) / Cross-validation
- k-means++ / Using distance to assign and update clusters
- k-means clustering algorithm
- about / The k-means clustering algorithm
- k-NN algorithm
- about / The k-NN algorithm
- weaknesses / The k-NN algorithm
- similarity, measuring with distance / Measuring similarity with distance
- appropriate k, selecting / Choosing an appropriate k
- data, preparing / Preparing data for use with k-NN
- lazy learning algorithm / Why is the k-NN algorithm lazy?
- used, for diagnosing breast cancer / Example – diagnosing breast cancer with the k-NN algorithm
- kernels
- using, for non-linear spaces / Using kernels for non-linear spaces
- kernel trick / Using kernels for non-linear spaces
- kernlab
- reference / Step 3 – training a model on the data
L
- Laplace estimator
- about / The Laplace estimator
- large datasets
- managing / Managing very large datasets
- tabular data structures, generalizing with dplyr / Generalizing tabular data structures with dplyr
- data.table package, using / Making data frames faster with data.table
- disk-based data frames, creating with ff package / Creating disk-based data frames with ff
- massive matrices, using with bigmemory package / Using massive matrices with bigmemory
- latitude / Using kernels for non-linear spaces
- layers
- about / The number of layers
- lazy learning algorithms / Why is the k-NN algorithm lazy?
- leaf nodes
- about / Understanding decision trees
- learning rate / Training neural networks with backpropagation
- leave-one-out method / Cross-validation
- left-hand side (LHS) / Understanding association rules
- levels / Types of machine learning algorithms
- LIBSVM
- likelihood
- linear kernel / Using kernels for non-linear spaces
- link function / Understanding regression
- lists / Lists
- loess curve / Visualizing relationships among features – the scatterplot matrix
- logistic regression
- about / Understanding regression
- longitude / Using kernels for non-linear spaces
M
- machine learning
- origins / The origins of machine learning
- about / The origins of machine learning
- abuses / Uses and abuses of machine learning
- uses / Uses and abuses of machine learning
- successes / Machine learning successes
- limitations / The limits of machine learning
- ethics / Machine learning ethics
- process / How machines learn
- with R / Machine learning with R
- R packages, installing / Installing R packages
- R packages, loading / Loading and unloading R packages
- R packages, unloading / Loading and unloading R packages
- machine learning, in practice
- about / Machine learning in practice
- data collection / Machine learning in practice
- data exploration and preparation / Machine learning in practice
- model training / Machine learning in practice
- model evaluation / Machine learning in practice
- model improvement / Machine learning in practice
- input data, types / Types of input data
- algorithms, types / Types of machine learning algorithms
- input data, matching to algorithms / Matching input data to algorithms
- machine learning, process
- about / How machines learn
- data storage / How machines learn, Data storage
- abstraction / How machines learn, Abstraction
- generalization / How machines learn, Generalization
- evaluation / How machines learn, Evaluation
- machine learning algorithms
- magrittr package
- about / Scraping data from web pages
- URL / Scraping data from web pages
- MapReduce
- marginal likelihood
- market basket analysis example
- data, collecting / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- sparse matrix, creating for transaction data / Data preparation – creating a sparse matrix for transaction data
- item support, visualizing / Visualizing item support – item frequency plots
- transaction data, visualizing / Visualizing the transaction data – plotting the sparse matrix
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- set of association rules, sorting / Sorting the set of association rules
- subsets of association rules, taking / Taking subsets of association rules
- association rules, saving to file / Saving association rules to a file or data frame
- association rules, saving to data frame / Saving association rules to a file or data frame
- matrix
- about / Matrixes and arrays
- matrix notation / Multiple linear regression
- maximum margin hyperplane (MMH) / Classification with hyperplanes
- mean / Measuring the central tendency – mean and median
- mean absolute error (MAE) / Measuring performance with the mean absolute error
- medical expenses, predicting with linear regression
- about / Example – predicting medical expenses using linear regression
- data, collecting / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- correlation matrix / Exploring relationships among features – the correlation matrix
- relationships, visualizing among features / Visualizing relationships among features – the scatterplot matrix
- scatterplot matrix / Visualizing relationships among features – the scatterplot matrix
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance, Model specification – adding non-linear relationships, Transformation – converting a numeric variable to a binary indicator, Model specification – adding interaction effects, Putting it all together – an improved regression model
- message-passing interface (MPI)
- meta-learners / Types of machine learning algorithms
- meta-learning methods
- used, for improving model performance / Improving model performance with meta-learning
- about / Improving model performance with meta-learning
- min-max normalization / Preparing data for use with k-NN
- mobile phone spam
- filtering, with Naive Bayes algorithm / Example – filtering mobile phone spam with the Naive Bayes algorithm
- mobile phone spam example
- data, collecting / Step 1 – collecting data
- data collecting, URL / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- text data, cleaning / Data preparation – cleaning and standardizing text data
- text data, standardizing / Data preparation – cleaning and standardizing text data
- text documents, splitting into words / Data preparation – splitting text documents into words
- training datasets, creating / Data preparation – creating training and test datasets
- test datasets, creating / Data preparation – creating training and test datasets
- text data, visualizing / Visualizing text data – word clouds
- indicator features, creating for frequent words / Data preparation – creating indicator features for frequent words
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- model performance
- improving, with meta-learning / Improving model performance with meta-learning
- model performance, breast cancer example
- z-score standardization / Transformation – z-score standardization
- alternative values of k, testing / Testing alternative values of k
- model trees / Understanding regression trees and model trees
- multicore package
- multilayer network
- about / The number of layers
- Multilayer Perceptron (MLP)
- multimodal / Measuring the central tendency – the mode
- multinomial logistic regression / Understanding regression
- multiple linear regression / Understanding regression
- about / Multiple linear regression
- weaknesses / Multiple linear regression
- multiple R-squared value (coefficient of determination) / Step 4 – evaluating model performance
- multivariate relationships
N
- Naive Bayes algorithm
- about / Understanding Naive Bayes, The Naive Bayes algorithm
- classification / Classification with Naive Bayes
- Laplace estimator / The Laplace estimator
- numeric features, using with / Using numeric features with Naive Bayes
- used, for filtering mobile phone spam / Example – filtering mobile phone spam with the Naive Bayes algorithm
- nearest neighbor classification
- network analysis
- network data
- analyzing / Analyzing and visualizing network data
- visualizing / Analyzing and visualizing network data
- network topology
- about / Network topology
- layers / The number of layers
- direction of information travel / The direction of information travel
- number of nodes in each layer / The number of nodes in each layer
- neural networks
- about / Understanding neural networks
- biological, to artificial neurons / From biological to artificial neurons
- characteristics / From biological to artificial neurons
- training, with backpropagation / Training neural networks with backpropagation
- neurons
- about / Understanding neural networks
- nodes / Understanding neural networks
- nominal / Types of input data
- nominal variables
- about / Factors
- non-linear spaces
- kernels, using for / Using kernels for non-linear spaces
- normal distribution / Understanding numeric data – uniform and normal distributions
- numeric / Types of input data
- numeric data
- numeric features
- using, with Naive Bayes / Using numeric features with Naive Bayes
- numeric prediction / Types of machine learning algorithms
- numeric variables
- about / Exploring numeric variables
- central tendency, measuring / Measuring the central tendency – mean and median
- spread, measuring / Measuring spread – quartiles and the five-number summary, Measuring spread – variance and standard deviation
- visualizing / Visualizing numeric variables – boxplots, Visualizing numeric variables – histograms
O
- OCR, performing with SVMs
- about / Example – performing OCR with SVMs
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- one-way table / Exploring categorical variables
- online data
- working with / Working with online data and services
- parsing / Working with online data and services
- complete text of web pages, downloading / Downloading the complete text of web pages
- parsing, within web pages / Scraping data from web pages
- online services
- working with / Working with online data and services
- Open Database Connectivity (ODBC)
- about / Querying data in SQL databases
- optimized learning algorithms
- deploying / Deploying optimized learning algorithms
- regression models, building with biglm package / Building bigger regression models with biglm
- random forests, building with bigrf package / Growing bigger and faster random forests with bigrf
- models in parallel, evaluating with caret package / Training and evaluating models in parallel with caret
- ordinal / Types of input data
- ordinary least squares estimation
- out-of-bag error rate / Training random forests
- overfitting / Evaluation
P
- parallel cloud computing
- with MapReduce / Parallel cloud computing with MapReduce and Hadoop
- with Hadoop / Parallel cloud computing with MapReduce and Hadoop
- parallel computing
- about / Learning faster with parallel computing
- execution time, measuring / Measuring execution time
- with multicore package / Working in parallel with multicore and snow
- with snow package / Working in parallel with multicore and snow
- with foreach package / Taking advantage of parallel with foreach and doParallel
- with doParallel package / Taking advantage of parallel with foreach and doParallel
- parameter tuning
- pattern discovery / Types of machine learning algorithms
- Pearson's correlation coefficient / Correlations
- performance
- measuring, confusion matrices used / Using confusion matrices to measure performance
- performance measures
- about / Beyond accuracy – other measures of performance
- kappa statistic / The kappa statistic
- sensitivity / Sensitivity and specificity
- specificity / Sensitivity and specificity
- precision / Precision and recall
- performance trade-offs
- visualizing / Visualizing performance trade-offs
- poisonous mushrooms
- identifying, with rule learners / Example – identifying poisonous mushrooms with rule learners
- poisonous mushrooms example, with rule learners
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- Poisson regression
- about / Understanding regression
- polynomial kernel / Using kernels for non-linear spaces
- positive predictive value / Precision and recall
- posterior probability
- post-pruning
- about / Pruning the decision tree
- pre-pruning
- about / Pruning the decision tree
- precision / Precision and recall
- predictive model / Types of machine learning algorithms
- prior probability
- probability
- about / Understanding probability
- proprietary files
- about / Working with proprietary files and databases
- Microsoft Excel files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
- Microsoft Excel files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
- SAS files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
- SAS files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
- SPSS files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
- SPSS files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
- Stata files, writing / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
- Stata files, reading / Reading from and writing to Microsoft Excel, SAS, SPSS, and Stata files
- proprietary microarray
- using / Analyzing bioinformatics data
- pure / Choosing the best split
- purity / Choosing the best split
Q
- quadratic optimization / The case of linearly separable data
- quantiles / Measuring spread – quartiles and the five-number summary
R
- 1R algorithm / The 1R algorithm
- R
- about / Machine learning with R
- packages, installing / Installing R packages
- packages, loading / Loading and unloading R packages
- packages, unloading / Loading and unloading R packages
- data structures / R data structures
- used, for managing data / Managing data with R
- working with classification prediction data / Working with classification prediction data in R
- R, performance improvement
- about / Improving the performance of R
- large datasets, managing / Managing very large datasets
- parallel computing / Learning faster with parallel computing
- GPU, computing / GPU computing
- optimized learning algorithms, deploying / Deploying optimized learning algorithms
- R-squared value / Step 4 – evaluating model performance
- Radial Basis Function (RBF) network
- about / Activation functions
- random forests
- about / Random forests
- URL / Random forests
- strengths / Random forests
- training / Training random forests
- performance, evaluating / Evaluating random forest performance
- building, with bigrf package / Growing bigger and faster random forests with bigrf
- RCurl
- area under the ROC curve (AUC) / ROC curves
- Receiver Operating Characteristic (ROC) curve
- about / ROC curves
- creating / ROC curves
- recurrent network
- recursive partitioning
- about / Divide and conquer
- regression
- about / Understanding regression
- simple linear regression / Simple linear regression
- ordinary least squares estimation / Ordinary least squares estimation
- correlation / Correlations
- multiple linear regression / Multiple linear regression
- adding, to trees / Adding regression to trees
- regression analysis
- use cases / Understanding regression
- regression equations
- about / Understanding regression
- regression models
- building, with biglm package / Building bigger regression models with biglm
- regression trees
- relationships
- exploring, between variables / Exploring relationships between variables
- visualizing / Visualizing relationships – scatterplots
- examining / Examining relationships – two-way cross-tabulations
- Repeated Incremental Pruning to Produce Error Reduction (RIPPER) algorithm / The RIPPER algorithm
- residuals / Ordinary least squares estimation
- resubstitution error / Estimating future performance
- Revolution Analytics
- RHadoop
- RHIPE package
- rio package
- RIPPER algorithm
- about / The RIPPER algorithm
- risky bank loans
- identifying, C5.0 decision trees used / Example – identifying risky bank loans using C5.0 decision trees
- rote learning
- about / Why is the k-NN algorithm lazy?
- rpart.plot
- rudimentary ANNs / Understanding neural networks
- rvest package
- about / Scraping data from web pages
S
- scatterplot
- scatterplot matrix (SPLOM) / Visualizing relationships among features – the scatterplot matrix
- Scoville scale / Preparing data for use with k-NN
- segmentation analysis / Types of machine learning algorithms
- semi-supervised learning / Clustering as a machine learning task
- separate and conquer
- about / Separate and conquer
- sigmoid kernel / Using kernels for non-linear spaces
- simple linear regression / Understanding regression
- about / Simple linear regression
- simple tuned model
- creating / Creating a simple tuned model
- slack variable / The case of nonlinearly separable data
- slope
- about / Understanding regression
- slope-intercept form
- about / Understanding regression
- SMS Spam Collection
- URL / Step 1 – collecting data
- snowball
- snow package
- social networking service (SNS) / Example – finding teen market segments using k-means clustering
- sparse matrix / Data preparation – splitting text documents into words, Data preparation – creating a sparse matrix for transaction data
- SQL databases
- data, querying in / Querying data in SQL databases
- squashing functions / Activation functions
- stacking
- about / Understanding ensembles
- standard deviation
- standard deviation reduction (SDR) / Adding regression to trees
- statistical hypothesis testing / Understanding regression
- stock models
- tuning, for better performance / Tuning stock models for better performance
- Structured Query Language (SQL)
- about / Querying data in SQL databases
- subtree raising / Pruning the decision tree
- subtree replacement / Pruning the decision tree
- summary statistics / Exploring numeric variables
- supervised learning / Types of machine learning algorithms
- Support Vector Machine (SVM)
- about / Understanding Support Vector Machines
- applications / Understanding Support Vector Machines
- classifications, with hyperplanes / Classification with hyperplanes
- case of linearly separable data / The case of linearly separable data
- case of nonlinearly separable data / The case of nonlinearly separable data
- OCR, performing with / Example – performing OCR with SVMs
- support vectors / Classification with hyperplanes
- SVMlight
- synapse
T
- Tab-Separated Value (TSV)
- tabular
- tabular data structures
- generalizing, with dplyr package / Generalizing tabular data structures with dplyr
- teen market segments search, with k-means clustering
- about / Example – finding teen market segments using k-means clustering
- data, collecting / Step 1 – collecting data
- data, exploring / Step 2 – exploring and preparing the data
- data, preparing / Step 2 – exploring and preparing the data, Data preparation – dummy coding missing values, Data preparation – imputing the missing values
- model, training on data / Step 3 – training a model on the data
- model performance, evaluating / Step 4 – evaluating model performance
- model performance, improving / Step 5 – improving model performance
- terminal nodes / Understanding decision trees
- threshold activation function / Activation functions
- training / Abstraction
- trees
- regression, adding to / Adding regression to trees
- tree structure
- about / Understanding decision trees
- tuning process
- customizing / Customizing the tuning process
- two-way cross-tabulation
U
- UCI Machine Learning Data Repository
- unimodal / Measuring the central tendency – the mode
- unit of analysis / Types of input data
- unit of observation / Types of input data
- unit step activation function / Activation functions
- univariate statistics
- universal function approximator / The number of nodes in each layer
- unsupervised learning / Types of machine learning algorithms
V
- vector
- about / Vectors
- vector types
- types / Vectors
- Voronoi diagram / Using distance to assign and update clusters
W
- web pages
- complete text, downloading / Downloading the complete text of web pages
- data, parsing / Scraping data from web pages
- XML documents, parsing / Parsing XML documents
- JSON, parsing from web APIs / Parsing JSON from web APIs
- web scraping
- about / Scraping data from web pages
- wine quality estimation, with regression trees
- about / Example – estimating the quality of wines with regression trees and model trees
- data, collecting / Step 1 – collecting data
- data, preparing / Step 2 – exploring and preparing the data
- data, exploring / Step 2 – exploring and preparing the data
- model, training on data / Step 3 – training a model on the data
- decision trees, visualizing / Visualizing decision trees
- model performance, evaluating / Step 4 – evaluating model performance
- performance, measuring with mean absolute error / Measuring performance with the mean absolute error
- model performance, improving / Step 5 – improving model performance
- word cloud
- wordcloud package
X
- xml2 GitHub
- URL / Parsing XML documents
- XML documents
- parsing / Parsing XML documents
- XML package
- about / Parsing XML documents
- URL / Parsing XML documents
Z
- z-score / Preparing data for use with k-NN
- z-score standardization / Preparing data for use with k-NN, Transformation – z-score standardization
- ZeroR / The 1R algorithm