Index
A
- agglomerative clustering
- inner working / The inner working of agglomerative clustering
- hclust(), using / Agglomerative clustering with hclust(), Exploring the results of votes in Switzerland
- vote results, exploring / Exploring the results of votes in Switzerland
- analyses
- performing, in R / Performing the analyses in R
- analyses, performing
- C4.5, used for classification / Classification with C4.5
- C50 / C50
- CART / CART
- predictions on testing set, examining / Examining the predictions on the testing set
- conditional inference trees, in R / Conditional inference trees in R
- anova() function / Random intercepts and fixed slopes
- Apriori
- support-based pruning / Generating itemsets with support-based pruning
- confidence-based pruning / Generating rules by using confidence-based pruning
- apriori
- about / Apriori – basic concepts
- association rule / Association rules
- itemsets / Itemsets
- itemset, support / Support
- confidence / Confidence
- lift / Lift
- inner working / The inner working of apriori
- support-based pruning / Generating itemsets with support-based pruning
- used, for analyzing data in R / Analyzing data with apriori in R
- used, for basic analysis / Using apriori for basic analysis
- detailed analysis / Detailed analysis with apriori
- arithmetic mean / Covariance and correlation
- association rule / Association rules
- attribute
- about / A quick look at the Misc menu
B
- bar plots
- example / Histograms and bar plots
- between sum of squares (BSS) / Understanding the data with the all.us.city.crime.1970 dataset
- Big Five Inventory
- binary attributes
- hierarchical clustering, using on / The use of hierarchical clustering on binary attributes
- binomial distribution / The binomial distribution
- bootstrapping
- about / Bootstrapping
- boxplots
- example / Boxplots
C
- C4.5
- about / C4.5
- gain ratio / The gain ratio
- post-pruning / Post-pruning
- installing / Installing C4.5
- C5.0
- about / C5.0
- installing / Installing C5.0
- C50 / C50
- caret / Caret – a unified framework for classification
- caret package
- used, for bootstrapping of predictive models / Cross-validation and bootstrapping of predictive models using the caret package, Bootstrapping, Performing bootstrapping in R with caret
- used, for cross-validation of predictive models / Cross-validation, Performing cross-validation in R with caret
- CART
- about / CART, CART
- installing / Installing CART
- pruning / Pruning
- random forests, in R / Random forests in R
- cbind() function / Computing the closest cluster for each case
- Class attribute / Understanding decision trees
- classification, with C4.5
- about / Classification with C4.5
- unpruned tree / The unpruned tree
- pruned tree / The pruned tree
- classification performance
- computing / Computing the performance of classification
- coef() function / The null model
- conditional inference trees
- about / Conditional inference trees and forests
- installing / Installing conditional inference trees
- in R / Conditional inference trees in R
- confidence-based pruning
- used, for generating rules / Generating rules by using confidence-based pruning
- confint.merMod() function / Random intercepts and fixed slopes
- cor() function / Pearson's correlation
- cor.test() function / Pearson's correlation
- corpus
- loading / Loading the corpus
- preprocessing / Preprocessing and inspecting the corpus
- inspecting / Preprocessing and inspecting the corpus
- corpus.update() function / Collecting news articles in R from the New York Times article search API
- correlation
- about / Covariance and correlation, Correlation
- Pearson's correlation / Pearson's correlation
- Spearman's correlation / Spearman's correlation
- covariance
- about / Covariance and correlation, Covariance
- formula / Covariance
- createDataPartition() function / Loading and preparing the data
- ctree() function / Installing conditional inference trees
- cutoff parameter / Random forests in R
D
- data
- preparing / Loading and preparing the data
- loading / Loading and preparing the data
- data analysis
- correlation / Analyzing data in R: correlation and regression
- regression / Analyzing data in R: correlation and regression
- about / Analyzing data in R: correlation and regression
- steps / First steps in the data analysis
- regression, performing / Performing the regression
- normality of residuals, checking / Checking for the normality of residuals
- variance inflation, checking / Checking for variance inflation
- potential mediations, examining / Examining potential mediations and comparing models
- models, comparing / Examining potential mediations and comparing models
- new data, predicting / Predicting new data
- data frames
- data preparation
- about / Data preparation
- corpus, preprocessing / Preprocessing and inspecting the corpus
- attributes, computing / Computing new attributes
- decision trees
- about / Understanding decision trees
- detailed analysis, with apriori
- about / Detailed analysis with apriori
- data, preparing / Preparing the data
- data, analyzing / Analyzing the data
- association rules, coercing to data frame / Coercing association rules to a data frame
- association rules, visualizing / Visualizing association rules
- developer key
- discretize() function / Preparing the data
- dist() function / Chapter 5 – Agglomerative Clustering Using hclust()
- dist() function / Distance measures
- distance measures
- using / Distance measures
- dotplot() function / Dotplots
E
- eigen() function / The inner working of Principal Component Analysis
- entropy / Entropy
- exercises / Exercises, Chapter 5 – Agglomerative Clustering Using hclust(), Chapter 8 – Probability Distributions, Covariance, and Correlation, Chapter 10 – Classification with k-Nearest Neighbors and Naïve Bayes, Chapter 13 – Text Analytics with R
F
- factor() function / Histograms
- Female branch / Understanding decision trees
- forests / Conditional inference trees and forests
G
- gain ratio / The gain ratio
- glm() function / Classification using logistic regression
- GNU R
- installing / Installing GNU R
- URL, for installation on Windows / Installing GNU R
- URL, for installation on Mac OS X / Installing GNU R
- URL, for installation on Linux / Installing GNU R
- graphics
- updating / Updating graphics
- Grouped matrix-based visualization / Visualizing association rules
H
- hclust()
- using, with agglomerative clustering / Agglomerative clustering with hclust(), Exploring the results of votes in Switzerland
- hierarchical clustering
- using, on binary attributes / The use of hierarchical clustering on binary attributes
- hist() function / Histograms
- histogram() function / Histograms
- histograms
- example / Histograms and bar plots
I
- ID3 / ID3
- about / ID3
- entropy / Entropy
- information gain / Information gain
- ifelse() function / Integrating supplementary external data
- information gain / Information gain
- installing
- packages, in R / Installing packages in R
- intercept, simple regression
- computing / Computing the intercept and slope coefficient
- interestMeasure() function / Analyzing the data
- itemsets / Itemsets
K
- k-means
- used, for partition clustering / Learning by doing – partition clustering with kmeans()
- using, with public datasets / Using k-means with public datasets
- k-NN
- about / Understanding k-NN
- working, with in R / Working with k-NN in R
- k, selecting / How to select k
- used, for document classification / Document classification with k-NN
- Kaiser-Meyer-Olkin (KMO) / PCA diagnostics
- kay.means() function / Internal validation
- knn() function / Understanding k-NN
L
- lattice package
- lattice plots
- discovering / Discovering other lattice plots
- histograms / Histograms
- stacked bars / Stacked bars
- dotplot() function / Dotplots
- data points, displaying as text / Displaying data points as text
- life.expectancy.1971 dataset
- best number of clusters, searching / Finding the best number of clusters in the life.expectancy.1971 dataset
- external validation / External validation
- linear regression
- line plots
- example / Line plots
- llines() function / Loading and discovering the lattice package
- lmer() function / The null model
- logistic regression
- used, for document classification / Classification using logistic regression
- lpoints() function / Loading and discovering the lattice package
- lrect() function / Loading and discovering the lattice package
- ls() function / Loading and discovering the lattice package
- ltext() function / Loading and discovering the lattice package
M
- Male branch / Understanding decision trees
- maximum likelihood (ML) / Random intercepts and fixed slopes
- mean() function / The null model
- Measure of Sample Adequacy (MSA) / PCA diagnostics
- melt() function / Understanding k-NN
- menu bar, R console
- about / The menu bar of the R console
- File menu / A quick look at the File menu
- Misc menu / A quick look at the Misc menu
- models
- exporting, with PMML / Exporting models using PMML
- mtry parameter / Random forests in R
- multilevel modeling
- about / Multilevel modeling in R
- null model / The null model
- random intercepts and fixed slopes / Random intercepts and fixed slopes
- random intercepts and random slopes / Random intercepts and random slopes
- multilevel models
- used, for predicting work satisfaction / Predictions using multilevel models
- predict() function, using / Using the predict() function
- prediction quality, assessing / Assessing prediction quality
- multilevel regression
- about / Multilevel regression
- random intercepts and fixed slopes / Random intercepts and fixed slopes
- random intercepts and random slopes / Random intercepts and random slopes
- multipanel conditioning
- discovering, with xyplot() function / Discovering multipanel conditioning with xyplot()
- multiple regression
- working / Working with multiple regression
- multivariate outliers
N
- naiveBayes() function / Understanding Naïve Bayes
- Naïve Bayes
- about / Understanding Naïve Bayes
- working with, in R / Working with Naïve Bayes in R
- used, for document classification / Document classification with Naïve Bayes
- nested data
- about / Nested data
- examples / Nested data
- Robinson effect / Nested data
- Simpson's paradox / Nested data
- news mining
- about / Mining the news with R
- document classification / A successful document classification
- article topics, extracting / Extracting the topics of the articles
- news articles, collecting / Collecting news articles in R from the New York Times article search API
- New York Times article search API
- news articles, collecting in R / Collecting news articles in R from the New York Times article search API
- normal distribution / The normal distribution
- NursesML dataset / Chapter 12 – Multilevel Analyses
O
- Outlier detection
- about / Application – Outlier detection
P
- package installation
- C4.5 / Installing C4.5
- C5.0 / Installing C5.0
- CART / Installing CART
- random forest / Installing random forest
- conditional inference trees / Installing conditional inference trees
- data, loading / Loading and preparing the data
- data, preparing / Loading and preparing the data
- packages
- about / Packages
- installing, in R / Installing packages in R
- loading, in R / Loading packages in R
- panel.loess() function / Integrating supplementary external data
- partition clustering
- k-means, using / Learning by doing – partition clustering with kmeans()
- centroids, setting / Setting the centroids
- distances, computing to centroids / Computing distances to centroids
- closest cluster, computing / Computing the closest cluster for each case
- main function, tasks performed / Tasks performed by the main function
- internal validation / Internal validation
- PCA
- inner working / The inner working of Principal Component Analysis
- using, in R / Learning PCA in R
- missing values, dealing with / Dealing with missing values
- relevant components, selecting / Selecting how many components are relevant
- components with loadings, naming / Naming the components using the loadings
- scores / PCA scores
- diagnostics / PCA diagnostics
- PCA scores
- accessing / Accessing the PCA scores
- for analysis / PCA scores for analysis
- Pearson's correlation / Pearson's correlation
- plot() function / Understanding the data with the all.us.city.crime.1970 dataset
- plotLMER.fnc() function / Random intercepts and random slopes
- plots
- formatting / Formatting plots
- PMML
- used, for exporting models / Exporting models using PMML
- about / What is PMML?
- URL / What is PMML?
- object structure, describing / A brief description of the structure of PMML objects
- predictive model exportation, examples / Examples of predictive model exportation
- predict() function / Understanding Naïve Bayes
- using / Using the predict() function
- predictions
- examining, on testing set / Examining the predictions on the testing set
- predictive model exportation, examples
- about / Examples of predictive model exportation
- k-means objects, exporting / Exporting k-means objects
- hierarchical clustering / Hierarchical clustering
- association rules (apriori objects), exporting / Exporting association rules (apriori objects)
- Naïve Bayes objects, exporting / Exporting Naïve Bayes objects
- decision trees (rpart objects), exporting / Exporting decision trees (rpart objects)
- random forest objects, exporting / Exporting random forest objects
- logistic regression objects, exporting / Exporting logistic regression objects
- support vector machine objects, exporting / Exporting support vector machine objects
- predictive models
- cross-validation, with caret package / Cross-validation and bootstrapping of predictive models using the caret package, Cross-validation, Performing cross-validation in R with caret
- bootstrapping, with caret package / Cross-validation and bootstrapping of predictive models using the caret package, Cross-validation, Bootstrapping
- new data, predicting / Predicting new data
- preprocess() function / Chapter 13 – Text Analytics with R
- principal() function / Accessing the PCA scores
- princomp() function / The inner working of Principal Component Analysis
- probability distributions
- about / Probability distributions, Introducing probability distributions
- Discrete uniform distribution / Discrete uniform distribution
- normal distribution / The normal distribution
- Student's t-distribution / The Student's t-distribution
- binomial distribution / The binomial distribution
- importance / The importance of distributions
- public datasets
- k-means, using with / Using k-means with public datasets
- all.us.city.crime.1970 dataset / Understanding the data with the all.us.city.crime.1970 dataset
- life.expectancy.1971 dataset / Finding the best number of clusters in the life.expectancy.1971 dataset
Q
- qqnorm() function / Random intercepts and random slopes
- Quantile-Quantile plot (Q-Q plot) / Obtaining the residuals
R
- R
- packages, installing in / Installing packages in R
- packages, loading in / Loading packages in R
- PCA, using / Learning PCA in R
- data, analyzing with apriori / Analyzing data with apriori in R, Using apriori for basic analysis
- data, analyzing / Analyzing data in R: correlation and regression
- k-NN, working with / Working with k-NN in R
- Naïve Bayes, working with / Working with Naïve Bayes in R
- analyses, performing / Performing the analyses in R
- conditional inference trees / Conditional inference trees in R
- multilevel modeling / Multilevel modeling in R
- news mining / Mining the news with R
- r.squaredLR() function / Random intercepts and fixed slopes
- random forest
- about / Classification and regression trees and random forest, Random forest
- bagging / Bagging
- installing / Installing random forest
- ranef() function / The null model
- R console
- menu bar / The menu bar of the R console
- references / Chapter 1 – Setting GNU R for Predictive Modeling, Chapter 4 – Cluster Analysis, Chapter 7 – Exploring Association Rules with Apriori, Chapter 13 – Text Analytics with R
- regression / Covariance and correlation
- removeSparseTerms() function / Preprocessing and inspecting the corpus
- review classification
- about / Classification of the reviews
- document classification, with k-NN / Document classification with k-NN
- document classification, with Naïve Bayes / Document classification with Naïve Bayes
- document classification, with logistic regression / Classification using logistic regression
- document classification, with support vector machines (SVM) / Document classification with support vector machines
- reviews
- classifying / Classification of the reviews
- R graphic user interface (RGui)
- about / The R graphic user interface
- Robinson effect / Nested data
- robust regression
- using / Robust regression
- roulette case
- about / The roulette case
- rpart() function / Installing CART
- RStudio
- {rJava} package
- URL / Installing C4.5
S
- scatterplots
- example / Scatterplots
- simple regression
- about / Understanding simple regression
- intercept / Understanding simple regression
- slope coefficient / Understanding simple regression
- intercept, computing / Computing the intercept and slope coefficient
- slope coefficient, computing / Computing the intercept and slope coefficient
- residuals, obtaining / Obtaining the residuals
- coefficient significance, computing / Computing the significance of the coefficient
- sjp.lmer() function / Random intercepts and random slopes
- skmeans() function / Distance measures
- slope coefficient, simple regression
- computing / Computing the intercept and slope coefficient
- significance, computing / Computing the significance of the coefficient
- solutions / Chapter 1 – Setting GNU R for Predictive Modeling, Chapter 4 – Cluster Analysis, Chapter 6 – Dimensionality Reduction with Principal Component Analysis, Chapter 7 – Exploring Association Rules with Apriori, Chapter 8 – Probability Distributions, Covariance, and Correlation, Chapter 10 – Classification with k-Nearest Neighbors and Naïve Bayes, Chapter 11 – Classification Trees, Chapter 13 – Text Analytics with R
- Spearman's correlation / Spearman's correlation
- stacked bars / Stacked bars
- Student's t-distribution / The Student's t-distribution
- support-based pruning
- used, for generating itemsets / Generating itemsets with support-based pruning
- support vector machines (SVM)
- used, for document classification / Document classification with support vector machines
- Swiss politics
- Swiss Statistics Office
T
- term-document matrix / An introduction to text analytics
- text analytics
- about / An introduction to text analytics
- textual documents, preprocessing / An introduction to text analytics
- tokenizing / An introduction to text analytics
- total sum of squares (TSS) / Understanding the data with the all.us.city.crime.1970 dataset
- training
U
- United States State Coordinates
- update() function / Updating graphics
- USCancerRates dataset
- used, for exploring cancer-related deaths / Case study – exploring cancer-related deaths in the US
- discovering / Discovering the dataset
- supplementary external data, integrating / Integrating supplementary external data
V
- variable
- about / A quick look at the Misc menu
- vector
- about / A quick look at the Misc menu
- vegdist() function / Distance measures
W
- writeCorpus() function / Collecting news articles in R from the New York Times article search API
X
- xyplot() function
- used, for discovering multipanel conditioning / Discovering multipanel conditioning with xyplot()
- about / Displaying data points as text