Index
A
- Accelerate / Using another R implementation
- Akaike Information Criterion (AIC) / Striking a balance
- Allighate / Basic probability
- ANCOVA (Analysis of Covariance)
- about / Testing more than two means, Generalized Linear Model (GLM)
- assumptions / Assumptions of ANOVA
- anonymous functions / Functions
- Application Programming Interface (API) / Using JSON
- arguments / Arithmetic and assignment
- arithmetic operations / Arithmetic and assignment
- assertions
- chaining / Chaining assertions
- assignment operations / Arithmetic and assignment
- ATLAS / Using another R implementation
- autocorrelation / Autocorrelation
- automated forecast model / Interventions for improvement
- automatic multithreading / Using another R implementation
- available-case analysis / Pairwise deletion
- average deviation / Spread
B
- bagging / Random forests
- bandwidth / Probability distributions
- base-rate fallacy / Basic probability
- batch mode / Navigating the basics
- Bayes' Theorem / Basic probability
- Bayesian analogue
- distributions, fitting / Fitting distributions the Bayesian way
- independent samples t-test / The Bayesian independent samples t-test
- Bayesian analysis / The big idea behind Bayesian analysis
- Bayesian linear regression / Advanced topics
- bell curve / Central tendency
- Bernoulli distribution / Sampling from distributions
- bias-corrected and accelerated confidence interval (BCa) / Confidence intervals
- bias-variance trade-off / The bias-variance trade-off
- binomial distribution / The binomial distribution
- birthday problem / An example of (some) substance
- Bonferroni correction / Testing more than two means
- Booleans / Logicals and characters
- bootstrap
- about / What's... uhhh... the deal with the bootstrap?
- performing, in R / Performing the bootstrap in R (more elegantly)
- myths / Busting bootstrap myths
- bootstrap distribution / What's... uhhh... the deal with the bootstrap?
- bootstrapping statistics / Bootstrapping statistics other than the mean
- bootstrap replications / Performing the bootstrap in R (more elegantly)
- Box / Version control
- by argument
- used, for grouping / Using the by argument for grouping
- by reference semantics / The i in DT[i, j, by], What in the world are by reference semantics?
- by value semantics / What in the world are by reference semantics?
C
- categorical and continuous variables
- relationships / Relationships between a categorical and continuous variable
- categorical variables
- relationships / Relationships between two categorical variables
- central limit theorem / The sampling distribution
- central tendency / Central tendency
- checkpoint / Package version management
- classifiers
- selecting / Choosing a classifier
- vertical decision boundary / The vertical decision boundary
- diagonal decision boundary / The diagonal decision boundary
- crescent decision boundary / The crescent decision boundary
- circular decision boundary / The circular decision boundary
- code performance
- about / Be smart about your code
- memory allocation / Allocation of memory
- vectorization / Vectorization
- coin flips / Who cares about coin flips
- columns
- selecting / Selecting and renaming columns
- renaming / Selecting and renaming columns
- computing / Computing on columns
- comments / Arithmetic and assignment
- communicating results / Communicating results
- complete case analysis / Complete case analysis
- Comprehensive R Archive Network (CRAN) / Working with packages
- confidence intervals / Confidence intervals
- confusion matrices / Confusion matrices
- continuous variables
- relationship / The relationship between two continuous variables
- copy-on-modify / What in the world are by reference semantics?
- copy-on-write / What in the world are by reference semantics?
- correlation coefficients / Correlation coefficients
- covariance / Covariance
- CRAN Task Views
- reference / Other data formats
- cross-tabulation / Relationships between two categorical variables
- cross-validation / Cross-validation
- CVS / Version control
D
- data
- loading, into R / Loading data into R
- messy situations / Other tools for messy data
- OpenRefine / OpenRefine
- fuzzy matching / Fuzzy matching
- data.table package
- about / The data.table package
- i argument, in DT[i, j, by] / The i in DT[i, j, by]
- by reference semantics / What in the world are by reference semantics?
- j argument, in DT[i, j, by] / The j in DT[i, j, by]
- i and j arguments, using / Using both i and j
- by argument, used for grouping / Using the by argument for grouping
- data tables, joining / Joining data tables
- data, pivoting / Reshaping, melting, and pivoting data
- data, reshaping / Reshaping, melting, and pivoting data
- data, melting / Reshaping, melting, and pivoting data
- data formats / Other data formats
- data manipulation
- with dplyr / Using dplyr and tidyr to manipulate data
- with tidyr / Using dplyr and tidyr to manipulate data
- data normalization / Regex for data normalization
- data points
- data type of column
- checking / Checking the data type of a column
- decision trees / Decision trees
- degrees of freedom / Populations, samples, and estimation
- directional hypothesis / One and two-tailed tests
- discrete numeric variable / Univariate data
- DOM (Document Object Model) / XML
- double exponential smoothing / Double exponential smoothing
- dplyr
- used, for data manipulation / Using dplyr and tidyr to manipulate data
- data, loading / Loading data for use in dplyr
- grouping / Grouping in dplyr
- data, joining / Joining data
E
- Emacs Speaks Statistics (ESS) / R scripting
- ensemble learning / Random forests
- entry errors
- Error, Trend, and Seasonal (ETS)
- state space model / ETS and the state space model
- errors, NHST / Errors in NHST
- estimate / Populations, samples, and estimation
- Expectation Maximization (EM) method / Stochastic regression imputation
F
- FastR / Using another R implementation
- flow of control construct / Flow of control
- forecasting
- about / What is forecasting?
- uncertainty / Uncertainty
- difficulties / Difficulties in forecasting
- forking / Getting started with parallel R
- frequency distributions / Frequency distributions
- functional programming
- about / Functional programming as a main tidyverse principle
- data, loading in dplyr / Loading data for use in dplyr
- rows, manipulating / Manipulating rows
- columns, selecting / Selecting and renaming columns
- columns, renaming / Selecting and renaming columns
- columns, computing / Computing on columns
- grouping, in dplyr / Grouping in dplyr
- data, joining in dplyr / Joining data
- functions / Functions
- fuzzy matching / Fuzzy matching
G
- Gaussian distribution / Central tendency
- Gaussian white noise / White noise
- Generalized Additive Models (GAMs) / Advanced topics
- Generalized Linear Model (GLM) / Generalized Linear Model (GLM)
- Git / Version control
H
- H0 (null hypothesis) / The null hypothesis significance testing framework
- H1 (alternative hypothesis) / The null hypothesis significance testing framework
- Holm-Bonferroni correction / Testing more than two means
- hot deck imputation / Hot deck imputation
- hyperplane / Multiple regression
I
- imperative programming / Functional programming as a main tidyverse principle
- independence of proportions
- testing / Testing independence of proportions
- independent samples t-test
- assumptions / Assumptions of the independent samples t-test
- indexing / Subsetting
- Integrated Development Environment (IDE) / R scripting
- Intel Math Kernel Library (MKL) / Using another R implementation
- interaction terms / Advanced topics
- interpretations / A tale of two interpretations
- interval estimation
- about / Interval estimation
- qnorm function, using / How did we get 1.96?
- Iteratively Re-Weighted Least Squares (IWLS) / A word of warning
J
- Jaccard index / Using JSON
- JavaScript Object Notation (JSON) / Using JSON
- joint distribution / Enter MCMC – stage left
- Just Another Gibbs Sampler (JAGS)
- using / Using JAGS and runjags
K
- k-fold cross-validation / Cross-validation
- k-Nearest neighbors
- about / k-Nearest neighbors
- using, in R / Using k-NN in R
- limitations / Limitations of k-NN
- kernel density estimation / Probability distributions
- kitchen sink regression / Kitchen sink regression
L
- lambda functions / Functions
- LaTeX / Communicating results
- left-tailed distribution / Central tendency
- linear models / Linear models
- linear regression diagnostics
- about / Linear regression diagnostics
- second Anscombe relationship / Second Anscombe relationship
- third Anscombe relationship / Third Anscombe relationship
- fourth Anscombe relationship / Fourth Anscombe relationship
- list-wise deletion / Complete case analysis
- logistic regression
- about / Logistic regression
- using, in R / Using logistic regression in R
M
- Mann-Whitney U test / What if my assumptions are unfounded?
- Markov chain Monte Carlo (MCMC) / Enter MCMC – stage left
- mathematical operators
- arithmetic / Arithmetic and assignment
- assignments / Arithmetic and assignment
- logical / Logicals and characters
- characters / Logicals and characters
- matrices / Matrices
- Maximum Likelihood Estimate (MLE) / The big idea behind Bayesian analysis, Logistic regression
- mean height
- estimating / Estimating means
- mean of one sample
- testing / Testing the mean of one sample
- Mean Squared Error (MSE) / Simple linear regression
- mean substitution / Mean substitution
- Mercurial / Version control
- methods, for missing data
- complete case analysis / Complete case analysis
- pairwise deletion / Pairwise deletion
- mean substitution / Mean substitution
- hot deck imputation / Hot deck imputation
- regression imputation / Regression imputation
- stochastic regression imputation / Stochastic regression imputation
- multiple imputation / Multiple imputation
- mice
- imputed values, obtaining / So how does mice come up with the imputed values?
- methods of imputation, using / Methods of imputation
- multiple imputation, using / Multiple imputation in practice
- reference / Multiple imputation in practice
- Missing At Random (MAR) / Types of missing data
- Missing Completely At Random (MCAR) / Types of missing data
- missing data
- analysis / Analysis with missing data
- visualizing / Visualizing missing data
- Missing Completely At Random (MCAR) / Types of missing data
- Missing At Random (MAR) / Types of missing data
- Missing Not At Random (MNAR) / Types of missing data
- dataset, assumption / So which one is it?
- Missing Not At Random (MNAR) / Types of missing data
- Monte Carlo case resampling / What have we left out?
- Monte Carlo simulation / Who cares about coin flips
- multiple correlations
- comparing / Comparing multiple correlations
- multiple imputation / Multiple imputation
- multiple means
- testing / Testing more than two means
- multiple regression / Multiple regression
- multivariate data / Multivariate data
- MusicBrainz
- reference / XML
N
- negatively skewed distribution / Central tendency
- non-linear modeling / Advanced topics
- normal distribution
- about / The normal distribution
- three-sigma rule / The three-sigma rule and using z-tables
- z-tables, using / The three-sigma rule and using z-tables
- Not a Number (NaN) / Arithmetic and assignment
- null hypothesis / The null hypothesis significance testing framework
- Null Hypothesis Significance Testing (NHST)
- about / The null hypothesis significance testing framework
- one-tailed test / One and two-tailed tests
- two-tailed tests / One and two-tailed tests
- errors / Errors in NHST
- warning, about significance / A warning about significance
- p-values / A warning about p-values
O
- one-sample test
- of means / A one-sample test of means
- one-sample t-test
- about / Testing the mean of one sample
- assumptions / Assumptions of the one sample t-test
- online repositories / Online repositories
- OpenBLAS / Using another R implementation
- OpenRefine / OpenRefine
- optimization / Wait to optimize
- optimized packages
- using / Using optimized packages
- ordinal variable / Multiple imputation in practice
- Out-Of-Bag (OOB) error rate / Random forests
- out-of-bounds data
- checking / Checking for out-of-bounds data
- outliers
P
- p-values / A warning about p-values
- packages
- working with / Working with packages
- package version management / Package version management
- packrat / Package version management
- pairwise deletion / Pairwise deletion
- parallelization
- using / Using parallelization
- in R / Getting started with parallel R
- example / An example of (some) substance
- parameters / Parameters
- polymorphism / Loading data into R
- population / Populations, samples, and estimation
- positively skewed distribution / Central tendency
- pqR / Using another R implementation
- predictive mean matching / Methods of imputation
- prior
- about / Basic probability
- selecting / Choosing a prior
- probability / Basic probability
- probability density function (PDF) / Probability distributions
- probability distribution
- about / Probability distributions
- sampling / Sampling from distributions
- parameters / Parameters
- binomial distribution / The binomial distribution
- probability mass function (PMF) / Probability distributions
Q
- QQ-plot (quantile-quantile plot) / What if my assumptions are unfounded?
- quantile / How did we get 1.96?
R
- R
- help, obtaining / Getting help in R
- data, loading into / Loading data into R
- bootstrap, performing / Performing the bootstrap in R (more elegantly)
- k-Nearest neighbors, using / Using k-NN in R
- logistic regression, using / Using logistic regression in R
- random forests / Random forests
- Rcpp
- using / Using Rcpp
- Rcpp FAQ
- reference / Using Rcpp
- regression
- about / Correlation coefficients
- with non-binary predictor / Regression with a non-binary predictor
- regression imputation / Regression imputation
- regular expressions
- about / Regular expressions, What are regular expressions?, Getting started
- for data normalization / Regex for data normalization
- normalization / More normalization
- regularization / Advanced topics
- relational databases / Relational databases
- Renjin / Using another R implementation
- REPL (Read-Evaluate-Print-Loop) / Navigating the basics
- Residual Sum of Squares (RSS) / Simple linear regression
- Revolution R Enterprise / Using another R implementation
- Revolution R Open / Using another R implementation
- right-tailed distribution / Central tendency
- Root Mean Squared Error (RMSE) / Simple linear regression
- rows
- manipulating / Manipulating rows
- R projects / R projects
- R scripts
- about / R scripting
- executing / Running R scripts
- example / An example script
- reproducibility / Scripting and reproducibility
- scripting / Scripting and reproducibility
- RStudio
- Rtools
- reference / Using Rcpp
- runjags
- using / Using JAGS and runjags
S
- sampling distribution / The sampling distribution
- sampling with replacement / What's... uhhh... the deal with the bootstrap?
- sanity test / Multiple imputation in practice
- simple exponential smoothing
- for forecasting / Simple exponential smoothing for forecasting
- simple linear regression
- about / Simple linear regression
- with binary predictor / Simple linear regression with a binary predictor
- warning / A word of warning
- Simpson's Paradox / Relationships between two categorical variables
- smaller sample / Smaller samples
- smoothing
- about / Smoothing
- accuracy assessment / Accuracy assessment
- double exponential smoothing / Double exponential smoothing
- triple exponential smoothing / Triple exponential smoothing
- Spearman's rho / Correlation coefficients
- spread operation / Spread
- standard deviation / Spread
- standard error / The sampling distribution
- standard evaluation / Selecting and renaming columns
- standardization / The three-sigma rule and using z-tables
- stepwise regression / Striking a balance
- stochastic regression imputation / Stochastic regression imputation
- strings / Logicals and characters
- Student's t-distribution / Smaller samples
- subscript operator / Subsetting
- subsetting / Subsetting
- Subversion / Version control
T
- TeamDrive / Version control
- three-sigma rule / The three-sigma rule and using z-tables
- tidyr
- used, for data manipulation / Using dplyr and tidyr to manipulate data
- data, reshaping / Reshaping data with tidyr
- Tidy Tools Manifesto
- reference / Using dplyr and tidyr to manipulate data
- tidyverse
- about / Using dplyr and tidyr to manipulate data
- reference / Using dplyr and tidyr to manipulate data
- functional programming / Functional programming as a main tidyverse principle
- time series
- about / What is a time series?
- creating / Creating and plotting time series
- plotting / Creating and plotting time series
- components / Components of time series
- time series decomposition / Time series decomposition
- trend line / Correlation coefficients
- triple exponential smoothing / Triple exponential smoothing
- Tukey's variation / Relationships between a categorical and continuous variable
- two-fold cross-validation / Cross-validation
- two means
- testing / Testing two means
U
- unexpected categories
- checking / Checking for unexpected categories
- univariate data / Univariate data
- unsanitized data
- checking / Checking unsanitized data
- out-of-bounds data, checking / Checking for out-of-bounds data
- data type of column, checking / Checking the data type of a column
- unexpected categories, checking / Checking for unexpected categories
- outliers, checking / Checking for outliers, entry errors, or unlikely data points
- data points, checking / Checking for outliers, entry errors, or unlikely data points
- entry errors, checking / Checking for outliers, entry errors, or unlikely data points
- assertions, chaining / Chaining assertions
V
- Variance Inflation Factor (VIF) / Fourth Anscombe relationship
- VCD (Visualizing Categorical Data) / Two categorical variables
- vectorization / Vectorization
- vectorized functions / Vectorized functions
- vectors
- about / Vectors
- subsetting / Subsetting
- advanced subsetting / Advanced subsetting
- recycling / Recycling
- version control
- about / Version control
- package version management / Package version management
- visualization methods
- about / Visualization methods, Visualization methods
- categorical and continuous variables / Categorical and continuous variables
- two categorical variables / Two categorical variables
- two continuous variables / Two continuous variables
- multiple continuous variables / More than two continuous variables
W
- Web Technologies Task View
- reference / Other data formats
- white noise / White noise
Z
- z-tables