Index
A
- actuarial science / Actuarial science
- affinity propagation clustering / Affinity propagation clustering
- aggregate function / Subqueries
- alternative hypothesis / Hypothesis testing
- Amazon Standard Identification Number (ASIN) / The Map data structure
- American College Board Scholastic Aptitude Test in mathematics (AP math test) / The standard normal distribution
- Anscombe's quartet / Anscombe's quartet
- Apache Commons
- download link / The Apache Commons Math Library
- Apache Commons implementation / The Apache Commons implementation
- Apache Commons Math Library
- about / The Apache Commons Math Library
- using / The Apache Commons Math Library
- Apache Hadoop
- about / Apache Hadoop
- Apache Software Foundation
- about / The Apache Commons Math Library
- references, for projects / The Apache Commons Math Library
- ARFF filetype / The ARFF filetype for data
- associative array / The Map data structure
- Attribute-Relation File Format (ARFF)
- about / The ARFF filetype for data
- reference / The ARFF filetype for data
- attribute-value pairs / Key-value pairs
- attributes / Data points and datasets
B
- B-tree / Table indexes
- backward elimination
- about / K-Nearest Neighbors
- bar chart
- about / Bar charts
- generating / Bar charts
- batch mode / Inserting data into the database
- batch processing / Batch processing
- Bayes' theorem / Bayes' theorem
- Bayesian classifiers
- about / Bayesian classifiers
- Java implementation, with Weka / Java implementation with Weka
- support vector machine algorithms / Support vector machine algorithms
- Binary Search algorithm
- about / Decision trees
- binomial coefficient / The binomial distribution
- binomial distribution / The binomial distribution
- bins
- about / Histograms
- BSON object / Java development with MongoDB
- bulk write / The Mongo database system
C
- 95% confidence coefficient / Confidence intervals
- calculated by steam / Calculated by steam
- Canberra metric / Measuring distances
- Cartesian plane / Measuring distances
- Cartesian product / The relation data model
- Cassandra / Other NoSQL database systems
- central limit theorem / The central limit theorem
- Chebyshev metric / Measuring distances
- chessboard metric / Measuring distances
- column data model / Other NoSQL database systems
- column store / The Mongo database system
- comma separated values (CSV)
- about / File formats
- Commons Math API
- reference / Descriptive statistics
- complexity analysis / Hierarchical clustering
- compound indexes / Indexing in MongoDB
- conditional probability / Conditional probability
- confidence intervals / Confidence intervals
- content-based recommendation / Utility matrices
- contingency tables / Contingency tables
- continuous distribution / The standard normal distribution
- correlation coefficient / Covariance and correlation
- cosine similarity / Cosine similarity
- covariance / Covariance and correlation
- crosstab tables / Contingency tables
- cumulative distribution function (CDF) / The exponential distribution, Cumulative distributions
- curse of dimensionality / The curse of dimensionality
- curve fitting / Curve fitting
D
- data / Data, information, and knowledge
- inserting, into database / Inserting data into the database
- data analysis
- origins / Origins of data analysis
- database
- creating / Creating a database
- data, inserting into / Inserting data into the database
- database driver / JDBC
- database query / Database queries
- database schema
- creating, in MySQL / MySQL Workbench
- database views / Database views
- data cleaning / Data cleaning
- Data Definition Language (DDL) / SQL commands
- data filtering / Data filtering
- Data Manipulation Language (DML) / SQL commands
- data normalization / Data scaling
- data points / Data points and datasets
- data ranking / Data ranking
- data scaling / Data scaling
- data scrubbing / Data cleaning
- datasets / Data points and datasets
- data striping / Scaling, data striping, and sharding
- data types / Data types
- decile
- about / Descriptive statistics
- decision tree
- about / Decision trees
- density function / The normal distribution
- descriptive statistic
- about / Descriptive statistics
- Development Environment (IDE) / Java Integrated Development Environments
- dictionary / The Map data structure
- discrete distribution / The standard normal distribution
- distances
- measuring / Measuring distances
- divide and conquer / Google's MapReduce framework
- document data model / Other NoSQL database systems
- document store / The Mongo database system
- domain / The relation data model
- dynamic schemas / Java development with MongoDB
E
- Electronic Numerical Integrator and Computer (ENIAC) / ENIAC
- entropy
- about / What does entropy have to do with it?
- event
- about / Random sampling
- Excel
- linear regression, performing in / Linear regression in Excel
- exemplars / Affinity propagation clustering
- explained variation / Variation statistics
- exponential distribution / The exponential distribution
- Extensible Markup Language (XML) / XML and JSON data
F
- false negative / Bayes' theorem
- false positive / Bayes' theorem
- fields / Data points and datasets, The relation data model
- file formats
- about / File formats
- foreign key / Foreign keys
- frequency distribution / Frequency distributions
- fuzzy classification algorithms
- about / Fuzzy classification algorithms
G
- Generalized Markup Language (GML) / XML and JSON data
- GeoJSON object types / The MongoDB extension for geospatial databases
- geospatial databases
- reference / The MongoDB extension for geospatial databases
- graph data model / Other NoSQL database systems
- graphs / Tables and graphs
H
- Hadoop Common / Apache Hadoop
- Hadoop Distributed File System (HDFS) / Apache Hadoop
- Hadoop MapReduce / Apache Hadoop
- about / Hadoop MapReduce
- WordCount program / Hadoop MapReduce
- reference, for example / Hadoop MapReduce
- Hadoop YARN / Apache Hadoop
- Hamming distance / Similarity measures
- Hamming similarity / Similarity measures
- hash / Hashing
- hash codes / Hashing
- hash function
- properties / Hashing
- hashing / Hashing
- hash table / Hash tables, Large sparse matrices
- HBase / Other NoSQL database systems
- Herman Hollerith / Herman Hollerith
- hierarchical clustering
- about / Hierarchical clustering
- Weka implementation / Weka implementation
- K-means clustering / K-means clustering
- k-medoids clustering / K-medoids clustering
- affinity propagation clustering / Affinity propagation clustering
- histogram
- about / Histograms
- horizontal scaling / Scaling, data striping, and sharding
- hypothesis testing / Hypothesis testing
I
- ID3 algorithm
- about / The ID3 algorithm
- Java implementation / Java Implementation of the ID3 algorithm
- Java implementation, with Weka / Java implementation with Weka
- indexing
- in MongoDB / Indexing in MongoDB
- information / Data, information, and knowledge
- instance / The relation data model
- International Business Machines Corporation (IBM) / Herman Hollerith
- International Standard Book Numbers / The Map data structure
- item-based recommendation / Utility matrices
- item-to-item collaborative filtering recommender / Amazon's item-to-item collaborative filtering recommender
- Iterative Dichotomizer 3
- about / The ID3 algorithm
J
- Java
- Java Database Connectivity (JDBC) / JDBC
- Java DB / Creating a database
- Java development
- with MongoDB / Java development with MongoDB
- Java implementation
- example / Java implementation
- of linear regression / Java implementation of linear regression
- of ID3 algorithm / Java Implementation of the ID3 algorithm
- Java Integrated Development Environments / Java Integrated Development Environments
- JavaScript Object Notation (JSON) / XML and JSON data
- javax.json library
- about / The javax JSON Library
- JDBC PreparedStatement
- using / Using a JDBC PreparedStatement
- joint probability function / Multivariate distributions
- JSON (JavaScript Object Notation) / The Mongo database system
- JSON data / XML and JSON data
- JSON event types
- identifying / XML and JSON data
- JSON files
- parsing / XML and JSON data
K
- K-Means++ algorithm / K-means clustering
- K-means clustering / K-means clustering
- k-medoids clustering / K-medoids clustering
- K-Nearest Neighbor (KNN) / K-means clustering
- K-Nearest Neighbors
- about / K-Nearest Neighbors
- key-value data model / Other NoSQL database systems
- key-value pairs (KVP) / Key-value pairs
- key field / Key fields
- key values / Key fields
- knowledge / Data, information, and knowledge
- kurtosis
- about / Descriptive statistics
L
- large sparse matrices / Large sparse matrices
- least-squares parabola
- about / Polynomial regression
- level of significance / Hypothesis testing
- lexicographic order / Large sparse matrices
- Library database / The Library database
- linear regression
- about / Linear regression
- in Excel / Linear regression in Excel
- Java implementation / Java implementation of linear regression
- Anscombe's quartet / Anscombe's quartet
- line graph
- about / Line graphs
- generating / Line graphs
- logarithmic time / Table indexes
- logistic function
- about / Logistic regression
- logistic regression
- about / Logistic regression
- example / Logistic regression
- K-Nearest Neighbors / K-Nearest Neighbors
- logit function / Logistic regression
- LU decomposition
- about / Polynomial regression
M
- Manhattan metric / Measuring distances
- map / Large sparse matrices
- Map data structure / The Map data structure
- MapReduce
- matrix multiplication, implementing / Matrix multiplication with MapReduce
- in MongoDB / MapReduce in MongoDB
- MapReduce applications
- examples / Some examples of MapReduce applications
- MapReduce framework / Google's MapReduce framework
- marginal probabilities / Multivariate distributions
- Markov chain / Google's PageRank algorithm
- matrix multiplication
- with MapReduce / Matrix multiplication with MapReduce
- maximum
- about / Descriptive statistics
- mean average
- about / Descriptive statistics
- median
- about / Descriptive statistics
- merging / Merging
- message-passing / Affinity propagation clustering
- metadata / Metadata
- method of least squares
- about / Polynomial regression
- metric / Measuring distances
- metric space / Measuring distances
- Microsoft Excel
- moving average, computing / Moving average
- Microsoft Excel data / Microsoft Excel data
- minimal spanning tree (MST) / Some examples of MapReduce applications
- minimum
- about / Descriptive statistics
- Minkowski metric / Measuring distances
- mode
- about / Descriptive statistics
- mongo-java-driver JAR files
- download link / Java development with MongoDB
- Mongo database system / The Mongo database system
- MongoDB
- download link / The Mongo database system
- references / The Mongo database system
- indexing / Indexing in MongoDB
- need for / Why NoSQL and why MongoDB?
- about / MongoDB
- MongoDB extension
- for geospatial databases / The MongoDB extension for geospatial databases
- MongoDB installation file
- download link / MongoDB
- MongoDB Manual
- reference / The Mongo database system
- moving average / Moving average
- computing, in Microsoft Excel / Moving average
- MovingAverage class
- test program / Moving average
- moving average series / Moving average
- multilinear functions / Multiple linear regression
- multiple linear regression / Multiple linear regression
- multivariate distributions / Multivariate distributions
- multivariate probability distribution function / Multivariate distributions
- MySQL
- database schema, creating / MySQL Workbench
- MySQL database
- accessing, from NetBeans / Accessing the MySQL database from NetBeans
N
- naive Bayes classification algorithm
- about / Bayesian classifiers
- Neo4j / Other NoSQL database systems
- NetBeans
- MySQL database, accessing from / Accessing the MySQL database from NetBeans
- Netflix prize / The Netflix prize
- normal distribution
- about / The normal distribution
- example / A thought experiment
- normal equations / Computing the regression coefficients
- NoSQL
- versus SQL / SQL versus NoSQL
- need for / Why NoSQL and why MongoDB?
- NoSQL database systems
- reference / Other NoSQL database systems
- about / Other NoSQL database systems
- null hypothesis / Hypothesis testing
- null values / Null values
O
- offset / Using random access files
- ordinary least squares (OLS)
- outlier / Anscombe's quartet
P
- PageRank algorithm / Google's PageRank algorithm
- parser / XML and JSON data
- parsing / XML and JSON data
- partitioning around medoids (PAM) / K-medoids clustering
- percentile
- about / Descriptive statistics
- POI open source API library
- download link / Microsoft Excel data
- polynomial regression
- about / Polynomial regression
- population
- about / Random sampling
- primary key / The relation data model
- probabilistic events
- independence / The independence of probabilistic events
- probabilities
- facts / Random sampling
- probability density function (PDF) / A thought experiment
- probability distribution function (PDF) / Probability distributions
- probability function
- about / Random sampling
- probability set function
- about / Random sampling
- Pythagorean theorem / Measuring distances
Q
- quality control department (QCD) / Confidence intervals
- quartiles
- about / Descriptive statistics
R
- random access files
- using / Using random access files
- random experiment
- about / Random sampling
- random sample
- about / Random sampling
- random sampling
- about / Random sampling
- random variable / Random variables
- range
- about / Descriptive statistics
- red-black tree data structure / Large sparse matrices
- Redis / Other NoSQL database systems
- regression
- about / Linear regression
- regression coefficients
- computing / Computing the regression coefficients
- relation / The relation data model
- relational database (Rdb) / Java development with MongoDB
- relational database (RDB) / The relation data model, Relational databases
- relational database design
- about / Relational database design
- database, creating / Creating a database
- SQL commands / SQL commands
- data, inserting into database / Inserting data into the database
- database queries / Database queries
- SQL data types / SQL data types
- Java Database Connectivity (JDBC) / JDBC
- JDBC PreparedStatement, using / Using a JDBC PreparedStatement
- batch processing / Batch processing
- database views / Database views
- subqueries / Subqueries
- table indexes / Table indexes
- relational databases (Rdbs) / The Mongo database system
- relational database system (RDBMS) / Creating a database
- relational database systems (Rdbs) / Scaling, data striping, and sharding
- relational database tables / Relational database tables
- relation data model / The relation data model
- residual / Linear regression in Excel
- rows / The relation data model
- running average / Moving average
S
- sample / Frequency distributions
- sample correlation coefficient / Linear regression in Excel
- sample space
- about / Random sampling
- sample variance / Descriptive statistics
- scalability / Scalability
- scaling / Scaling, data striping, and sharding
- scatter plot
- about / Scatter plots
- generating / Scatter plots
- schema / The relation data model, Relational databases
- scientific method / The scientific method
- sharding / Scaling, data striping, and sharding
- shards / Scaling, data striping, and sharding
- show collections command
- collections / The Mongo database system
- sigmoid curve
- about / Logistic regression
- similarity measure / Utility matrices, Similarity measures
- simple average
- about / Descriptive statistics
- simple recommender system / A simple recommender system
- skewness
- about / Descriptive statistics
- SN 1572 / The scientific method
- sorting / Sorting
- sparse matrix / Large sparse matrices, Google's PageRank algorithm
- sparse matrix format / Matrix multiplication with MapReduce
- spectacular example / A spectacular example
- SQL
- versus NoSQL / SQL versus NoSQL
- SQL (Structured Query Language) / Creating a database
- SQL commands / SQL commands
- SQL data types / SQL data types
- SQL script / SQL commands
- Standard Generalized Markup Language (SGML) / XML and JSON data
- standard normal distribution / The standard normal distribution
- statement object / Using a JDBC PreparedStatement
- statistics
- descriptive statistics / Descriptive statistics
- subquery / Subqueries
- support vector machine (SVM) / Support vector machine algorithms
- support vector machine algorithms / Support vector machine algorithms
T
- table indexes / Table indexes
- tables / Tables and graphs
- taxicab metric / Measuring distances
- test datasets
- generating / Generating test datasets
- time series
- about / Time series
- simulating / Java example
- TimeSeries class
- test program / Java implementation
- total variation / Variation statistics
- transition matrix / Google's PageRank algorithm
- triangle inequality / Measuring distances
- tuples / The relation data model
- two-tailed test / Hypothesis testing
- Type I error / Bayes' theorem, Hypothesis testing
- Type II error / Bayes' theorem
- type signature / Data points and datasets
U
- unexplained variation / Variation statistics
- Universal Product Codes (UPCs) / The Map data structure
- user ratings
- implementing / Implementing user ratings
- utility matrix / Utility matrices
V
- variables / Variables
- variation statistics / Variation statistics
- vehicle identification number (VIN) / The Map data structure
- vertical scaling / Scaling, data striping, and sharding
- virtual table / Database views
- VisiCalc / VisiCalc
W
- weighted mean
- about / Descriptive statistics
- Weka
- about / The Weka platform
- download link / The Weka libraries
- Weka implementation / Weka implementation
- Weka libraries
- about / The Weka libraries
- Weka Workbench
- reference / The Weka platform
- WordCount example / The WordCount example
- WordCount problem / Some examples of MapReduce applications
X
- XML data / XML and JSON data