Index
A
- affinity matrix
- algorithms
- sequential algorithms / Algorithm support in Mahout
- parallel algorithms / Algorithm support in Mahout
- applications, Clustering
- for marketing / Application of clustering
- for recommendations / Application of clustering
- image segmentation / Application of clustering
- in bioinformatics / Application of clustering
- web search / Application of clustering
- about / Application of clustering
B
- BallKMeans step
- about / The BallKMeans step
- seeding stage / The BallKMeans step
- numRuns parameter / The BallKMeans step
- testProbability parameter / The BallKMeans step
- correctWeights parameter / The BallKMeans step
- numClusters parameter / The BallKMeans step
- trimFraction parameter / The BallKMeans step
- maxNumIterations parameter / The BallKMeans step
- kMeansPlusPlusInit parameter / The BallKMeans step
- Bayesian information criterion (BIC) / Learning model-based clustering
C
- canopy clustering
- running, on Mahout / Running Canopy clustering on Mahout
- URL / Running Canopy clustering on Mahout
- generation phase / The Canopy generation phase
- clustering phase / The Canopy clustering phase
- running / Running Canopy clustering
- output, using for K-means / Using the Canopy output for K-means
- Class UnitVectorizerJob
- Clustering
- about / The clustering concept
- pattern finding algorithm / The clustering concept
- distance measuring technique / The clustering concept
- Grouping and Stopping / The clustering concept
- Analysis of Output / The clustering concept
- applications / Application of clustering
- techniques / Understanding different clustering techniques
- Clustering algorithms
- K-means Clustering / Clustering algorithms in Mahout
- Fuzzy K-means / Clustering algorithms in Mahout
- Streaming K-means / Clustering algorithms in Mahout
- Spectral Clustering / Clustering algorithms in Mahout
- Canopy Clustering / Clustering algorithms in Mahout
- Latent Dirichlet Allocation for topic modeling / Clustering algorithms in Mahout
- clustering phase, Canopy clustering / The Canopy clustering phase
- cluster quality
- improving / Using DistanceMeasure interface
- clusters
- visualizing / Visualizing clusters
- evaluating / Evaluating clusters
- quality, improving / Using DistanceMeasure interface
- clusters evaluation
- methods / Evaluating clusters
- extrinsic methods / Extrinsic methods
- intrinsic methods / Intrinsic methods
- CSV files
- working with / Working with CSV files
D
- data
- preparation, for using in clustering techniques / Preparing data for use with clustering techniques
- dataset
- URL / Dataset selection
- preparing / Preparing the dataset
- Davies-Bouldin index
- about / Intrinsic methods
- URL / Intrinsic methods
- density-based method, Clustering / The density-based method
- Dirichlet clustering / Understanding Dirichlet clustering
- distance measures
- about / Understanding distance measures
- for numeric variables / Understanding distance measures
- SquaredEuclideanDistanceMeasure / Understanding distance measures
- WeightedEuclideanDistanceMeasure / Understanding distance measures
- ManhattanDistanceMeasure / Understanding distance measures
- WeightedManhattanDistanceMeasure class / Understanding distance measures
- ChebyshevDistanceMeasure / Understanding distance measures
- CosineDistanceMeasure / Understanding distance measures
- TanimotoDistanceMeasure / Understanding distance measures
- Dunn Index
- URL / Running Streaming K-means
- about / Intrinsic methods
E
- Eclipse
- URL / Building Mahout code using Maven
- used, for setting up development environment / Setting up the development environment using Eclipse
- Elbow method / Learning K-means
- expectation-maximization (EM) algorithm / Learning model-based clustering
- Expectation Maximization (EM) algorithm / Learning K-means
- Expectation Step / Learning K-means
- Maximization Step / Learning K-means
- extrinsic methods, cluster
- about / Extrinsic methods
- cluster homogeneity / Extrinsic methods
- cluster completeness / Extrinsic methods
- rag bag / Extrinsic methods
- cluster size versus quantity / Extrinsic methods
- Jaccard index / Extrinsic methods
- Rand statistics / Extrinsic methods
- F-measure / Extrinsic methods
- Fowlkes-Mallows index (FM) / Extrinsic methods
- entropy-based cluster evaluation / Extrinsic methods
F
- Fuzzy K-means clustering
- about / Learning Fuzzy K-means clustering
- running, on Mahout / Running Fuzzy K-means on Mahout
- running, with liver disorders dataset / Dataset
- vector, creating for dataset / Creating a vector for the dataset
- vector reader / Vector reader
G
- generation phase, Canopy clustering / The Canopy generation phase
- graph Laplacian
- obtaining, from affinity matrix / Getting graph Laplacian from the affinity matrix
- eigenvector / Eigenvectors and eigenvalues
- eigenvalues / Eigenvectors and eigenvalues
H
- Hadoop
- URL / Installing Mahout
- hierarchical methods, Clustering
- about / Hierarchical methods
- top-down approach / Hierarchical methods
- bottom-up approach / Hierarchical methods
- Hortonworks Sandbox, for VirtualBox
I
- Inter-cluster
- about / Intrinsic methods
- Intra-cluster distance
- about / Intrinsic methods
- intrinsic methods, cluster
- entropy-based cluster evaluation / Intrinsic methods
- Dunn index / Intrinsic methods
- Inter-cluster / Intrinsic methods
- Intra-cluster distance / Intrinsic methods
- Davies-Bouldin index / Intrinsic methods
- Silhouette coefficient / Intrinsic methods
- Inverse document frequency / Preparing data for use with clustering techniques
K
- K, finding
- Elbow method / Learning K-means
- Silhouette coefficient / Learning K-means
- URL / Learning K-means
- K-means
- about / Learning K-means
- algorithms / Learning K-means
- running, on Mahout / Running K-means on Mahout
- running, in Map Phase / Running K-means on Mahout
- running, in Reducer Phase / Running K-means on Mahout
- URL / Running K-means on Mahout
- executing / Executing K-means
- Clusterdump result / The clusterdump result
- Canopy output, using / Using the Canopy output for K-means
L
- Latent Dirichlet allocation (LDA)
- about / Topic modeling
- mapper phase / Topic modeling
- reducer phase / Topic modeling
- running, with Mahout / Running LDA using Mahout
- Latent Dirichlet allocation (LDA), running
- dataset selection / Dataset selection
- CVB (LDA), executing / Steps to execute CVB (LDA)
- liver disorders dataset
M
- Mahout
- algorithms / Algorithm support in Mahout
- Clustering algorithms / Clustering algorithms in Mahout
- installing / Installing Mahout
- distribution file, URL / Building Mahout code using Maven
- K-means, running / Running K-means on Mahout
- dataset selection / Dataset selection
- canopy clustering, running / Running Canopy clustering on Mahout
- Latent Dirichlet allocation (LDA), running / Running LDA using Mahout
- used, for Streaming K-Means / Using Mahout for streaming K-means
- implementation, of spectral clustering / Mahout implementation of spectral clustering
- Mahout installation
- steps / Installing Mahout
- code, building with Maven / Building Mahout code using Maven
- development environment, setting up with Eclipse / Setting up the development environment using Eclipse
- setting up, for Windows users / Setting up Mahout for Windows users
- Mahout job
- launching, on cluster / Launching the Mahout job on the cluster
- performance tuning / Performance tuning for the job
- MapReduce implementation
- URL / Running Fuzzy K-means on Mahout
- mapper part / Running Fuzzy K-means on Mahout
- combiner part / Running Fuzzy K-means on Mahout
- reducers part / Running Fuzzy K-means on Mahout
- Maven
- used, for building Mahout / Building Mahout code using Maven
- URL / Building Mahout code using Maven
- model-based clustering
- about / Learning model-based clustering
- Dirichlet clustering / Understanding Dirichlet clustering
- topic modeling / Topic modeling
N
- 20news-bydate.tar.gz
- URL / Dataset selection
- ε-neighborhood graph / Affinity (similarity) graph
- n-grams / Preparing data for use with clustering techniques
- normalized graph Laplacian matrix (random-walk) / Eigenvectors and eigenvalues
- normalized graph Laplacian matrix (symmetric) / Eigenvectors and eigenvalues
P
- parallel algorithms / Algorithm support in Mahout
- partitioning method, Clustering / The partitioning method
- performance tuning
- for Mahout job / Performance tuning for the job
- probabilistic Clustering / Probabilistic clustering
S
- sequential algorithms / Algorithm support in Mahout
- Silhouette coefficient / Learning K-means
- about / Intrinsic methods
- URL / Intrinsic methods
- spectral clustering
- about / Understanding spectral clustering
- affinity (similarity) graph / Affinity (similarity) graph
- graph Laplacian, obtaining from affinity matrix / Getting graph Laplacian from the affinity matrix
- Mahout implementation / Mahout implementation of spectral clustering
- spectral clustering algorithm
- about / The spectral clustering algorithm
- unnormalized spectral clustering / The spectral clustering algorithm
- normalized spectral clustering / Normalized spectral clustering
- Stochastic Singular Value Decomposition (SSVD)
- Streaming K-Means
- about / Learning Streaming K-means
- URL / Learning Streaming K-means
- implementing / Learning Streaming K-means
- Streaming step / The Streaming step
- BallKMeans step / The BallKMeans step
- Mahout, using / Using Mahout for streaming K-means
- dataset selection / Dataset selection
- CSV file, converting to vector file / Converting CSV to a vector file
- running / Running Streaming K-means
- Streaming step
- about / The Streaming step
- distanceCutOff parameter / The Streaming step
- Beta parameter / The Streaming step
- clusterOvershoot parameter / The Streaming step
- clusterLogFactor parameter / The Streaming step
- numClusters parameter / The Streaming step
- URL / The Streaming step
T
- techniques, Clustering
- about / Understanding different clustering techniques
- hierarchical methods / Hierarchical methods
- partitioning method / The partitioning method
- density-based method / The density-based method
- probabilistic Clustering / Probabilistic clustering
- TF-IDF / Preparing data for use with clustering techniques
- topic modeling / Topic modeling
- Twitter Apps
- URL / Preparing the dataset
- Twitter streams
- collecting, URL / Preparing the dataset
W
- Windows users
- setting up / Setting up Mahout for Windows users