Index

A

affinity matrix
- URL / Mahout implementation of spectral clustering
algorithms
- sequential algorithms / Algorithm support in Mahout
- parallel algorithms / Algorithm support in Mahout
applications, Clustering
- for marketing / Application of clustering
- for recommendations / Application of clustering
- image segmentation / Application of clustering
- in bioinformatics / Application of clustering
- web search / Application of clustering
- about / Application of clustering

B

BallKMeans step
- about / The BallKMeans step
- seeding stage / The BallKMeans step
- numRuns parameter / The BallKMeans step
- testProbability parameter / The BallKMeans step
- correctWeights parameter / The BallKMeans step
- numClusters parameter / The BallKMeans step
- trimFraction parameter / The BallKMeans step
- maxNumIterations parameter / The BallKMeans step
- kMeansPlusPlusInit parameter / The BallKMeans step
Bayesian information criterion (BIC) / Learning model-based clustering

C

canopy clustering
- running, on Mahout / Running Canopy clustering on Mahout
- URL / Running Canopy clustering on Mahout
- generation phase / The Canopy generation phase
- clustering phase / The Canopy clustering phase
- running / Running Canopy clustering
- output, using for K-means / Using the Canopy output for K-means
Class UnitVectorizerJob
- URL / Mahout implementation of spectral clustering
Clustering
- about / The clustering concept
- pattern finding algorithm / The clustering concept
- distance measuring technique / The clustering concept
- Grouping and Stopping / The clustering concept
- Analysis of Output / The clustering concept
- applications / Application of clustering
- techniques / Understanding different clustering techniques
Clustering algorithms
- K-means Clustering / Clustering algorithms in Mahout
- Fuzzy K-means / Clustering algorithms in Mahout
- Streaming K-means / Clustering algorithms in Mahout
- Spectral Clustering / Clustering algorithms in Mahout
- Canopy Clustering / Clustering algorithms in Mahout
- Latent Dirichlet Allocation for topic modeling / Clustering algorithms in Mahout
clustering phase, Canopy clustering / The Canopy clustering phase
cluster quality
- improving / Using DistanceMeasure interface
clusters
- visualizing / Visualizing clusters
- evaluating / Evaluating clusters
- quality, improving / Using DistanceMeasure interface
clusters evaluation
- methods / Evaluating clusters
- extrinsic methods / Extrinsic methods
- intrinsic methods / Intrinsic methods
CSV files
- working with / Working with CSV files

D

data
- preparation, for using in clustering techniques / Preparing data for use with clustering techniques
dataset
- URL / Dataset selection
- preparing / Preparing the dataset
Davies-Bouldin index
- about / Intrinsic methods
- URL / Intrinsic methods
density-based method, Clustering / The density-based method
Dirichlet clustering / Understanding Dirichlet clustering
distance measures
- about / Understanding distance measures
- for numeric variables / Understanding distance measures
- SquaredEuclideanDistanceMeasure / Understanding distance measures
- WeightedEuclideanDistanceMeasure / Understanding distance measures
- ManhattanDistanceMeasure / Understanding distance measures
- WeightedManhattanDistanceMeasure class / Understanding distance measures
- ChebyshevDistanceMeasure / Understanding distance measures
- CosineDistanceMeasure / Understanding distance measures
- TanimotoDistanceMeasure / Understanding distance measures
Dunn Index
- URL / Running Streaming K-means
- about / Intrinsic methods

E

Eclipse
- URL / Building Mahout code using Maven
- used, for setting up development environment / Setting up the development environment using Eclipse
Elbow method / Learning K-means
expectation-maximization (EM) algorithm / Learning model-based clustering
Expectation Maximization (EM) algorithm / Learning K-means
- Expectation Step / Learning K-means
- Maximization Step / Learning K-means
extrinsic methods, cluster
- about / Extrinsic methods
- cluster homogeneity / Extrinsic methods
- cluster completeness / Extrinsic methods
- rag bag / Extrinsic methods
- cluster size versus quantity / Extrinsic methods
- Jaccard index / Extrinsic methods
- rand statistics / Extrinsic methods
- F-measure / Extrinsic methods
- Fowlkes-Mallows index (FM) / Extrinsic methods
- entropy-based cluster evaluation / Extrinsic methods

F

Fuzzy K-means clustering
- about / Learning Fuzzy K-means clustering
- running, on Mahout / Running Fuzzy K-means on Mahout
- running, with liver disorders dataset / Dataset
- vector, creating for dataset / Creating a vector for the dataset
- vector reader / Vector reader

G

generation phase, Canopy clustering / The Canopy generation phase
graph Laplacian
- obtaining, from affinity matrix / Getting graph Laplacian from the affinity matrix
- eigenvector / Eigenvectors and eigenvalues
- eigenvalues / Eigenvectors and eigenvalues

H

Hadoop
- URL / Installing Mahout
hierarchical methods, Clustering
- about / Hierarchical methods
- top-down approach / Hierarchical methods
- bottom-up approach / Hierarchical methods
Hortonworks Sandbox, for VirtualBox
- URL / Setting up Mahout for Windows users

I

Inter-cluster distance
- about / Intrinsic methods
Intra-cluster distance
- about / Intrinsic methods
intrinsic methods, cluster
- entropy-based cluster evaluation / Intrinsic methods
- Dunn index / Intrinsic methods
- Inter-cluster distance / Intrinsic methods
- Intra-cluster distance / Intrinsic methods
- Davies-Bouldin index / Intrinsic methods
- Silhouette coefficient / Intrinsic methods
Inverse document frequency / Preparing data for use with clustering techniques

K

K, finding
- Elbow method / Learning K-means
- Silhouette coefficient / Learning K-means
- URL / Learning K-means
K-means
- about / Learning K-means
- algorithms / Learning K-means
- running, on Mahout / Running K-means on Mahout
- running, in Map Phase / Running K-means on Mahout
- running, in Reducer Phase / Running K-means on Mahout
- URL / Running K-means on Mahout
- executing / Executing K-means
- Clusterdump result / The clusterdump result
- Canopy output, using / Using the Canopy output for K-means

L

Latent Dirichlet allocation (LDA)
- about / Topic modeling
- mapper phase / Topic modeling
- reducer phase / Topic modeling
- running, with Mahout / Running LDA using Mahout
Latent Dirichlet allocation (LDA), running
- dataset selection / Dataset selection
- CVB (LDA), executing / Steps to execute CVB (LDA)
liver disorders dataset
- about / Dataset
- URL / Dataset
- download, URL / Dataset

M

Mahout
- algorithms / Algorithm support in Mahout
- Clustering algorithms / Clustering algorithms in Mahout
- installing / Installing Mahout
- distribution file, URL / Building Mahout code using Maven
- K-means, running / Running K-means on Mahout
- dataset selection / Dataset selection
- canopy clustering, running / Running Canopy clustering on Mahout
- Latent Dirichlet allocation (LDA), running / Running LDA using Mahout
- used, for Streaming K-Means / Using Mahout for streaming K-means
- implementation, of spectral clustering / Mahout implementation of spectral clustering
Mahout installation
- steps / Installing Mahout
- code, building with Maven / Building Mahout code using Maven
- development environment, setting up with Eclipse / Setting up the development environment using Eclipse
- setting up, for Windows users / Setting up Mahout for Windows users
Mahout job
- launching, on cluster / Launching the Mahout job on the cluster
- performance tuning / Performance tuning for the job
MapReduce implementation
- URL / Running Fuzzy K-means on Mahout
- mapper part / Running Fuzzy K-means on Mahout
- combiner part / Running Fuzzy K-means on Mahout
- reducers part / Running Fuzzy K-means on Mahout
Maven
- used, for building Mahout / Building Mahout code using Maven
- URL / Building Mahout code using Maven
model-based clustering
- about / Learning model-based clustering
- Dirichlet clustering / Understanding Dirichlet clustering
- topic modeling / Topic modeling

N

20news-bydate.tar.gz
- URL / Dataset selection
ε-neighborhood graph / Affinity (similarity) graph
n-grams / Preparing data for use with clustering techniques
normalized graph Laplacian matrix (random-walk) / Eigenvectors and eigenvalues
normalized graph Laplacian matrix (symmetric) / Eigenvectors and eigenvalues

P

parallel algorithms / Algorithm support in Mahout
partitioning method, Clustering / The partitioning method
performance tuning
- for Mahout job / Performance tuning for the job
probabilistic Clustering / Probabilistic clustering

S

sequential algorithms / Algorithm support in Mahout
Silhouette coefficient / Learning K-means
- about / Intrinsic methods
- URL / Intrinsic methods
spectral clustering
- about / Understanding spectral clustering
- affinity (similarity) graph / Affinity (similarity) graph
- graph Laplacian, obtaining from affinity matrix / Getting graph Laplacian from the affinity matrix
- Mahout implementation / Mahout implementation of spectral clustering
spectral clustering algorithm
- about / The spectral clustering algorithm
- unnormalized spectral clustering / The spectral clustering algorithm
- normalized spectral clustering / Normalized spectral clustering
Stochastic Singular Value Decomposition (SSVD)
- about / Mahout implementation of spectral clustering
- URL / Mahout implementation of spectral clustering
Streaming K-Means
- about / Learning Streaming K-means
- URL / Learning Streaming K-means
- implementing / Learning Streaming K-means
- Streaming step / The Streaming step
- BallKMeans step / The BallKMeans step
- Mahout, using / Using Mahout for streaming K-means
- dataset selection / Dataset selection
- CSV file, converting to vector file / Converting CSV to a vector file
- running / Running Streaming K-means
Streaming step
- about / The Streaming step
- distanceCutOff parameter / The Streaming step
- Beta parameter / The Streaming step
- clusterOvershoot parameter / The Streaming step
- clusterLogFactor parameter / The Streaming step
- numClusters parameter / The Streaming step
- URL / The Streaming step

T

techniques, Clustering
- about / Understanding different clustering techniques
- hierarchical methods / Hierarchical methods
- partitioning method / The partitioning method
- density-based method / The density-based method
- probabilistic Clustering / Probabilistic clustering
TF-IDF / Preparing data for use with clustering techniques
topic modeling / Topic modeling
Twitter Apps
- URL / Preparing the dataset
Twitter streams
- collecting, URL / Preparing the dataset

W

Windows users
- setting up / Setting up Mahout for Windows users

Rapid - Apache Mahout Clustering designs
