Apache Mahout Clustering Designs


Using the DistanceMeasure interface


The quality of the resulting clusters usually depends on the selected distance measure and on the weights of the features in the vector (document). A suitable distance measure brings similar items close together. Mahout gives us the flexibility to write custom distance measures through the DistanceMeasure interface in the org.apache.mahout.common.distance package. The main method to implement is double distance(Vector v1, Vector v2).
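To make the idea of a custom measure concrete, here is a minimal sketch of a weighted Euclidean distance, where each feature carries its own weight so that more important features contribute more to the distance. It is an illustration under stated assumptions, not Mahout code: SimpleDistanceMeasure is a hypothetical one-method stand-in for Mahout's DistanceMeasure (the real interface declares further methods that this sketch omits), and it works on plain double[] arrays instead of org.apache.mahout.math.Vector so that it compiles without Mahout on the classpath. The class names WeightedEuclideanDistance and WeightedDistanceDemo are likewise hypothetical.

```java
// Hypothetical stand-in for Mahout's DistanceMeasure interface.
// It uses double[] instead of org.apache.mahout.math.Vector so the sketch
// compiles without Mahout; only the distance(v1, v2) idea is mirrored.
interface SimpleDistanceMeasure {
    double distance(double[] v1, double[] v2);
}

// Custom measure (hypothetical): Euclidean distance with a weight per feature.
class WeightedEuclideanDistance implements SimpleDistanceMeasure {
    private final double[] weights;

    WeightedEuclideanDistance(double[] weights) {
        // Defensive copy so later changes to the caller's array have no effect
        this.weights = weights.clone();
    }

    @Override
    public double distance(double[] v1, double[] v2) {
        // Reject vectors of different cardinality, as the Mahout snippet does
        if (v1.length != v2.length || v1.length != weights.length) {
            throw new IllegalArgumentException(
                "Cardinality mismatch: " + v1.length + " vs " + v2.length);
        }
        double sum = 0.0;
        for (int i = 0; i < v1.length; i++) {
            double diff = v1[i] - v2[i];
            sum += weights[i] * diff * diff; // weighted squared difference
        }
        return Math.sqrt(sum);
    }
}

public class WeightedDistanceDemo {
    public static void main(String[] args) {
        // The second feature counts four times as much as the first
        SimpleDistanceMeasure measure =
            new WeightedEuclideanDistance(new double[] {1.0, 4.0});
        double d = measure.distance(new double[] {0.0, 0.0}, new double[] {3.0, 1.0});
        // Weighted distance = sqrt(1*9 + 4*1) = sqrt(13)
        System.out.println(d);
    }
}
```

In a real Mahout job the same per-feature weighting logic would live inside a class implementing DistanceMeasure and operate on Mahout Vector objects; the array-based version only shows the shape of the computation.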

Let's take a look at a small implementation of this interface in the following code snippet (source: Mahout in Action):

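import org.apache.mahout.math.CardinalityException;
import org.apache.mahout.math.Vector;

public double distance(Vector vector1, Vector vector2) {
  // Both vectors must have the same cardinality (number of dimensions)
  if (vector1.size() != vector2.size()) {
    throw new CardinalityException(vector1.size(), vector2.size());
  }
  double lengthSquaredv1 = vector1.getLengthSquared();
  double lengthSquaredv2 = vector2.getLengthSquared();
  double dotProduct = vector2.dot(vector1);
  // Product of the two Euclidean lengths, |v1| * |v2|
  double denominator = Math.sqrt(lengthSquaredv1) * Math.sqrt(lengthSquaredv2);
  if (denominator < dotProduct...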
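The snippet computes the dot product of the two vectors and the product of their lengths, which are the ingredients of cosine distance, 1 - (v1 · v2) / (|v1| |v2|). Two vectors that point in the same direction get distance 0 whatever their lengths. The following self-contained sketch shows that computation on plain double[] arrays. It is not the continuation of the book's truncated snippet. The class name CosineDistanceSketch, the clamping of the similarity to [-1, 1], and the handling of zero vectors are assumptions made for this illustration.

```java
// Self-contained sketch of cosine distance on double[] arrays (hypothetical
// class, not Mahout code and not the rest of the book's snippet).
public class CosineDistanceSketch {

    // Plain dot product of two equal-length arrays
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Cosine distance: 1 - (v1 . v2) / (|v1| * |v2|), in the range [0, 2]
    static double cosineDistance(double[] v1, double[] v2) {
        if (v1.length != v2.length) {
            throw new IllegalArgumentException(
                "Cardinality mismatch: " + v1.length + " vs " + v2.length);
        }
        double norm1 = Math.sqrt(dot(v1, v1));
        double norm2 = Math.sqrt(dot(v2, v2));
        double denominator = norm1 * norm2;
        if (denominator == 0.0) {
            // Assumed convention: two zero vectors are identical (0.0);
            // a zero vector against a non-zero one is treated as orthogonal (1.0).
            return (norm1 == 0.0 && norm2 == 0.0) ? 0.0 : 1.0;
        }
        // Clamp so floating-point rounding cannot push the similarity outside [-1, 1]
        double similarity = Math.max(-1.0, Math.min(1.0, dot(v1, v2) / denominator));
        return 1.0 - similarity;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.0};
        double[] b = {0.0, 1.0};
        double[] c = {2.0, 0.0};
        System.out.println(cosineDistance(a, b)); // orthogonal vectors: 1.0
        System.out.println(cosineDistance(a, c)); // same direction, different length: 0.0
    }
}
```

Because only the angle between the vectors matters, a long document and a short document with the same term proportions end up at distance 0, which is one reason cosine distance is a common choice for text clustering.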