Optimizing K-means cluster solutions
K-means clustering is a well-established technique for grouping entities by overall similarity. It has many applications, including customer segmentation, anomaly detection (finding records that don't fit into existing clusters), and variable reduction (converting many input variables into fewer composite variables).
For all its power and popularity, the K-means algorithm has a number of known limitations. First, the algorithm is iterative and can converge to different solutions depending on the data and the initial algorithm parameters. Some solutions are better than others, and the final solution generally depends on the location of the initial cluster centers. In many implementations of K-means (including the Modeler implementation), the initial centers depend on the ordering of the data, so the quality of the clusters depends on the order of the records during modeling. Second, the K-means algorithm...
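The order dependence described in the first limitation can be illustrated with a minimal sketch. It assumes the simplest possible order-dependent seeding, where the first k records become the initial centers; this is an illustrative simplification and not necessarily how Modeler selects its initial centers. The function name kmeans_first_k and the four-point data set are hypothetical.

```python
import numpy as np


def kmeans_first_k(points, k, max_iter=100):
    """Plain Lloyd iterations seeded with the first k rows.

    Assumption for illustration only: real implementations (including
    Modeler) may derive their initial centers differently.
    """
    points = np.asarray(points, dtype=float)
    centers = points[:k].copy()
    for _ in range(max_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members; keep it if empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute assignments against the final centers and sum the
    # within-cluster squared distances (lower is better).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = float(d2.min(axis=1).sum())
    return centers, labels, sse


# The same four records (corners of a 4 x 1 rectangle) in two orders.
order_a = [(0, 0), (4, 0), (0, 1), (4, 1)]
order_b = [(0, 0), (0, 1), (4, 0), (4, 1)]

_, labels_a, sse_a = kmeans_first_k(order_a, k=2)
_, labels_b, sse_b = kmeans_first_k(order_b, k=2)

print(sse_a)  # 1.0  (left pair vs. right pair)
print(sse_b)  # 16.0 (bottom pair vs. top pair, a stable but poor solution)
```

The data set is deliberately tiny so that both runs can be checked by hand: with order_b the two seeds sit on the same short edge of the rectangle, and the algorithm settles on a split along the long axis that it cannot escape.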