The clustering component groups similar documents into clusters using sophisticated statistical techniques. Each cluster is identified by a few words taken from its documents, namely the words that distinguish the documents in that cluster from those in the other clusters. As with the MoreLikeThis component, which also uses statistical techniques, the quality of the results is hit or miss. This component resides in its own contrib module and provides an extension point for integrating a clustering engine.
Tip
The primary means of navigation/discovery of your data should generally be search and faceting. For so-called unstructured text use cases, there are, by definition, few attributes to facet on. Clustering search results and presenting tag clouds (a visualization of faceting on words) are generally exploratory navigation methods of last resort in the absence of more effective document metadata.
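To make the phrase "faceting on words" concrete, here is a minimal sketch in plain Python of the counts that a tag cloud would visualize: for each word, the number of documents that contain it. It illustrates the idea only and is not how Solr computes facets; the function name word_facet_counts, the sample documents, and the stopword list are all hypothetical.

```python
import re
from collections import Counter


def word_facet_counts(docs, stopwords=frozenset()):
    """Count, for each word, how many documents contain it.

    This mirrors what a facet on a tokenized text field reports:
    one count per term, where each document contributes at most once.
    """
    counts = Counter()
    for text in docs:
        # Lowercase, split into alphanumeric tokens, and de-duplicate per document.
        terms = set(re.findall(r"[a-z0-9]+", text.lower())) - stopwords
        counts.update(terms)
    return counts


# Hypothetical sample documents, for illustration only.
docs = [
    "Solr clustering groups search results",
    "Faceting on words yields a tag cloud",
    "Search results and faceting aid navigation",
]

counts = word_facet_counts(docs, stopwords=frozenset({"a", "on", "and"}))

# A tag cloud would size each word by its count.
for word, n in counts.most_common(5):
    print(word, n)
```

With little other metadata available, these per-word counts are about all there is to navigate by, which is why the tip treats tag clouds as a last resort.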
Presently, there are two search-result clustering algorithms available as part of the Carrot2...