Apache Solr Enterprise Search Server - Third Edition

By: David Smiley, Eric Pugh, Kranti Parisa, Matt Mitchell

Overview of this book

Apache Solr is a widely popular open source enterprise search server that delivers powerful search and faceted navigation, features that are elusive with databases. Solr supports complex search criteria, faceting, result highlighting, query completion, query spell-checking, relevancy tuning, geospatial searches, and much more.

This book is a comprehensive resource for just about everything Solr has to offer, and it will take you from first exposure to development and deployment in no time. Even if you wish to use Solr 5, you should find the information to be just as applicable due to Solr's high regard for backward compatibility. The book includes some useful information specific to Solr 5.

The Clustering component


The clustering component groups similar documents into clusters using sophisticated statistical techniques. Each cluster is labeled with a few words taken from its documents, chosen because they distinguish that cluster from the others. As with the MoreLikeThis component, which also uses statistical techniques, the quality of the results is hit or miss. This component resides in its own contrib module, and it provides an extension point for integrating a clustering engine.
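As a rough illustration of what a request to this component might look like, here is a minimal Python sketch that only builds the request URL and does not contact a server. The parameter names (`clustering`, `clustering.results`, `carrot.title`, `carrot.snippet`) are the ones we recall from the Solr 4.x clustering contrib, so verify them against your version. The host, the `collection1` core, the `/clustering` handler path, and the `title` and `content` field names are assumptions made for this example, and `build_clustering_url` is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode


def build_clustering_url(base_url, query, rows=100,
                         title_field="title", snippet_field="content"):
    """Build a Solr request URL with result clustering switched on.

    base_url, title_field and snippet_field are illustrative assumptions;
    substitute the handler and fields defined in your own configuration.
    """
    params = {
        "q": query,
        "rows": rows,                   # only the returned rows get clustered
        "wt": "json",
        "clustering": "true",           # enable the component for this request
        "clustering.results": "true",   # cluster the search results
        "carrot.title": title_field,    # field treated as the document title
        "carrot.snippet": snippet_field,  # field treated as the document body
    }
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr instance, core, and handler.
url = build_clustering_url(
    "http://localhost:8983/solr/collection1/clustering", "solr")
print(url)
```

Because clustering operates on the documents returned by the search, the `rows` value is set higher than a typical page size here; whether 100 is appropriate depends on your data and on how much latency you can tolerate.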

Tip

The primary means of navigation/discovery of your data should generally be search and faceting. For so-called unstructured text use cases, there are, by definition, few attributes to facet on. Clustering search results and presenting tag clouds (a visualization of faceting on words) are generally exploratory navigation methods of last resort in the absence of more effective document metadata.

Presently, there are two search-result clustering algorithms available as part of the Carrot2...