Apache Solr Enterprise Search Server - Third Edition

By: David Smiley, Eric Pugh, Kranti Parisa, Matt Mitchell

Overview of this book

Apache Solr is a widely popular open source enterprise search server that delivers powerful search and faceted navigation, features that are elusive with databases. Solr supports complex search criteria, faceting, result highlighting, query completion, query spell-checking, relevancy tuning, geospatial search, and much more.

This book is a comprehensive resource for just about everything Solr has to offer, and it will take you from first exposure to development and deployment in no time. Even if you wish to use Solr 5, you should find the information just as applicable, thanks to Solr's high regard for backward compatibility. The book also includes some useful information specific to Solr 5.

The multilingual search


If you have text in various languages, the main issues to think about are the same as for any single language: how to analyze content, configure fields, define search defaults, and so on. In this section, we present three approaches to integrating linguistic analysis into Solr.

The multifield approach

With this approach, you create one field per language for each searchable text field. As part of your indexing process, you identify the language of each document and route its text to the fields for that language, each of which is configured with the relevant analyzer (tokenizer and token filters). The following diagram represents how each of the documents in your index will have language-specific fields:
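
As a rough illustration of that layout, here is a minimal Python sketch of the indexing-side routing. The suffix naming scheme (`title_en`, `body_fr`), the `SUPPORTED_LANGUAGES` and `SEARCHABLE_FIELDS` values, the `language` field, and the `to_multifield_doc` helper are all assumptions made for this example, not names defined by Solr or by this book. Language detection is assumed to have happened already, and the schema is assumed to define a matching field (or dynamic field) with a language-specific analyzer for every name produced here.

```python
# Hypothetical helper: route searchable text into per-language fields.
# All names below are illustrative assumptions, not part of Solr itself.

SUPPORTED_LANGUAGES = {"en", "fr", "de"}  # assumed: languages the schema has fields for
SEARCHABLE_FIELDS = ("title", "body")     # assumed: text fields that need analysis


def to_multifield_doc(doc_id, language, fields):
    """Return a document dict whose text fields carry a language suffix."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"no fields configured for language {language!r}")
    # Keep the detected language as its own field so it can be filtered on.
    doc = {"id": doc_id, "language": language}
    for name in SEARCHABLE_FIELDS:
        if name in fields:
            # "title" in a French document becomes "title_fr", and so on.
            doc[f"{name}_{language}"] = fields[name]
    return doc


doc = to_multifield_doc("42", "fr", {"title": "Bonjour", "body": "Le texte du document"})
print(doc)
```

The resulting dictionary is the kind of document you would then send to Solr through whichever client you use; each document populates only the fields for its own language.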

The following are the pros:

  • Because each language has its own fields, searching, filtering, and faceting by language are straightforward.

  • Relevancy scores (TF/IDF) are accurate and meaningful, because term statistics are computed per field and are therefore not mixed across languages.
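
To make the first point concrete, the following Python sketch builds request parameters that search only the fields of one language. It assumes language-suffixed field names such as `title_fr` and a `language` field holding the detected language code; both are hypothetical conventions chosen for this example, as is the `language_query_params` helper. `defType=edismax`, `qf`, and `fq` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def language_query_params(user_query, language, fields=("title", "body")):
    """Build Solr request parameters restricted to one language's fields."""
    # Assumed naming convention: "<field>_<language>", for example "title_fr".
    qf = " ".join(f"{name}_{language}" for name in fields)
    return {
        "q": user_query,
        "defType": "edismax",           # eDisMax parser reads the qf parameter
        "qf": qf,                       # search only this language's fields
        "fq": f"language:{language}",   # assumed "language" field used as a filter
    }


params = language_query_params("bonjour", "fr")
print(urlencode(params))  # query string to append to a select request URL
```

Because the query targets a single language's fields, the analysis applied to the query text matches the analysis applied at index time for that language.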

The following are the cons:

  • The number of fields will increase with the number of languages.

  • Query performance...