If your text spans several languages, the main issues to think about are the same as for any single language: how to analyze content, configure fields, define search defaults, and so on. In this section, we present three approaches for integrating linguistic analysis into Solr.
With this approach, you create one field per language for each searchable text field. As part of your indexing process, you identify the language of the content and route it to the matching fields, each of which is configured with the analyzers, tokenizers, and token filters relevant to that language. The following diagram represents how each of the documents in your index will have language-specific fields:
The following are the pros:
Because each language has its own fields, searching, filtering, and faceting within a language are straightforward
Relevancy scores (TF/IDF) are accurate and meaningful, because term statistics are computed per field and therefore per language
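To make the one-field-per-language layout concrete, the following is a minimal sketch of the indexing-side routing step. It assumes the language of each document has already been identified, and it uses a hypothetical naming convention of `<field>_<language code>` (for example, `title_fr`). The function name, the set of supported languages, and the `general` fallback suffix are illustrative assumptions, not part of Solr.

```python
# Hypothetical set of languages that have dedicated fields in the schema.
SUPPORTED_LANGUAGES = {"en", "fr", "de"}


def to_language_fields(doc, language, text_fields, fallback="general"):
    """Return a copy of doc with its text fields renamed to <field>_<language>.

    Fields not listed in text_fields (ids, dates, and so on) are kept as-is.
    Languages without dedicated fields are routed to the fallback suffix.
    """
    suffix = language if language in SUPPORTED_LANGUAGES else fallback
    routed = {}
    for name, value in doc.items():
        if name in text_fields:
            routed[f"{name}_{suffix}"] = value
        else:
            routed[name] = value
    return routed


doc = {"id": "42", "title": "Le petit prince", "body": "Dessine-moi un mouton."}
solr_doc = to_language_fields(doc, "fr", text_fields={"title", "body"})
print(solr_doc)
```

In a setup like this, the schema would map each suffixed field (for instance, through dynamic field patterns) to a field type with the analysis chain for that language, and a query for French content would target `title_fr` and `body_fr` directly.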
The following are the cons: