Solr uses the langid UpdateRequestProcessor to identify the language of a document's text
and then map that text to a language-specific field during indexing.
Solr provides two implementations for language detection:
- Tika language detection
- Langdetect language detection
Language detection is configured in solrconfig.xml.
Both the Tika and the langdetect implementations accept the same parameters, as follows:
<processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
  <lst name="defaults">
    <str name="langid.fl">title,subject,text,keywords</str>
    <str name="langid.langField">language_s</str>
  </lst>
</processor>

<processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
  <lst name="defaults">
    <str name="langid.fl">title,subject,text,keywords</str>
    ...
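For context, a language identification processor normally sits inside an update request processor chain. The following is a minimal sketch rather than a definitive setup: the chain name "langid" and the fallback value are assumptions chosen for illustration, and langid.map and langid.fallback are additional langid parameters that do not appear in the example above. The sketch also assumes the langid contrib libraries are already on Solr's classpath.

```xml
<!-- Sketch only: the chain name "langid" is a hypothetical choice -->
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <!-- Fields whose text is used to detect the language -->
      <str name="langid.fl">title,subject,text,keywords</str>
      <!-- Field that stores the detected language code -->
      <str name="langid.langField">language_s</str>
      <!-- Assumption: enable mapping of the input fields to language-specific fields -->
      <bool name="langid.map">true</bool>
      <!-- Assumption: language code to use when detection fails -->
      <str name="langid.fallback">en</str>
    </lst>
  </processor>
  <!-- Standard processors that log the update and then execute it -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With mapping enabled, a field such as title in a document detected as English would, by default, be expected to end up in a language-suffixed field such as title_en, so the schema needs matching fields or dynamic fields for each language of interest.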