Administrating Solr

By: Surendra Mohan

Overview of this book

Implementing search engines in web products is practically mandatory these days. Apache Solr is a robust search engine, but simply installing it and forgetting about it is not a good idea, especially when you have to fight for the search ranking of your web product. In such a scenario, you need to keep monitoring, administrating, and optimizing Solr to retain your ranking.

"Administrating Solr" is a practical, hands-on guide. It provides clear, step-by-step exercises and some advanced concepts that will help you administrate, monitor, and optimize Solr using Drupal and associated scripts. It also gives you a solid grounding in how to use Apache Solr with Drupal.

The book starts with an overview of Apache Solr and the installation process to get you familiar with Solr. It then gradually moves on to the inner workings that make Solr flexible enough to render appropriate search results in different scenarios. It takes you through clear and practical concepts that will help you monitor, administrate, and optimize Solr using both scripts and tools, and it teaches you ways to query Solr and methods to keep it healthy and well maintained. With this book, you will learn how to effectively implement and optimize Solr using Drupal.
Table of Contents (12 chapters)

Language Detection


In this section, we will learn about language detection and how to set it up and configure it.

Solr can identify the language of a document and map its content to the respective language-specific fields while indexing. To do so, it uses langid, which is an UpdateRequestProcessor. This language detection feature can be implemented in Solr using the following:

  • Tika language detection

  • LangDetect language detection

  • Compact Language Detector (CLD)
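To make the idea of mapping detected languages to their respective fields more concrete, here is a minimal sketch in Python. It is not Solr's code: the function name `map_fields_by_language`, the `language_s` field, and the `<field>_<lang>` naming pattern are assumptions chosen for illustration, and the detector itself is left out (the detected language code is simply passed in).

```python
def map_fields_by_language(doc, detected_lang, source_fields,
                           lang_field="language_s", keep_orig=False):
    """Return a copy of doc whose source fields are renamed to <field>_<lang>.

    Hypothetical helper: it only imitates the kind of field mapping a
    language-identification update processor could perform at index time.
    """
    mapped = dict(doc)                   # leave the caller's document untouched
    mapped[lang_field] = detected_lang   # record the detected language code
    for field in source_fields:
        if field not in doc:
            continue                     # skip fields this document lacks
        mapped[f"{field}_{detected_lang}"] = doc[field]
        if not keep_orig:
            del mapped[field]            # drop the unmapped original field
    return mapped


doc = {"id": "1",
       "title": "Administrating Solr",
       "text": "A practical, hands-on guide."}
indexed = map_fields_by_language(doc, "en", ["title", "text"])
print(indexed)
```

With a mapping like this, English content would end up in fields such as `title_en`, where language-specific analysis could be applied, while the `id` field is left as it is.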

Now, let's compare these three implementations.

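| Parameter                       | CLD    | Apache Tika                               | LangDetect                     |
|---------------------------------|--------|-------------------------------------------|--------------------------------|
| Language count supported        | 21     | 17                                        | 21                             |
| Languages not supported         | N/A    | Bulgarian, Czech, Lithuanian, and Latvian | N/A                            |
| Languages detected              | > 76   | 27                                        | 53                             |
| Accuracy                        | Medium | Low                                       | High                           |
| Confusing languages             |        | Danish confused with Norwegian            | Danish confused with Norwegian |
| Incorrect results (probability) | Low    | Medium                                    | High                           |
| Performance                     | Fast   | Slow                                      | Slower                         |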

In the given comparative study, we can conclude that Compact...