
Apache Solr Enterprise Search Server - Third Edition

By: David Smiley, Eric Pugh, Kranti Parisa, Matt Mitchell

Overview of this book

Apache Solr is a widely popular open source enterprise search server that delivers powerful search and faceted navigation features that are hard to achieve with databases. Solr supports complex search criteria, faceting, result highlighting, query completion, query spell-checking, relevancy tuning, geospatial searches, and much more.

This book is a comprehensive resource for just about everything Solr has to offer, and it will take you from first exposure to development and deployment in no time. Even if you wish to use Solr 5, you should find the information just as applicable, thanks to Solr's strong commitment to backward compatibility. The book also includes some useful information specific to Solr 5.

Character filters


Character filters, declared with the <charFilter> element, process a stream of text before it is tokenized. Solr provides only a few of them, and they are not commonly used, with the exception of the first one described here, which is typically configured to strip accents:

  • MappingCharFilterFactory: This maps a character (or string) to another one, or potentially to nothing at all. In other words, it's a find-replace capability. Its mapping attribute specifies a configuration file containing the mappings. Solr's example configuration includes two such files with useful mappings:

    • mapping-FoldToASCII.txt: This is a comprehensive mapping of non-ASCII characters to ASCII equivalents. For further details on the characters mapped, read the comments at the top of the file. This char filter has a token filter equivalent named ASCIIFoldingFilterFactory that should run faster and is recommended instead.

    • mapping-ISOLatin1Accent.txt: This is a smaller subset covering just the ISO Latin1 accent characters (like ñ to n). Given...
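To make the find-replace behavior of a mapping char filter concrete, here is a minimal Python sketch. It is not Solr's or Lucene's code, only an illustration under stated assumptions: mapping rules are written one per line as "source" => "target" with \uXXXX escapes, in the style of Solr's example mapping files; the functions parse_mapping, mapping_char_filter, and analyze are hypothetical names; and whitespace splitting stands in for a real tokenizer. In an actual Solr schema, the filter would instead be declared as <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/> inside an <analyzer>, ahead of its <tokenizer>.

```python
import re

# Hypothetical parser for mapping-file rules such as:  "\u00F1" => "n"
# Blank lines and '#' comments are skipped. Only \uXXXX escapes are decoded;
# other escape sequences are deliberately left out of this sketch.
_RULE = re.compile(r'^"(.*?)"\s*=>\s*"(.*)"$')
_UESC = re.compile(r'\\u([0-9a-fA-F]{4})')


def _unescape(s: str) -> str:
    """Turn literal \\uXXXX sequences into the characters they name."""
    return _UESC.sub(lambda m: chr(int(m.group(1), 16)), s)


def parse_mapping(lines):
    """Build a {source: target} dict from mapping-file lines."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = _RULE.match(line)
        if m:
            mapping[_unescape(m.group(1))] = _unescape(m.group(2))
    return mapping


def mapping_char_filter(text, mapping):
    """Find-replace on the raw text, trying the longest source string first."""
    keys = sorted((k for k in mapping if k), key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(mapping[k])  # an empty target deletes the match
                i += len(k)
                break
        else:  # no rule matched here: copy the character unchanged
            out.append(text[i])
            i += 1
    return "".join(out)


def analyze(text, mapping):
    # The char filter runs first; only then is the text split into tokens.
    return mapping_char_filter(text, mapping).split()


if __name__ == "__main__":
    rules = parse_mapping([
        "# accent folding plus a deletion rule",
        '"\\u00F1" => "n"',
        '"\\u00E9" => "e"',
        '"-" => ""',
    ])
    print(analyze("Espa\u00f1a caf\u00e9 rock-and-roll", rules))
    # prints ['Espana', 'cafe', 'rockandroll']
```

Trying longer source strings first lets a multi-character rule win over a shorter rule that matches at the same position. Lucene's char filters also track offset corrections so that features such as highlighting can map tokens back to the original text; this sketch omits that.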