Character filters, declared with the <charFilter> element, process a stream of text prior to tokenization. There are only a few. This feature is not commonly used except for the first one described here, which is configured to strip accents:
MappingCharFilterFactory: This maps a character (or string) to another, or potentially to nothing at all. In other words, it's a find-replace capability. There is a mapping attribute in which you specify a configuration file. Solr's example configuration includes two such configuration files with useful mappings:
mapping-FoldToASCII.txt: This is a comprehensive mapping of non-ASCII characters to ASCII equivalents. For further details on the characters mapped, read the comments at the top of the file. This char filter has a token filter equivalent named ASCIIFoldingFilterFactory that should run faster and is recommended instead.
mapping-ISOLatin1Accent.txt: This is a smaller subset covering just the ISO Latin1 accent characters (like ñ to n). Given...
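To make the configuration described above concrete, here is a minimal sketch of how a char filter might be declared in a schema. The field type names (text_accent_stripped, text_ascii_folded) and the choice of tokenizer and lowercase filter are assumptions made for illustration; only the charFilter element, the mapping attribute, the two mapping file names, and ASCIIFoldingFilterFactory come from the text.

```xml
<!-- Hypothetical field type: strips accents with a char filter
     before the text reaches the tokenizer. -->
<fieldType name="text_accent_stripped" class="solr.TextField">
  <analyzer>
    <!-- Char filters are declared first; they see the raw character stream. -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical alternative: fold to ASCII with the token filter
     equivalent instead of mapping-FoldToASCII.txt. -->
<fieldType name="text_ascii_folded" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

The entries inside a mapping file are, as far as the shipped examples suggest, simple source-to-target pairs of the form "ñ" => "n", one per line; the comments at the top of each file remain the authoritative description of the syntax.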