Apache Solr 3.1 Cookbook

By: Rafał Kuć

Overview of this book

Apache Solr is a fast, scalable, modern, open source, and easy-to-use search engine. It allows you to develop a professional search engine for your ecommerce site, web application, or back office software. Setting up Solr is easy, but configuring it to get the most out of your site is the difficult bit.

The Solr 3.1 Cookbook will make your everyday work easier by using real-life examples that show you how to deal with the most common problems that can arise while using the Apache Solr search engine. Why waste your time searching the Internet for solutions when you can have all the answers in one place?

This cookbook will show you how to get the most out of your search engine. Each chapter covers a different aspect of working with Solr, from analyzing your text data through querying and performance improvement to developing your own modules. The practical recipes will help you quickly solve common problems with data analysis and show you how to use faceting to collect data and how to speed up Solr. You will learn about functionalities that most newbies are unaware of, such as sorting results by a function value, highlighting matched words, and computing statistics, to make your work with Solr easy and stress free.

Ignoring defined words


Imagine that you want to filter out words that are considered vulgar from the data you are indexing. Such words can end up in your data by accident, and you don't want them to be searchable, so you want Solr to ignore them. Can we do that with Solr? Of course we can, and this recipe will show you how.

How to do it...

Let's start with the index structure (add this to the fields section of your schema.xml file):

<field name="id" type="string" indexed="true" stored="true" required="true" /> 
<field name="name" type="text_ignored" indexed="true" stored="true" />

The text_ignored type definition looks like this:

<fieldType name="text_ignored" class="solr.TextField" positionIncrementGap="100">
 <analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="ignored.txt" enablePositionIncrements="true" />
 </analyzer>
</fieldType>
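
To get a feel for what this analyzer chain does to a field value, here is a minimal Python sketch. It is not Solr code, only a rough emulation of the two steps configured above: splitting on whitespace, then dropping every token found in the ignored word list, with case-insensitive matching as requested by ignoreCase="true". The function name analyze, the 1-based positions, and the sample words are assumptions made for this illustration.

```python
def analyze(text, ignored_words, ignore_case=True):
    """Return (position, token) pairs that survive the ignored-word filter.

    Rough emulation of a whitespace tokenizer followed by a stop filter;
    this is an illustration, not Solr's actual implementation.
    """
    # Normalize the ignored words once when matching is case-insensitive.
    if ignore_case:
        stop = {word.lower() for word in ignored_words}
    else:
        stop = set(ignored_words)

    kept = []
    # str.split() with no arguments splits on runs of whitespace.
    for position, token in enumerate(text.split(), start=1):
        key = token.lower() if ignore_case else token
        if key in stop:
            # The token is dropped, but its position is still consumed,
            # which mimics the gap left when position increments are enabled.
            continue
        kept.append((position, token))
    return kept


# Hypothetical placeholder entries standing in for the ignored word list.
ignored = ["vulgar1", "vulgar2"]
print(analyze("Company Vulgar1 product", ignored))
# prints [(1, 'Company'), (3, 'product')]
```

Note the jump from position 1 to position 3 in the output: the ignored word never reaches the index, but the slot it occupied is preserved.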

The...