Apache Solr 3.1 Cookbook

By: Rafał Kuć

Overview of this book

Apache Solr is a fast, scalable, modern, open source, and easy-to-use search engine. It allows you to develop a professional search engine for your ecommerce site, web application, or back office software. Setting up Solr is easy, but configuring it to get the most out of your site is the difficult bit.

The Solr 3.1 Cookbook will make your everyday work easier by using real-life examples that show you how to deal with the most common problems that can arise while using the Apache Solr search engine. Why waste your time searching the Internet for solutions when you can have all the answers in one place?

This cookbook will show you how to get the most out of your search engine. Each chapter covers a different aspect of working with Solr from analyzing your text data through querying, performance improvement, and developing your own modules. The practical recipes will help you to quickly solve common problems with data analysis, show you how to use faceting to collect data and to speed up the performance of Solr. You will learn about functionalities that most newbies are unaware of, such as sorting results by a function value, highlighting matched words, and computing statistics to make your work with Solr easy and stress free.

Splitting text by whitespace only


One of the most common problems that you have probably come across is having to split text on whitespace, so that the words are separated from each other and can be processed further. This recipe will show you how to do it.

How to do it...

Let's assume that we have the following index structure (add this to your schema.xml file in the field definition section):

<field name="description_string" type="string" indexed="true" stored="true" />
<field name="description_split" type="text_split" indexed="true" stored="true" />

To split the text in the description_split field, we should add the following type definition:

<fieldType name="text_split" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
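
To get a feel for what whitespace-only splitting does before indexing anything, the following Python sketch mimics the behaviour outside Solr. It is an illustration only, not Solr's implementation: the function names whitespace_tokenize and string_field are hypothetical, and Python's str.split() is assumed to be a close enough stand-in for the whitespace tokenizer (the two may disagree on a few unusual Unicode space characters).

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Split text on runs of whitespace only (hypothetical helper).

    Punctuation and letter case are left untouched, which roughly
    mirrors a type that has a whitespace tokenizer and no filters.
    """
    # str.split() with no arguments splits on any run of whitespace
    # and drops empty strings at the edges.
    return text.split()


def string_field(text: str) -> List[str]:
    """Keep the whole value as a single term, like a string-type field
    (hypothetical helper, for comparison only)."""
    return [text]


if __name__ == "__main__":
    print(whitespace_tokenize("test text"))
    print(whitespace_tokenize("Hello,  world!\tSolr-3.1"))
    print(string_field("test text"))
```

Note that punctuation stays attached to the words and case is preserved, because the type definition above contains a tokenizer and no filters.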

To test our type, I've indexed the following XML file:

<add>
<doc>
<field name="description_string">test text</field>
<field name="description_text...