Apache Solr 4 Cookbook

By: Rafał Kuć
Overview of this book

Apache Solr is a blazing fast, scalable, open source Enterprise search server built upon Apache Lucene. Solr is wildly popular because it supports complex search criteria, faceting, result highlighting, query-completion, query spell-checking, and relevancy tuning, amongst other numerous features.

"Apache Solr 4 Cookbook" will show you how to get the most out of your search engine. Full of practical recipes and examples, this book will show you how to set up Apache Solr, tune and benchmark performance as well as index and analyze your data to provide better, more precise, and useful search data.

"Apache Solr 4 Cookbook" will make your search better, more accurate and faster with practical recipes on essential topics such as SolrCloud, querying data, search faceting, text and data analysis, and cache configuration.

With numerous practical chapters centered on important Solr techniques and methods, Apache Solr 4 Cookbook is an essential resource for developers who wish to take their knowledge and skills further. Thoroughly updated and improved, this Cookbook also covers the changes in Apache Solr 4 including the awesome capabilities of SolrCloud.

How to implement a category's autocomplete functionality


Sometimes we are not just interested in our product's name for autocomplete. Imagine that we want to show the category of our products in the autocomplete box along with the number of products in each category. Let's see how we can use faceting to do that.

How to do it...

This recipe will show how we can implement a category's autocomplete functionality.

  1. Let's start with the example data, which is going to be indexed and which looks similar to the following code snippet:

    <add>
      <doc>
        <field name="id">1</field>
        <field name="name">First Solr 4.0 CookBook</field>
        <field name="category">Books</field>
      </doc>
      <doc>
        <field name="id">2</field>
        <field name="name">Second Solr 4.0 CookBook</field>
        <field name="category">Books And Tutorials</field>
      </doc>
    </add>
  2. The fields section of the schema.xml configuration file that can handle the preceding data should look similar to the following code snippet:

    <field name="id" type="string" indexed="true" 
      stored="true" required="true" />
    <field name="name" type="text" indexed="true" 
      stored="true" />
    <field name="category" type="text_lowercase" 
      indexed="true" stored="true" />
  3. One final thing is the text_lowercase type definition, which should be placed in the types section of the schema.xml file. It should look similar to the following code snippet:

    <fieldType name="text_lowercase" class="solr.TextField" 
      positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  4. So now, if we would like to get all the categories that start with boo, along with the number of products in those categories, we would send the following query:

    curl 'http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.mincount=1&facet.limit=5&facet.prefix=boo&indent=true'
    

    The following response will be returned by Solr:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
          <str name="facet">true</str>
          <str name="facet.mincount">1</str>
          <str name="indent">true</str>
          <str name="q">*:*</str>
          <str name="facet.limit">5</str>
          <str name="facet.prefix">boo</str>
          <str name="facet.field">category</str>
          <str name="rows">0</str>
        </lst>
      </lst>
    
      <result name="response" numFound="2" start="0">
      </result>
      <lst name="facet_counts">
        <lst name="facet_queries"/>
        <lst name="facet_fields">
          <lst name="category">
            <int name="books">1</int>
            <int name="books and tutorials">1</int>
          </lst>
        </lst>
        <lst name="facet_dates"/>
        <lst name="facet_ranges"/>
      </lst>
    </response>

    As you can see, we have two categories, each containing a single product, which matches our example data. Note that the category names come back lowercased, because faceting returns the indexed terms. Let's now see how it works.
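
    If the autocomplete box is fed from application code, the response shown in this step has to be turned into a list of suggestions. The following is a minimal sketch in Python and is not part of the original recipe; the extract_suggestions helper and the trimmed SAMPLE_RESPONSE string are names assumed here for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumption: a trimmed copy of the facet part of the response from step 4.
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="2" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="category">
        <int name="books">1</int>
        <int name="books and tutorials">1</int>
      </lst>
    </lst>
  </lst>
</response>"""


def extract_suggestions(xml_text, field):
    """Return (value, count) pairs for one facet field, in response order.

    Hypothetical helper: it only walks the facet_counts/facet_fields
    structure visible in the response above.
    """
    root = ET.fromstring(xml_text)
    path = ("./lst[@name='facet_counts']/lst[@name='facet_fields']"
            "/lst[@name='%s']/int" % field)
    return [(node.get("name"), int(node.text)) for node in root.findall(path)]


suggestions = extract_suggestions(SAMPLE_RESPONSE, "category")
for value, count in suggestions:
    # One line per autocomplete entry: the category and its product count.
    print("%s (%d)" % (value, count))
```

    Each pair holds the category as it is stored in the index and the number of products in it, which is what the autocomplete box needs to display.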

How it works...

Our data is very simple. We have three fields for each of our documents: one for the identifier, one for holding the name of the document, and one for its category. We will use the category field for the autocomplete functionality, and we will use faceting to implement it.

If you look at the index structure, the category field uses a special type, text_lowercase. Because of solr.KeywordTokenizerFactory, the whole category is stored as a single token in the index, and solr.LowerCaseFilterFactory lowercases that token. This matters because the facet.prefix value is compared with the indexed terms as is, without being analyzed, so we have to send lowercased prefixes when using faceting.
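
To see why the single lowercased token matters, the analysis can be imitated outside of Solr. The following Python sketch is only a rough model under stated assumptions, not Solr code: text_lowercase_analyze stands in for the keyword tokenizer plus lowercase filter, whitespace_lowercase_analyze is a contrasting analyzer added for comparison, and prefix_facet is a hypothetical in-memory stand-in for prefix faceting.

```python
def text_lowercase_analyze(value):
    """Rough model of KeywordTokenizerFactory + LowerCaseFilterFactory:
    the whole field value becomes one lowercased token."""
    return [value.lower()]


def whitespace_lowercase_analyze(value):
    """For contrast only (assumption, not used in the recipe):
    split on whitespace, then lowercase each token."""
    return [token.lower() for token in value.split()]


def prefix_facet(values, analyze, prefix):
    """Count documents per indexed term that starts with the prefix.

    The prefix is compared as is, without any analysis.
    """
    counts = {}
    for value in values:
        for term in set(analyze(value)):
            if term.startswith(prefix):
                counts[term] = counts.get(term, 0) + 1
    return sorted(counts.items())


# The category values from the example data.
categories = ["Books", "Books And Tutorials"]

whole_values = prefix_facet(categories, text_lowercase_analyze, "boo")
split_values = prefix_facet(categories, whitespace_lowercase_analyze, "boo")
uppercase_prefix = prefix_facet(categories, text_lowercase_analyze, "Boo")

print(whole_values)      # full category names, one document each
print(split_values)      # single words only, the two categories merge
print(uppercase_prefix)  # nothing matches an uppercase prefix
```

With the single-token analysis, both full category names survive as separate suggestions. With word-level tokens they would collapse into the single term books, and an uppercase prefix would match nothing at all.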

The query is quite simple: we query for all the documents (the q=*:* parameter), and we don't want any documents returned (the rows=0 parameter). We will use faceting for autocomplete, so we turn it on (facet=true) and we specify the field to calculate the faceting on (facet.field=category). We are only interested in values that have at least one document in them (facet.mincount=1), and we want the top five results (facet.limit=5). One of the most important parameters in the query is facet.prefix: with it, faceting returns only those values that start with the given prefix, which can be seen in the results.
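
The parameters described above can also be assembled in application code instead of being typed by hand. The following Python sketch shows one possible way to do it; the build_autocomplete_url function is a hypothetical helper, and the URL assumes the same local Solr instance as the curl example in this recipe.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the same endpoint as in the curl example of this recipe.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_autocomplete_url(user_input, field="category", limit=5):
    """Build the faceting query for whatever the user has typed so far.

    Hypothetical helper: the prefix is lowercased on the client side,
    because the category values are lowercased in the index.
    """
    params = [
        ("q", "*:*"),              # match all the documents
        ("rows", 0),               # no documents, only the facets
        ("facet", "true"),         # turn faceting on
        ("facet.field", field),    # the field to calculate faceting on
        ("facet.mincount", 1),     # skip values without documents
        ("facet.limit", limit),    # the top N values only
        ("facet.prefix", user_input.strip().lower()),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_autocomplete_url("Boo")
print(url)

# Decode the query string again to check what would be sent to Solr.
sent = parse_qs(urlsplit(url).query)
print(sent["facet.prefix"])
```

Lowercasing the input before it is sent keeps the prefix consistent with the text_lowercase analysis, so a user typing Boo gets the same suggestions as one typing boo.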