Apache Solr PHP Integration

By: Jayant Kumar

Overview of this book

A search tool is a very powerful feature for any website. No matter what type of website, the search tool helps visitors find what they are looking for using keywords and narrow down the results using facets. Solr is a popular, blazing-fast, open source enterprise search platform from the Apache Lucene project. It is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest websites.

This book is a practical, hands-on, end-to-end guide that provides you with all the tools required to build a fully featured search application using Apache Solr and PHP. The book contains practical examples and step-by-step instructions.

Starting off with the basics of installing Apache Solr and integrating it with PHP, the book then proceeds to explore the features provided by Solr to improve searches using PHP. You will learn how to build and maintain a Solr index using PHP, discover the query modes available with Solr, and learn how to use them to tune Solr queries to retrieve relevant results. You will look at how to build and use facets in your search, how to tune and use fast result highlighting, and how to build spell check and autocomplete features using Solr. You will finish by learning some of the advanced concepts required to run a large-scale, enterprise-level search infrastructure.
Table of Contents (15 chapters)
Apache Solr PHP Integration
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Solr relevance ranking


When a query is passed to Solr, it is converted to an appropriate query string that is then executed by Solr. For each document in the result, Solr calculates a relevance score, and the documents are sorted according to this score. By default, higher-scoring documents appear first in the result.
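To make the score visible, a request can ask Solr to return the `score` pseudo-field and the scoring explanation alongside the results. The following is a minimal sketch that only builds the request URL. The host, port, and core name (`collection1`) and the helper `build_scored_query_url` are assumptions made for illustration, while `fl`, `sort`, `debugQuery`, `rows`, and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical local Solr core; adjust host, port, and core name as needed.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_scored_query_url(query_text, rows=5):
    """Build a select URL that returns each document's score plus debug output."""
    params = {
        "q": query_text,
        "fl": "*,score",        # all stored fields plus the score pseudo-field
        "sort": "score desc",   # Solr's default ordering, spelled out explicitly
        "debugQuery": "true",   # include the scoring explanation in the response
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_scored_query_url("solr php")
print(url)
```

Sending this URL to a running Solr instance should return the documents ordered by descending score, with a debug section that explains how each score was computed.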

The Solr relevancy algorithm is known as the tf-idf model, where tf stands for term frequency and idf stands for inverse document frequency. So that we can interpret the output of a debug query, the parameters used in the relevance calculation are explained as follows:

  • tf: The term frequency is the frequency with which a term appears in a document. A higher term frequency results in a higher document score.

  • idf: The inverse document frequency is the inverse of the number of documents in which the term appears. It indicates the rarity of the term across all documents in the index. Documents containing a rare term are scored higher.

  • coord: It is the coordination factor that says how many...