Book Image

Sphinx Search Beginner's Guide

By : Abbas Ali
Book Image

Sphinx Search Beginner's Guide

By: Abbas Ali

Overview of this book

Table of Contents (15 chapters)
Sphinx Search
Credits
About the Author
Acknowledgement
About the Reviewers
www.PacktPub.com
Preface

Different ways of performing a search


Searching can be done in different ways but here we will take a look at the two most commonly used methods.

Searching on a live database

Whenever your application is dealing with some kind of data, a database is generally involved. There are many databases (both free and commercial) available in the market. Here are a few of the free and open source database servers available:

  • MySQL

  • PostgreSQL

  • SQLite

    Note

    We will be using MySQL throughout this book since Sphinx supports MySQL by default, and it's also the most popular database when it comes to web development.

A live database is one that is actively updated with the latest version of data. At times you may use one database for reading and another for writing, and in such cases you will sync both the databases occasionally. We cannot call such a database 'live', because when reading from one database, while data is being written to the other database, you won't be reading the latest data.

On the other hand, whenever reading from and writing to the database takes place in real-time, we call it a live database.

Let's take an example to understand how search works in the case of a live database.

Assume that we have two database tables in our MySQL database:

  • users

  • addresses

The users table holds data such as your name, e-mail, and password. The addresses table holds the addresses belonging to users. Each user can have multiple addresses. So the users and the addresses table are related to each other.

Let's say we want to search for users based on their name and address. The entered search term can be either the name or part of the address. While performing a search directly on the database, our MySQL query would look something like:

SELECT u.id, u.name
FROM users
AS u LEFT JOIN addresses AS a ON u.id = a.user_id
WHERE u.name LIKE '%search_term%'
OR a.address LIKE '%search_term%' GROUP BY u.id;

The given query will directly search the specified database tables and get the results. The main advantage of using this approach is that we are always performing a search on the latest version of the available data. Hence, if a new user's data has been inserted just before you initiated the search, you will see that user's data in your search results if it matches your search query.

However, one major disadvantage of this approach is that an SQL query to perform such a search is fired every time a search request comes in, and this becomes an issue when the number of records in the users table increases. With each search query, two tables are joined. This adds overhead and further hinders the performance of the query.

Searching an index

In this approach, a query is not fired directly on a database table. Rather, an index is created from the data stored in the database. This index contains data from all the related tables. The index can itself be stored in a database or on a file system.

The advantage of using this approach is that we need not join tables in SQL queries each time a search request comes in, and the search request would not scan every row stored in the database. The search request is directed towards the index which is highly optimized for searching.

The disadvantage would be the additional storage required to store the index and the time required to build the index. However, these are traded off for the time saved during an actual search request.