Mastering Apache Solr 7.x

By : Sandeep Nair, Chintan Mehta, Dharmesh Vasoya

Overview of this book

Apache Solr is a standalone enterprise search server with a REST-like application interface, providing highly scalable, distributed search and index replication for many of the world's largest internet sites. To begin with, you will be introduced to performing full-text search, multiple-filter search, dynamic clustering, and so on, helping you brush up on the basics of Apache Solr. You will also explore the new features and advanced options released in Apache Solr 7.x, which bring numerous performance improvements and make data investigation simpler, easier, and more powerful. You will learn to build complex queries and extensive filters, and see how they are compiled in your system to bring relevance to your search tools. You will learn how Solr scoring works, which elements affect a document's score, and how you can optimize or tune the score for the application at hand. You will learn to extract features of documents and write complex queries to re-rank documents. You will also learn advanced options that help you understand what content is indexed and how the extracted content is indexed. Throughout the book, you will work through complex problems and their solutions, along with varied approaches to tackle your business needs. By the end of this book, you will have gained advanced proficiency in building out-of-the-box smart search solutions for your enterprise demands.
Table of Contents (14 chapters)
  • Title Page
  • Packt Upsell
  • Contributors
  • Preface
  • Index

Basics of Solr indexing


In order to make content available for searching, we need to index it first. As simple as that! The process of indexing essentially involves one of the three activities shown in this diagram.

Let's drill down and look at the indexing process, which has the following main actions:

  • Adding content to the Solr index
  • Updating the index
  • Deleting from the index
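Each of these three activities corresponds to a command sent to Solr's /update handler. The following is a minimal Python sketch, assuming Solr's JSON update syntax, a local server at http://localhost:8983, and a hypothetical collection named mycollection. The helper functions are illustrative names, not part of any Solr client library. The sketch only builds the requests; nothing is sent until you pass one to urllib.request.urlopen.

```python
import json
import urllib.request

# Hypothetical server URL and collection name; adjust to your own setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"


def add_docs(docs):
    """Adding: a JSON array of documents is indexed as new documents.
    Re-adding a document whose uniqueKey (id) already exists replaces it."""
    return json.dumps(docs)


def set_field(doc_id, field, value):
    """Updating: an atomic update that sets a single field of an existing
    document. Atomic updates depend on schema settings (e.g. stored fields)."""
    return json.dumps([{"id": doc_id, field: {"set": value}}])


def delete_by_id(doc_id):
    """Deleting: remove one document by its uniqueKey."""
    return json.dumps({"delete": {"id": doc_id}})


def delete_by_query(query):
    """Deleting: remove every document that matches a Solr query."""
    return json.dumps({"delete": {"query": query}})


def update_request(body, commit=True):
    """Wrap a JSON body in a POST to the update handler (built, not sent)."""
    url = SOLR_UPDATE_URL + ("?commit=true" if commit else "")
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, urllib.request.urlopen(update_request(add_docs([{"id": "1", "title": "Solr basics"}]))) would add one document against a running server. Committing on every request keeps the sketch simple; with larger volumes you would normally batch documents and commit less often.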

Now, there are two basic questions that might arise in your mind:

  • Where does Solr accept data to be indexed from? In other words, what are the different sources from which data can be indexed?
  • How do we index data from the sources that we have identified?

Common sources that the Solr index can get data from are:

  • Database tables
  • CSV files
  • XML files
  • Microsoft Word or PDF documents
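Whatever the source, each record ends up as a set of named fields, that is, a Solr document. The following is a minimal sketch of that mapping for the two structured sources, assuming you push data from your own code rather than using Solr's own import tools. Here sqlite3 stands in for a real database, and the table, column, and helper names are hypothetical.

```python
import csv
import io
import sqlite3


def docs_from_table(conn, table):
    """Turn each row of a database table into a document (a dict of fields).
    The table name is interpolated into the SQL, so it must be trusted."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


def docs_from_csv(text):
    """Turn each CSV record into a document, using the header row as field names."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    # Hypothetical in-memory table standing in for a real database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id TEXT, title TEXT)")
    conn.execute("INSERT INTO books VALUES ('1', 'Solr basics')")
    print(docs_from_table(conn, "books"))
    print(docs_from_csv("id,title\n2,Indexing\n"))
```

Solr's update handler can also ingest a CSV file directly when it is posted with a CSV content type, so client-side conversion is optional for that source. Word and PDF files first need their text extracted, which is the job of Solr Cell.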

The answers to "How does the Solr index get data from the aforementioned sources?" are as follows:

  • Using client APIs
  • Uploading XML files using HTTP requests to the Solr server
  • Using the Apache Tika-based Solr Cell framework to ingest proprietary data formats, such as Word or...
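Client APIs such as SolrJ (Solr's Java client) send update requests to Solr's HTTP endpoints on your behalf. The following sketch shows the raw HTTP form of the other two routes in Python. It assumes a local server, a hypothetical collection named mycollection, and that the Solr Cell extraction handler is enabled at /update/extract (as it is in Solr's techproducts example configuration). Helper names are illustrative, and the requests are built but not sent.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical collection URL; adjust to your installation.
SOLR_URL = "http://localhost:8983/solr/mycollection"


def xml_add_body(docs):
    """Build Solr's XML update message: <add><doc><field name="...">...</field></doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def xml_update_request(docs):
    """POST the XML message to the /update handler and commit (built, not sent)."""
    return urllib.request.Request(
        SOLR_URL + "/update?commit=true",
        data=xml_add_body(docs).encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def extract_request(file_bytes, content_type, doc_id):
    """Send a binary file (e.g. a PDF) to Solr Cell's extraction handler.
    literal.id supplies the uniqueKey of the resulting document."""
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    return urllib.request.Request(
        SOLR_URL + "/update/extract?" + params,
        data=file_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

For a real file you would read its bytes, for example open("report.pdf", "rb").read() with a hypothetical file name, and pass them to extract_request together with "application/pdf". Tika then extracts the text and metadata on the server side.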