Apache Solr Enterprise Search Server - Third Edition

By: David Smiley, Eric Pugh, Kranti Parisa, Matt Mitchell

Overview of this book

Apache Solr is a widely popular open source enterprise search server that delivers powerful search and faceted navigation, features that are elusive with databases. Solr supports complex search criteria, faceting, result highlighting, query completion, query spell-checking, relevancy tuning, geospatial searches, and much more.

This book is a comprehensive resource for just about everything Solr has to offer, and it will take you from first exposure to development and deployment in no time. Even if you wish to use Solr 5, you should find the information just as applicable because of Solr's high regard for backward compatibility. The book also includes some useful information specific to Solr 5.

The DataImportHandler framework


Solr includes a very popular contrib module for importing data known as the DataImportHandler. It's a data processing pipeline built specifically for Solr. Here's a list of the notable capabilities:

  • It imports data from databases through JDBC (Java Database Connectivity). It can import only the records that have changed, provided the source records carry a last-updated date

  • It imports data from a URL (HTTP GET)

  • It imports data from files (that is, it crawls files)

  • It imports e-mail from an IMAP server, including attachments

  • It supports combining data from different sources

  • It extracts text and metadata from rich document formats

  • It applies XSLT transformations and XPath extraction on XML data

  • It includes a diagnostic/development tool
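To make the first capability in the list concrete, here is a minimal sketch of a DIH configuration file that imports from a database over JDBC and supports importing only changed records. The table name (item), its columns (id, name, last_modified), the database URL, and the credentials are all hypothetical and chosen only for illustration; the sketch also assumes the Solr schema defines matching id and name fields.

```xml
<dataConfig>
  <!-- Hypothetical connection details; substitute your own driver, URL, and credentials. -->
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/exampledb"
              user="solr"
              password="secret"/>
  <document>
    <!-- query:            used by a full import.
         deltaQuery:       finds primary keys of rows changed since the last import
                           (assumes a last_modified column on the hypothetical table).
         deltaImportQuery: fetches each changed row by its primary key. -->
    <entity name="item" pk="id"
            query="SELECT id, name FROM item"
            deltaQuery="SELECT id FROM item
                        WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, name FROM item
                              WHERE id = '${dataimporter.delta.id}'">
      <!-- Map database columns to Solr field names. -->
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```

With a configuration along these lines, a full import would typically be started by sending command=full-import to the handler, and an import of only the changed records by sending command=delta-import.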

Furthermore, you can write your own data source or transformation step; the code of the existing ones shows how it is done.

Tip

Consider DIH alternatives

The DIH's capabilities really have little to do with Solr itself, yet the DIH is tied to Solr (to a...