Apache Solr Enterprise Search Server - Third Edition

By: David Smiley, Eric Pugh, Kranti Parisa, Matt Mitchell

Overview of this book

Apache Solr is a widely used open source enterprise search server that delivers powerful search and faceted navigation, features that are hard to achieve with databases. Solr supports complex search criteria, faceting, result highlighting, query completion, query spell-checking, relevancy tuning, geospatial search, and much more.

This book is a comprehensive resource for just about everything Solr has to offer, and it takes you from first exposure through development and deployment. Even if you wish to use Solr 5, you should find the information just as applicable because of Solr's high regard for backward compatibility. The book also includes some useful information specific to Solr 5.

Chapter 4. Indexing Data

In this chapter, we're going to explore ways to get data into Solr. The process of doing this is referred to as indexing, although the term importing is also used.

This chapter is structured as follows:

  • Communicating with Solr

  • Sending data using Solr's Update-XML, JSON, and CSV formats

  • Commit, optimize, and rollback the transaction log

  • Atomic updates and optimistic concurrency

  • Importing content from a database or XML using Solr's DataImportHandler (DIH)

  • Extracting text from rich documents through Solr's ExtractingRequestHandler (also known as Solr Cell)

  • Post-processing documents with UpdateRequestProcessors

You will also find some related options in Chapter 9, Integrating Solr, that have to do with language bindings and framework integration, including a web crawler. Most of them use Solr's Update-XML format.
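Since Update-XML comes up so often, it may help to see what such a message looks like before the details arrive. The following is a minimal sketch, not code from the book: it builds an `<add>` message with Python's standard library. The helper name `build_update_xml` and the field names `id`, `title`, and `tag` are hypothetical and would need to match your own schema.

```python
import xml.etree.ElementTree as ET


def build_update_xml(docs):
    """Build a Solr Update-XML <add> message from a list of dicts.

    Hypothetical helper for illustration; field names must exist in
    the target schema.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A multi-valued field is expressed by repeating <field>.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(add, encoding="unicode")


# Example document with assumed field names.
payload = build_update_xml(
    [{"id": "doc1", "title": "Hello Solr", "tag": ["a", "b"]}]
)
print(payload)
```

A payload like this would typically be sent in an HTTP POST to a Solr update handler with an XML content type. The exact URL depends on your installation and is deliberately left out of the sketch.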

Tip

In a hurry?

There are many approaches to indexing, and you don't need to be well versed in all of them. The section on commit and optimize is important for everyone because...