In the last section, we saw that Apache Solr has built-in handlers for the CSV, JSON, and XML formats. In an organization's content management system, however, a data item may reside in documents of other formats, such as PDF, DOC, PPT, and XLS. The biggest challenge with these types is that they are all semi-structured. Apache Solr can handle many of these formats directly and extract information from them, thanks to Apache Tika. Solr uses code from the Apache Tika project to provide a framework for incorporating many different file-format parsers, such as Apache PDFBox and Apache POI, into Solr itself.
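As a rough illustration of how a rich document might be sent to Solr for Tika-based extraction, the following Python sketch posts a file to Solr's extracting request handler. It rests on several assumptions not stated in the text above: that the handler is registered in `solrconfig.xml` at the conventional `/update/extract` path, that Solr is reachable at the given base URL, and that a core named `collection1` exists. The helper names `build_extract_url` and `index_rich_document` are hypothetical and exist only for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(solr_base, core, doc_id, commit=True):
    """Build the request URL for the extracting request handler.

    Assumption: the handler is mapped to /update/extract, which is the
    conventional path but depends on the local solrconfig.xml.
    """
    # literal.id supplies the unique key for the extracted document.
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return f"{solr_base.rstrip('/')}/{core}/update/extract?{urlencode(params)}"


def index_rich_document(solr_base, core, doc_id, path,
                        content_type="application/octet-stream"):
    """POST the raw bytes of a file (PDF, DOC, PPT, XLS, ...) to Solr.

    Solr hands the stream to Tika, which selects a parser for the format.
    Returns the HTTP status code of the response.
    """
    with open(path, "rb") as fh:
        body = fh.read()
    request = Request(
        build_extract_url(solr_base, core, doc_id),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical host, core name, and document id, shown for illustration.
    print(build_extract_url("http://localhost:8983/solr", "collection1", "doc1"))
```

Calling `index_rich_document("http://localhost:8983/solr", "collection1", "doc1", "report.pdf", "application/pdf")` would then submit a local PDF, provided a Solr instance with the handler enabled is running. In setups without named cores, the URL may omit the core segment.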
Scaling Apache Solr