Scaling Big Data with Hadoop and Solr, Second Edition

By: Hrishikesh Vijay Karambelkar


Loading data in Apache Solr


Once Apache Solr is configured, the next step is to load data into it and run queries. There are several ways to load data into Apache Solr. The following diagram depicts the most commonly used ones:

We have already seen the simple post tool while setting up Apache Solr. We will now look at the Extracting Request Handler.

Extracting request handler – Solr Cell

Solr Cell, the Extracting Request Handler, is one of the most powerful handlers for uploading data of almost any type. It is particularly useful if you wish to run Solr on a set of files or unstructured data in different formats, such as Office documents, PDFs, eBooks, emails, and plain text. Solr Cell relies on Apache Tika for extraction, and in Apache Tika, text extraction is based purely on file type and content. So, if you have a PDF of scanned images containing text, Apache Tika won't be able to extract any of that text. In such cases, you need OCR-based software to bring this functionality to Solr. You can simply try this by downloading the curl utility and then by running...
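
As a rough illustration of what a Solr Cell upload looks like on the wire, the following Python sketch builds (but does not send) a POST request to the extracting handler's `/update/extract` endpoint. The host, port, core name `mycore`, document ID `doc1`, and the helper name `build_extract_request` are all assumptions made for this example; your URL layout may differ depending on the Solr version and how cores are configured.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_base, core, doc_id, payload,
                          content_type="application/pdf"):
    """Build (but do not send) a POST to Solr's /update/extract handler.

    solr_base and core are hypothetical values for this sketch.
    literal.id sets the unique ID of the indexed document, and
    commit=true asks Solr to commit right after indexing.
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{solr_base.rstrip('/')}/{core}/update/extract?{params}"
    # The raw file bytes go in the request body; the Content-Type header
    # describes the format of the uploaded file.
    return Request(url, data=payload,
                   headers={"Content-Type": content_type}, method="POST")


# Placeholder bytes stand in for the contents of a real PDF file.
req = build_extract_request("http://localhost:8983/solr", "mycore",
                            "doc1", b"%PDF-1.4 placeholder")
print(req.get_method(), req.full_url)
```

To actually index a file, you would read its bytes (for example, with `open(path, "rb").read()`) and pass the request to `urllib.request.urlopen` against a running Solr instance.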