Scaling Apache Solr

By: Hrishikesh Vijay Karambelkar

Working with rich documents

In the last section, we saw that Apache Solr has built-in handlers for the CSV, JSON, and XML formats. In an organization's content management system, however, data often resides in documents of other formats, such as PDF, DOC, PPT, and XLS. The biggest challenge with these formats is that they are all semi-structured. Apache Solr can nevertheless extract information from many of them directly, thanks to Apache Tika. Solr uses code from the Apache Tika project to provide a framework for incorporating many different file-format parsers, such as Apache PDFBox and Apache POI, into Solr itself.

Note

The framework that extracts content from different data sources in Apache Solr is also called Solr CEL (Content Extraction Library), solr-cell, or, more commonly, Solr Cell.
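
To make the extraction framework described above more concrete, here is a minimal Python sketch of sending a rich document to Solr Cell over HTTP. It assumes that the extracting request handler is registered at the /update/extract path (as in Solr's example configuration) and that the schema's unique key field is named id. The base URL, the core name collection1, and the function names build_extract_url and index_rich_document are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(solr_base, doc_id, commit=True):
    """Build the request URL for Solr Cell's extracting handler.

    Assumes the handler is mapped to /update/extract and that the
    unique key field in the schema is called "id".
    """
    params = {
        # literal.<field> sets a field value directly on the new document
        "literal.id": doc_id,
        # ask Solr to commit so the document becomes searchable
        "commit": "true" if commit else "false",
    }
    return f"{solr_base.rstrip('/')}/update/extract?{urlencode(params)}"


def index_rich_document(solr_base, doc_id, path, content_type="application/pdf"):
    """POST the raw bytes of a file (for example, a PDF) to Solr Cell.

    Solr hands the body to Apache Tika, which extracts text and metadata.
    Returns the HTTP status code of the response.
    """
    with open(path, "rb") as fh:
        body = fh.read()
    request = Request(
        build_extract_url(solr_base, doc_id),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urlopen(request) as response:
        return response.status


# Example call (requires a running Solr instance; the URL is an assumption):
# index_rich_document("http://localhost:8983/solr/collection1", "doc1", "report.pdf")
```

The URL is built separately from the upload so that the request parameters can be inspected or logged without contacting the server.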

Understanding Apache Tika

Apache Tika is a SAX-based parser for extracting the metadata from different types of documents. Apache Tika uses the org...