Mastering Apache Solr 7.x


By: Sandeep Nair, Chintan Mehta, Dharmesh Vasoya


Overview of this book

Apache Solr is a standalone enterprise search server with a REST-like application interface, providing highly scalable, distributed search and index replication for many of the world's largest internet sites. The book begins by introducing full-text search, multiple-filter search, dynamic clustering, and related features, helping you brush up on the basics of Apache Solr. You will also explore the new features and advanced options released in Apache Solr 7.x, which improve performance in numerous respects and make data investigation simpler, easier, and more powerful. You will learn to build complex queries and extensive filters, and how they are compiled in your system to bring relevance to your search tools. You will learn how Solr scoring works, which elements affect the document score, and how you can optimize or tune the score for the application at hand. You will learn to extract features of documents and write complex queries for re-ranking documents. You will also learn advanced options that help you understand what content is indexed and how the extracted content is indexed. Throughout the book, you will work through complex problems with solutions, along with varied approaches to tackle your business needs. By the end of this book, you will have gained advanced proficiency to build out-of-the-box smart search solutions for your enterprise demands.

Title Page

Packt Upsell

Contributors

Preface


Introduction to Solr 7

Introduction to Solr

Why choose Solr?

Solr use cases

What's new in Solr 7?

Summary

Getting Started

Solr installation

Understanding various files and the folder structure

Running Solr

Loading sample data

Understanding the browse interface

Using the Solr admin interface

Summary

Designing Schemas

How Solr works

Understanding field types

Field management

Mastering Schema API

Deciphering schemaless mode

Summary

Mastering Text Analysis Methodologies

Understanding text analysis

Understanding analyzer

Understanding tokenizers

Understanding filters

Understanding multilingual analysis

Understanding phonetic matching

Summary

Data Indexing and Operations

Basics of Solr indexing

Understanding index handlers

Apache Tika and indexing

Language detection

Client APIs

Summary

Advanced Queries – Part I

Search relevance

Velocity search UI

Query parsing and syntax

Response writer

Faceting

Highlighting

Summary

Advanced Queries – Part II

Summary

Managing and Fine-Tuning Solr

JVM configuration

Managing solrconfig.xml

Managing backups

JMX with Solr

Logging configuration

SolrCloud overview

Enabling SSL – Solr security

Performance statistics

Summary

Client APIs – An Overview

Client API overview

JavaScript Client API

SolrJ Client API

Ruby Client API

Python Client API

Summary

Index


Apache Tika and indexing

We have seen how to index data from standard file formats such as JSON and XML. But what about binary and proprietary file formats such as Word and PDF? Solr handles these through the Apache Tika project. The Tika framework detects and extracts text and metadata from a wide range of file formats, including Word and PDF.

Internally, Tika uses the Apache PDFBox parser for PDF files and Apache POI for Word documents. Solr provides ExtractingRequestHandler, which uses Tika to accept uploaded binary files, extract their content, and index it.
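As a rough illustration of how a client might hand a binary file to ExtractingRequestHandler, the sketch below builds (but does not send) an HTTP POST using only the Python standard library. It assumes the handler is registered at the conventional `/update/extract` path; the host, the `techproducts` core name, the document ID, and the helper name `build_extract_request` are placeholders chosen for this example, not values taken from the book.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_base, core, doc_id, payload,
                          content_type="application/pdf", commit=True):
    """Build (but do not send) a POST for Solr's extracting handler.

    Assumption: the handler is mapped to /update/extract on the given core.
    """
    params = {
        "literal.id": doc_id,  # supplies the unique key for the new document
        "commit": "true" if commit else "false",
    }
    url = f"{solr_base}/{core}/update/extract?{urlencode(params)}"
    return Request(
        url,
        data=payload,  # raw bytes of the binary file
        headers={"Content-Type": content_type},
        method="POST",
    )


# Hypothetical values for illustration only.
req = build_extract_request(
    "http://localhost:8983/solr", "techproducts", "doc1", b"%PDF-1.4 sample"
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually submit the file, which requires a running Solr instance with the extraction handler enabled.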

This framework in Solr is known as Solr Cell, an abbreviation of Solr Content Extraction Library, the name it carried while the framework was under development.

Solr Cell basics

As we saw earlier, the Solr Cell framework builds on the Tika framework. Let's look at some of its basic concepts.

To tell Tika the document type, specify the MIME type explicitly. This has to be done with the stream.type parameter or else Tika will...
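To make the stream.type parameter concrete, here is a minimal sketch that guesses a MIME type from a file name and adds it to the request URL. The `/update/extract` path, the host, the core name, and the helper name `extract_url_with_type` are assumptions made for this example; only the `stream.type` parameter itself comes from the text above.

```python
import mimetypes
from urllib.parse import urlencode


def extract_url_with_type(solr_base, core, doc_id, filename):
    """Return an extract URL carrying an explicit MIME type when one is known."""
    params = {"literal.id": doc_id}
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed is not None:
        # Explicit MIME type hint for Tika, e.g. application/pdf
        params["stream.type"] = guessed
    # If the type cannot be guessed, no explicit hint is sent.
    return f"{solr_base}/{core}/update/extract?{urlencode(params)}"


# Hypothetical values for illustration only.
url = extract_url_with_type(
    "http://localhost:8983/solr", "techproducts", "doc1", "report.pdf"
)
print(url)
```

Note that `urlencode` percent-encodes the slash in the MIME type, so `application/pdf` appears in the query string as `application%2Fpdf`.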
