Apache Solr Beginner's Guide

By: Alfredo Serafini

Overview of this book

With over 40 billion web pages, optimizing a search engine's performance is essential.

Solr is an open source enterprise search platform from the Apache Lucene project. Full-text search, faceted search, hit highlighting, dynamic clustering, database integration, and rich document handling are just some of its many features. Solr is highly scalable thanks to its distributed search and index replication.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat or Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable with most popular programming languages. Solr's powerful external configuration allows it to be tailored to many types of application without Java coding, and it has a plugin architecture to support more advanced customization.

With Apache Solr Beginner's Guide you will learn how to configure your own search engine experience. Using real data as an example, you will write simple, real-world configurations step by step and understand when and where to adopt this technology.

Apache Solr Beginner's Guide starts by letting you explore a simple search over real data. You will then go through a step-by-step description that gives you the chance to explore several practical features. At the end of the book you will see how Solr is used in different real-world contexts.

Using publicly available data from sources such as DBpedia, you will define several different configurations, exploring some of the most interesting Solr features, such as faceted search and navigation, auto-suggestion, and rich document indexing. You will see how to configure different analyzers for handling different data types, without programming.

You will learn the basics of Solr, focusing on real-world examples and practical configurations.
Table of Contents (19 chapters)
Apache Solr Beginner's Guide
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Index

Chapter 2, Indexing with Local PDF Files


Pop quiz

Q1

1

False

2

False

3

True

Q2

1

False: You can, for example, put administrative metadata values in a stored field so that they are saved and returned in the results, without ever needing to search on them.

2

True: This is the reason we choose to use an indexed field.

3

False: A field must be stored to be returned in the results.
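
The distinction between stored and indexed fields in the answers above can be sketched as schema field definitions. The following Python snippet is only an illustration: it generates `<field/>` elements carrying the `indexed` and `stored` attributes, and the field names and types (`admin_note`, `fulltext`, `string`, `text_general`) are hypothetical choices, not taken from the book's schema.

```python
import xml.etree.ElementTree as ET


def field_xml(name, field_type, indexed, stored):
    """Build one schema <field/> element and return it as a string."""
    elem = ET.Element("field", {
        "name": name,
        "type": field_type,
        # Schema attributes are written as the strings "true" / "false".
        "indexed": "true" if indexed else "false",
        "stored": "true" if stored else "false",
    })
    return ET.tostring(elem, encoding="unicode")


# Hypothetical field: returned in results, but never searched on.
print(field_xml("admin_note", "string", indexed=False, stored=True))

# Hypothetical field: searchable, but not returned in results.
print(field_xml("fulltext", "text_general", indexed=True, stored=False))
```

A field that should be both searched and returned would simply set both attributes to true.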

Q3

1

False: This query will simply delete every document.

2

False: The syntax is not correct.

3

True: This is the correct syntax.
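
As an illustration of the syntax involved, Solr's XML update format expresses deletions as a `<delete>` element wrapping either a `<query>` or an `<id>`. The sketch below only builds such messages in Python and does not send them to a server; the field name `author` and the ID value `doc1` are hypothetical.

```python
import xml.etree.ElementTree as ET


def delete_by_query(query):
    """Return an XML delete message that removes every document matching the query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


def delete_by_id(doc_id):
    """Return an XML delete message that removes a single document by its unique key."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


# The match-all query *:* selects every document, so this message empties the index.
print(delete_by_query("*:*"))

# A narrower query only removes the matching documents (hypothetical field name).
print(delete_by_query("author:unknown"))

# Deleting one document by a hypothetical ID.
print(delete_by_id("doc1"))
```

A delete message only becomes visible to searches after a commit, so in practice it is usually followed by one.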

Q4

1

False: This particular codec uses a binary format only partially, and it exposes most of the data in plain text.

2

True: Looking at the saved plain text structure, we can recognize the internal structure of an inverted index and get an idea of how it is built.

3

True: The values are saved as plain text, so they are easy to read.

Q5

1

False: The saved files reflect the changes in the index.

2

False: What we consider "a word" can be composed of one or more tokens, depending on the chosen text analysis chain. Every token will be saved as a single term.

3

True: Every term will be saved and updated with its reference.
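
The relationship between tokens, terms, and document references described in these answers can be sketched with a toy inverted index. This is a deliberately simplified model, not Lucene's actual data structure: the `analyze` function stands in for a text analysis chain (here it only lowercases and splits on whitespace), and the sample documents are made up.

```python
from collections import defaultdict


def analyze(text):
    """Toy analysis chain: lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map every term to the set of document IDs in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            # Each token becomes a term; the term keeps a reference to the document.
            index[token].add(doc_id)
    return index


# Made-up sample documents.
docs = {
    1: "Solr indexes PDF files",
    2: "PDF files contain text",
}
index = build_inverted_index(docs)

print(sorted(index["pdf"]))   # [1, 2]
print(sorted(index["solr"]))  # [1]
```

Changing `analyze` changes which terms end up in the index, which is why the same "word" may produce one or several terms depending on the analysis chain.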

Q6

1

True: The number of segments should vary depending on the actions you perform on the index. Note that in some circumstances (for example, when you clean an already empty index) the number of segments will not vary at all, but if you look at the last modification time, you will easily see that the files have been updated anyway.

2

False: Even when cleaning an index, not all segment files are deleted: there will always be at least one file that represents a created index.

3

False: See the previous answers. Furthermore, the core/data folder can contain other files needed by specific components, such as a compiled dictionary for spell checking.

Q7

1

True

2

True

3

False: This is partially true, as we can use DataImportHandler and connect it to a specific handler, but we will change the configuration of the DataImportHandler itself, not that of an update handler.

Q8

1

True

2

False: We can use the Tika configurations.

3

False: We can change the configurations, but we can also send the added metadata by appending a parameter in the URL.
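
As a sketch of what appending metadata in the URL can look like: Solr's extracting request handler accepts `literal.<fieldname>=<value>` parameters, which set a field value directly on the indexed document. The snippet below only builds such a URL and sends nothing; the host, the core name `pdfs`, the `/update/extract` path, and the field names are assumptions for illustration and depend on your own configuration.

```python
from urllib.parse import urlencode


def extract_url(base_url, core, literals, commit=True):
    """Build an extraction URL that passes extra metadata as literal.* parameters."""
    # Each metadata entry becomes a literal.<fieldname> request parameter.
    params = {f"literal.{name}": value for name, value in literals.items()}
    if commit:
        params["commit"] = "true"
    return f"{base_url}/{core}/update/extract?{urlencode(params)}"


# Hypothetical host, core, and field values.
url = extract_url(
    "http://localhost:8983/solr",
    "pdfs",
    {"id": "doc1", "source": "local folder"},
)
print(url)
```

The file itself would then be sent in the body of a POST request to this URL, for example with curl or any HTTP client.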