Index
A
- Apache Ambari
- about / Apache Ambari
- Apache Avro
- about / Apache Avro, The architecture
- Apache Flume
- about / Apache Flume
- Apache Hadoop
- Apache HBase
- about / Apache HBase
- Apache HCatalog
- about / Apache HCatalog
- Apache Hive
- about / Apache Hive
- Apache Lucene
- about / Understanding the limits
- Apache Mahout
- about / Apache Mahout
- Apache Pig
- about / Apache Pig
- Apache Solr
- about / The problem
- benefits / The problem
- issues / The problem
- Apache Solr instance
- setting up / Setting up the Apache Solr instance
- Apache Solr search
- configuring / Configuring Apache Solr search
- schema, defining for instance / Defining a Schema for your instance
- Solr instance, configuring / Configuring a Solr instance
- request handlers / Request handlers and search components
- search components / Request handlers and search components
- facets / Facet
- MoreLikeThis component / MoreLikeThis
- highlight search component / Highlight
- SpellCheck component / SpellCheck
- metadata management / Metadata management
- Apache Sqoop
- about / Apache Sqoop
- Apache Tika / The query parser
- Apache ZooKeeper
- about / Apache ZooKeeper
- AP system
- about / The CAP theorem
- architecture, distributed search / Distributed search architecture
- architecture, HDFS / HDFS architecture
- NameNode / NameNode
- DataNode / DataNode
- Secondary NameNode / Secondary NameNode
- architecture, Katta / Katta architecture
- architecture, Lily / The architecture
- Write-Ahead Log (WAL) / Write-ahead Logging
- message queue / The message queue
- querying / Querying using Lily
- records, updating / Updating records using Lily
- architecture, Map-Reduce
- about / MapReduce architecture
- JobTracker / MapReduce architecture, JobTracker
- TaskTracker / MapReduce architecture, TaskTracker
- architecture, Solr
- about / Apache Solr architecture
- storage / Storage
- architecture, SolrCloud / SolrCloud architecture
- autoCommit directive / Configuration files
B
- Big Data storage
- Solr, using for / How Solr can be used for Big Data storage?
- Brewer's theorem
- about / The CAP theorem
C
- Cache Autowarming
- about / Optimizing the Solr cache
- capacity-scheduler.xml / Hadoop configuration
- CAP theorem
- about / The CAP theorem
- NOSQL database / What is a NOSQL database?
- CA system
- about / The CAP theorem
- CDH
- about / Apache Flume
- checkpoints
- about / Secondary NameNode
- client APIs, Solr engine / Client APIs and SolrJ client
- Cloudera
- about / Apache Flume
- collection
- collections
- creating, in SolrCloud / Creating shards, collections, and replicas in SolrCloud
- column store, NOSQL database / The key-value store or column store
- commit console, SolrMeter / Using SolrMeter
- commit operation
- about / When to commit changes?
- performing / When to commit changes?
- commons-logging.properties / Hadoop configuration
- components, Apache Hadoop
- HDFS / Understanding Apache Hadoop and its ecosystem
- MapReduce framework / Understanding Apache Hadoop and its ecosystem
- Apache HBase / Apache HBase
- Apache Pig / Apache Pig
- Apache Hive / Apache Hive
- Apache ZooKeeper / Apache ZooKeeper
- Apache Mahout / Apache Mahout
- Apache HCatalog / Apache HCatalog
- Apache Ambari / Apache Ambari
- Apache Avro / Apache Avro
- Apache Sqoop / Apache Sqoop
- Apache Flume / Apache Flume
- concurrent clients
- optimizing / Optimizing concurrent clients
- configuration, Apache Solr search / Configuring Apache Solr search
- configuration, Katta cluster / Configuring Katta cluster
- configuration, search schema fields / Configuring search schema fields
- configuration, SolrCloud / Configuring SolrCloud
- configuration, Solr instance / Configuring a Solr instance
- configuration files, Solr
- solrconfig.xml / Storage
- schema.xml / Storage
- solr.xml / Storage
- about / Configuration files
- container
- optimizing / Optimizing the container
- core-site.xml / Hadoop configuration
- CP system
- about / The CAP theorem
- CSVDocumentConverter class
- CSVIndexer class
- CSVMapper class
- CSVReducer class
- curl utility / Installing Solr
- currency.txt / Metadata management
- custom partitioning
D
- data
- organizing / Organizing data
- loading, for search / Loading your data for search
- dataDir directive / Configuration files
- Data Import Handler (DIH) / The query parser, Loading your data for search
- DataNode
- about / DataNode
- data processing workflows
- about / Understanding data-processing workflows
- standalone machine / The standalone machine
- distributed setup / Distributed setup
- replicated mode / The replicated mode
- sharded mode / The sharded mode
- DDL (Data Definition Language)
- about / Apache HCatalog
- default search field
- specifying / Specifying the default search field
- DisMaxQueryParser
- about / SolrJ
- DisMaxRequestHandler / The query parser
- distributed deadlock
- about / Understanding the limits
- distributed search
- SolrCloud, using for / Using SolrCloud for distributed search
- about / Understanding the concepts of distributed search
- architecture / Distributed search architecture
- scenarios / Distributed search scenarios
- distributed search, Apache Solr
- limitations / Understanding the limits
- distributed setup, data processing workflows / Distributed setup
- distributed shard
- document, adding to / Adding a document to the distributed shard
- document
- about / The document-oriented store
- adding, to distributed shard / Adding a document to the distributed shard
- document-oriented store, NOSQL database / The document-oriented store
- document cache, Solr cache optimization / The document cache
E
- e-commerce websites
- about / E-commerce websites
- benefits / E-commerce websites
- elevate.txt / Metadata management
- Ephemeral node
- about / The sharding algorithm
- ETL (Extract-Transform-Load)
- about / Apache Flume
- ExtendedDisMaxQueryParser
- about / SolrJ
F
- faceted browsing / The query parser
- facets, Apache Solr search / Facet
- fair-scheduler.xml / Hadoop configuration
- field value cache, Solr cache optimization / The field value cache
- filter cache, Solr cache optimization / The filter cache
- filter directive / Configuration files
- filter queries
- search runtime, optimizing / Filter queries
G
- graph database, NOSQL database / The graph database
H
- Hadoop
- operations / Accessing HDFS
- installing / Installing and running Hadoop
- running / Installing and running Hadoop
- prerequisites / Prerequisites
- installing, on machines / Installing Hadoop on machines
- URL / Installing Hadoop on machines
- program, running / Running a program on Hadoop
- search, optimizing / Optimizing search on Hadoop
- hadoop-env.sh / Hadoop configuration
- hadoop-policy.xml / Hadoop configuration
- Hadoop cluster
- managing / Managing a Hadoop cluster
- Hadoop configuration
- about / Hadoop configuration
- core-site.xml / Hadoop configuration
- hdfs-site.xml / Hadoop configuration
- mapred-site.xml / Hadoop configuration
- commons-logging.properties / Hadoop configuration
- capacity-scheduler.xml / Hadoop configuration
- fair-scheduler.xml / Hadoop configuration
- hadoop-env.sh / Hadoop configuration
- hadoop-policy.xml / Hadoop configuration
- masters/slaves / Hadoop configuration
- log4j.properties / Hadoop configuration
- Hadoop data analysis
- MapReduce, creating for / Creating MapReduce to analyze Hadoop data
- HBase / The architecture
- HDFS
- large data, storing / Storing large data in HDFS
- architecture / HDFS architecture
- objectives / HDFS architecture
- accessing / Accessing HDFS
- HDFS-APIs
- about / Accessing HDFS
- hdfs-site.xml / Hadoop configuration
- highlight search component, Apache Solr search / Highlight
- Hunspell algorithm
- about / Stemming
I
- indexConfig directive / Configuration files
- indexes
- creating, for Katta / Katta
- index handler / The query parser
- indexing / Storage
- indexing buffer size
- limiting / Limiting the indexing buffer size
- index merge
- optimizing / Optimizing the index merge
- index optimization
- about / Index optimization
- indexing buffer size, limiting / Limiting the indexing buffer size
- commit operation, performing / When to commit changes?
- index merge, optimizing / Optimizing the index merge
- optimize option, for index merging / Optimize an option for index merging
- container, optimizing / Optimizing the container
- concurrent clients, optimizing / Optimizing concurrent clients
- Java Virtual Machine (JVM), optimizing / Optimizing the Java virtual memory
- index partitioning, Apache Solr
- simple partitioning / Deep dive – shards and indexing data of Apache Solr
- prefix-based partitioning / Deep dive – shards and indexing data of Apache Solr
- custom partitioning / Deep dive – shards and indexing data of Apache Solr
- index reader / The query parser
- installation, Hadoop / Installing and running Hadoop
- installation, Lily / Installing and running Lily
- installation, Solr / Installing Solr
- interaction, Solr engine / Interaction
- interfaces, Solr engine / Other interfaces
J
- Java Virtual Machine (JVM)
- optimizing / Optimizing the Java virtual memory
- JConsole / Monitoring the Solr instance
- JCR (Java Content Repository)
- about / The architecture
- jmx directive / Configuration files
- JobTracker
- about / JobTracker
- JVisualVM / Monitoring the Solr instance
K
- Katta
- about / Using Katta for Big Data search (Solr-1395 patch), Katta
- architecture / Katta architecture
- benefits / Benefits
- drawbacks / Drawbacks
- indexes, creating for / Katta
- Katta cluster
- configuring / Configuring Katta cluster
- Katta indexes
- creating / Creating Katta indexes
- key-value store, NOSQL database / The key-value store or column store
- KStem algorithm
- about / Stemming
L
- laggard problem
- about / Understanding the limits
- large data
- storing, in HDFS / Storing large data in HDFS
- lazy field loading, Solr cache optimization / Lazy field loading
- lib directive / Configuration files
- Lily
- about / Lily – running Solr and Hadoop together
- architecture / The architecture
- used, for running user query / Querying using Lily
- used, for updating records / Updating records using Lily
- installing / Installing and running Lily
- running / Installing and running Lily
- Lily Data Repository (Lily DR)
- about / The architecture
- listener directive / Configuration files
- lockType directive / Configuration files
- log4j.properties / Hadoop configuration
- log management, for banking
- about / Log management for banking
- issues / The problem
- issues, tackling / How can it be tackled?
- high-level design / High-level design
- luceneMatchVersion directive / Configuration files
- LucidWorks
- URL / Installing Solr
M
- Map-Reduce
- architecture / MapReduce architecture
- map-side indexing / Using Solr 1045 patch – map-side indexing
- mapred-site.xml / Hadoop configuration
- MapReduce
- about / Understanding Apache Hadoop and its ecosystem
- creating, for Hadoop data analysis / Creating MapReduce to analyze Hadoop data
- MapReduce approach
- MapReduce program
- Solr-1045 patch / The Solr-1045 patch – map program
- Solr-1301 / The Solr-1301 patch – reduce-side indexing
- Map Task
- masters/slaves / Hadoop configuration
- maxBufferedDocs directive / Configuration files
- maxIndexingThreads directive / Configuration files
- message queue / The message queue
- metadata management, Apache Solr search / Metadata management
- MongoDB / How Solr can be used for Big Data storage?
- MoreLikeThis component, Apache Solr search / MoreLikeThis
- multi-core Solr search
- using, on SolrCloud / Using multicore Solr search on SolrCloud
N
- NameNode
- about / NameNode
- NOSQL database
- key-value store / The key-value store or column store
- column store / The key-value store or column store
- document-oriented store / The document-oriented store
- graph database / The graph database
- NOSQL databases
O
- OCR
- optimize console, SolrMeter / Using SolrMeter
- optimize option
- for index merging / Optimize an option for index merging
P
- Pig Latin
- about / Apache Pig
- pipeline-based workflow
- about / Understanding data-processing workflows
- advantages / Understanding data-processing workflows
- Porter algorithm
- about / Stemming
- prefix-based partitioning
- program
- running, on Hadoop / Running a program on Hadoop
- protwords.txt / Metadata management, protwords.txt
Q
- query console, SolrMeter / Using SolrMeter
- query directive / Configuration files
- query parser, Solr engine / The query parser
- queryParser directive / Configuration files
- queryResponseWriter directive / Configuration files
- query result cache, Solr cache optimization / The query result cache
R
- ramBufferSizeMB directive / Configuration files
- records
- updating, Lily used / Updating records using Lily
- RecordWriter
- Reduce Tasks
- replicas
- creating, in SolrCloud / Creating shards, collections, and replicas in SolrCloud
- replicated mode, data processing workflows / The replicated mode
- requestDispatcher directive / Configuration files
- requestHandler directive / Configuration files
- request handlers, Apache Solr search / Request handlers and search components
- Response Writer / The query parser
S
- schema.xml / Storage, schema.xml
- search
- data, loading for / Loading your data for search
- optimizing, on Hadoop / Optimizing search on Hadoop
- searchComponent directive / Configuration files
- search components, Apache Solr search / Request handlers and search components
- search query
- search runtime, optimizing / Optimizing through search queries
- search runtime
- optimizing / Optimizing the search runtime
- optimizing, through search query / Optimizing through search queries
- optimizing, through filter queries / Filter queries
- search schema
- optimizing / Optimizing the search schema
- search schema fields
- configuring / Configuring search schema fields
- search schema optimization
- default search field, specifying / Specifying the default search field
- search schema fields, configuring / Configuring search schema fields
- stop words / Stop words
- stemming / Stemming
- Secondary NameNode
- about / Secondary NameNode
- sharded mode, data processing workflows / The sharded mode
- sharding
- Sharding algorithm
- about / The sharding algorithm
- shards
- about / Understanding data-processing workflows
- creating, in SolrCloud / Creating shards, collections, and replicas in SolrCloud
- simple partitioning
- Snowball algorithm
- about / Stemming
- Solr
- installing / Installing Solr
- architecture / Apache Solr architecture
- using, for Big Data storage / How Solr can be used for Big Data storage?
- Solr-1045 patch
- about / Using Solr 1045 patch – map-side indexing, The Solr-1045 patch – map program
- using / Using Solr 1045 patch – map-side indexing
- URL, for downloading / Using Solr 1045 patch – map-side indexing
- benefits / Benefits
- drawbacks / Drawbacks
- Solr-1301
- about / The Solr-1301 patch – reduce-side indexing
- used, for reduce-side indexing / The Solr-1301 patch – reduce-side indexing
- solr.war / Installing Solr
- solr.xml / Storage
- solr.xml file / Configuration files
- Solr 1301 patch
- using / Using Solr 1301 patch – reduce-side indexing
- running / Using Solr 1301 patch – reduce-side indexing
- benefits / Benefits
- drawbacks / Drawbacks
- Solr cache
- optimizing / Optimizing the Solr cache
- Solr cache optimization
- about / Optimizing the Solr cache
- filter cache / The filter cache
- query result cache / The query result cache
- document cache / The document cache
- field value cache / The field value cache
- lazy field loading / Lazy field loading
- Solr Cell
- SolrCloud
- about / Using SolrCloud for distributed search
- using, for distributed search / Using SolrCloud for distributed search
- architecture / SolrCloud architecture
- configuring / Configuring SolrCloud
- multi-core Solr search, using on / Using multicore Solr search on SolrCloud
- benefits / Benefits
- drawbacks / Drawbacks
- configuring, for large indexes / Configuring SolrCloud to work with large indexes
- shards, creating / Creating shards, collections, and replicas in SolrCloud
- collections, creating / Creating shards, collections, and replicas in SolrCloud
- replicas, creating / Creating shards, collections, and replicas in SolrCloud
- solrconfig.xml / Storage, solrconfig.xml
- solrconfig.xml file / Configuration files
- SolrDocumentConverter class
- Solr engine
- about / Solr engine
- query parser / The query parser
- interaction / Interaction
- client APIs / Client APIs and SolrJ client
- SolrJ client / Client APIs and SolrJ client
- interfaces / Other interfaces
- SolrIndexUpdateMapper class / Using Solr 1045 patch – map-side indexing
- SolrIndexUpdater class / Using Solr 1045 patch – map-side indexing
- Solr instance
- configuring / Configuring a Solr instance
- monitoring / Monitoring the Solr instance
- SolrJ
- about / SolrJ
- SolrJ client, Solr engine / Client APIs and SolrJ client
- SolrMeter
- about / Using SolrMeter
- using / Using SolrMeter
- query console / Using SolrMeter
- update console / Using SolrMeter
- commit console / Using SolrMeter
- optimize console / Using SolrMeter
- SolrOutputFormat class
- SolrRecordWriter class
- SolrXMLDocRecordReader class / Using Solr 1045 patch – map-side indexing
- SpellCheck component, Apache Solr search / SpellCheck
- spellings.txt / Metadata management, spellings.txt
- ssh
- setting up, without passphrase / Setting up SSH without passphrases
- standalone machine, data processing workflows / The standalone machine
- stemming
- about / Stemming
- stemming algorithms
- stop words
- about / Stop words
- stopwords.txt / Metadata management, stopwords.txt
- storage, Apache Solr / Storage
- synonyms.txt / Metadata management, synonyms.txt
T
- TaskTracker
- about / TaskTracker
U
- unlockOnStartup directive / Configuration files
- update console, SolrMeter / Using SolrMeter
- updateHandler directive / Configuration files
- updateLog directive / Configuration files
- updateRequestProcessor chain / Configuration files
- user query
- running, Lily used / Querying using Lily
W
- Write-Ahead Log (WAL)
- about / Write-ahead Logging
- writeLockTimeout directive / Configuration files
Z
- znodes
- about / The sharding algorithm
- ZooKeeper ensemble
- setting up / Setting up the ZooKeeper ensemble