Elasticsearch Indexing

By: Huseyin Akdogan

Overview of this book

Beginning with an overview of the way Elasticsearch stores data, you’ll extend your knowledge to tackle indexing and mapping, and learn how to configure Elasticsearch to meet your users’ needs. You’ll then find out how to use analysis and analyzers to organize and retrieve search results more intelligently, so that search queries return relevant results. You’ll explore the anatomy of an Elasticsearch cluster and learn how to set up configurations that give you optimum availability as well as scalability. With these elements in place, you’ll find real-world solutions to help you improve indexing performance, along with guidance on backing up and restoring your data safely. By the end, you will be confident that you can help to deliver an improved search experience, which is exactly what modern users demand and expect.
Table of Contents (15 chapters)
Elasticsearch Indexing
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface
Index

Index

A

  • analysis
    • about / Analysis, Introducing analysis
    • examining / Analysis
    • process / Process of analysis
    • tokenizing / Process of analysis
    • normalizing / Process of analysis
  • analyzer
    • specifying, for field in mapping / Specifying the analyzer for a field in the mapping
    • custom analyzer, creating / Creating a custom analyzer
  • analyzer pipeline
    • about / An Analyzer Pipeline
  • Apache Lucene
    • about / Understanding the document storage strategy, Indices
    • URL / Understanding the document storage strategy
  • ASCII Folding Token Filter
    • about / Token filters, ASCII Folding Token filter
  • attachment type
    • about / Attachment type
    • reference link / Attachment type
  • AWS Cloud Plugin
    • about / Cloud repository
    • URL / Cloud repository
  • Azure Cloud Plugin
    • URL / Cloud repository
    • about / Cloud repository

B

  • bool query
    • using / Bool query
  • built-in analyzers
    • about / Built-in analyzers
    • Standard Analyzer / Built-in analyzers
    • Simple Analyzer / Built-in analyzers
    • Whitespace Analyzer / Built-in analyzers
    • Stop Analyzer / Built-in analyzers
    • Pattern Analyzer / Built-in analyzers
    • Language Analyzer / Built-in analyzers
    • building blocks / Building blocks of an analyzer
    • character filters / Character filters
    • tokenizer / Tokenizer
    • token filters / Token filters
  • bulk API
    • about / Bulk API
  • bulk sizing
    • about / Bulk sizing

C

  • character filters
    • about / Character filters
    • HTML Strip Char Filter / HTML Strip Char filter
    • Pattern Replace Char Filter / Pattern Replace Char filter
  • client nodes
    • about / Client nodes
  • cloud repository
    • about / Cloud repository
  • completion suggester
    • used, for correcting users' spelling mistakes / The completion suggester
    • configuration, mapping / Mapping the configuration for the completion suggester
    • completion field, indexing / Indexing on completion field
  • Concurrent Mark Sweep garbage collector
    • about / Concurrent Mark Sweep garbage collector
  • configuration, for high performance indexing
    • performing / Configuration
    • memory configuration / Memory configuration
    • swapping, avoiding / Avoiding swapping
    • garbage collector / Garbage collector
    • JVM memory / The structure of JVM memory
    • file descriptors / File descriptors
  • custom analyzer
    • creating / Creating a custom analyzer

D

  • database
    • about / Understanding the document storage strategy
  • dedicated master nodes
    • about / Dedicated master nodes
  • denormalization
    • about / Denormalization
  • document
    • about / Document
    • inverted index / Inverted index
  • document-oriented search engine
    • about / Understanding the document storage strategy
  • document storage
    • about / Understanding the document storage strategy
    • _source field / The _source field
    • storable field, versus searchable field / The difference between the storable and searchable field

E

  • Elasticsearch cluster
    • about / Basic concepts
    • architecture, of distribution / Explaining the architecture of distribution
    • configuring / Correctly configuring the cluster
  • ES_HEAP_SIZE environment variable
    • about / The ES_HEAP_SIZE environment variable

F

  • file descriptors
    • about / File descriptors
    • FD limit, increasing on Unix systems / Increasing FD limit on Unix systems
  • Finite State Transducer (FST) data structure
    • about / The completion suggester
    • URL / The completion suggester

G

  • G1 garbage collector
    • about / G1 garbage collector
  • garbage collection
    • monitoring / Monitoring garbage collection
    • tuning / Tuning the garbage collection
  • garbage collector
    • about / Garbage collector
    • strategies / Different strategies among garbage collectors
    • serial garbage collector / Serial garbage collector
    • parallel garbage collector / Parallel garbage collector
    • Concurrent Mark Sweep garbage collector / Concurrent Mark Sweep garbage collector
    • G1 garbage collector / G1 garbage collector

H

  • HDFS filesystem repository
    • about / HDFS filesystem repository
    • URL / HDFS filesystem repository
  • HTML Strip Char Filter
    • about / HTML Strip Char filter
  • hybrid filesystem store
    • about / Hybrid filesystem store

I

  • I/O operations
    • throttling / Throttling I/O operations
    • throttling type, configuring / Throttling type
  • ICU Analysis
    • about / ICU analysis plugin
    • reference link / ICU analysis plugin
    • ASCII Folding token filter / ASCII Folding Token filter
  • indices
    • about / Indices
    • mapping / Mapping
    • types / Types
  • inverted index
    • about / Understanding the document storage strategy, Indices, Inverted index

J

  • Java FileChannel Class
    • URL / New IO filesystem store
  • Java garbage collection
    • about / Garbage collector
  • Java RandomAccessFile Class
    • URL / Simple filesystem store
  • JavaScript Object Notation (JSON)
    • about / Indices
    • reference link / Indices
  • JConsole
    • URL / Monitoring garbage collection
  • jstat command
    • URL / Monitoring garbage collection
  • JVM memory
    • structure / The structure of JVM memory
    • Eden Space / The structure of JVM memory
    • Survivor Space / The structure of JVM memory
    • Tenured Generation / The structure of JVM memory
    • Permanent Generation / The structure of JVM memory
    • Code Cache / The structure of JVM memory
    • problem / What is the problem?
    • garbage collection, monitoring / Monitoring garbage collection
    • VisualVM / VisualVM
    • garbage collectors, strategies / Different strategies among garbage collectors
    • deallocating / Process of deallocating memory
    • garbage collector / Types of garbage collector

L

  • Language Analyzer
    • about / Built-in analyzers
  • Length Token Filter
    • about / Token filters
  • Letter Tokenizer
    • about / Tokenizer
  • log_byte_size policy
    • about / log_byte_size policy
    • settings / log_byte_size policy
  • log_doc policy
    • about / log_doc policy
    • settings / log_doc policy
  • Lowercase Token Filter
    • about / Token filters
  • Lucene MMapDirectory
    • URL / MMap filesystem store
  • Lucene NIOFSDirectory
    • URL / New IO filesystem store
  • Lucene SimpleFSDirectory
    • URL / Simple filesystem store

M

  • major GC
    • about / The structure of JVM memory
  • mapping
    • about / Mapping, Basic concepts and definitions
    • metadata fields / Metadata fields
    • and search results, relationship between / The relationship between mapping and relevant search results
    • analyzer, specifying for field / Specifying the analyzer for a field in the mapping
  • mapping definition
    • optimization / Optimization of mapping definition
    • norms / Norms
    • index_options of string type / Feature index_options of string type
    • unnecessary fields, excluding / Exclude unnecessary fields
    • automatic index refresh time, setting / Extension of the automatic index refresh time
  • memory configuration
    • about / Memory configuration
    • ES_HEAP_SIZE environment variable / The ES_HEAP_SIZE environment variable
  • merging policies
    • about / Segments and merging policies
    • selecting / Choosing the right merge policy
    • tiered policy / Tiered policy
    • log_byte_size policy / log_byte_size policy
    • log_doc policy / log_doc policy
  • metadata fields
    • about / Metadata fields
    • _source / _source
    • _all / _all
    • _timestamp / _timestamp
    • _ttl / _ttl
  • minor GC
    • about / The structure of JVM memory
  • mlockall property
    • about / Mlockall property
  • MMap filesystem store
    • about / MMap filesystem store

N

  • n-gram language models
    • URL / The phrase suggester
  • new IO filesystem store
    • about / New IO filesystem store
  • NFC
    • about / What's text normalization?
  • NFD
    • about / What's text normalization?
  • NFKC
    • about / What's text normalization?
  • NFKD
    • about / What's text normalization?
  • node
    • about / Node
    • non-data nodes / Non-data nodes
    • tribe node / Tribe node
  • non-data nodes
    • dedicated master nodes / Dedicated master nodes
    • client nodes / Client nodes
  • Normalization Token Filters
    • about / Token filters
  • normalizing
    • about / Process of analysis

O

  • object type
    • about / Object type
    • root object type / Root object type
  • Old Generation
    • about / The structure of JVM memory
  • optimize API
    • about / The optimize API

P

  • parallel garbage collector
    • about / Parallel garbage collector
  • Path Hierarchy Tokenizer
    • about / Tokenizer
  • Pattern Analyzer
    • about / Built-in analyzers
  • Pattern Replace Char Filter
    • about / Pattern Replace Char filter
  • Pattern Tokenizer
    • about / Tokenizer
  • phrase suggester
    • used, for correcting users' spelling mistakes / The phrase suggester
    • configuring / Configuring the phrase suggester

R

  • relevancy, of search results
    • improving / Improving the relevancy of search results
    • query, boosting / Boosting the query
    • bool query, using / Bool query
    • synonyms, using / Synonyms
    • _all field, using / Be careful about the _all field
  • replicas
    • about / Replicas
    • selecting / Choosing the right amount of shards and replicas
  • restore
    • about / Restore
    • index settings, overriding / Overriding index settings during restore
  • Reverse Token Filter
    • about / Token filters
  • root object type
    • about / Root object type

S

  • schema-less
    • about / Understanding the schema-less
  • search results
    • and mapping, relationship between / The relationship between mapping and relevant search results
    • relevancy, improving / Improving the relevancy of search results
  • segments
    • about / Segments and merging policies
    • optimize API / The optimize API
  • serial garbage collector
    • about / Serial garbage collector
  • sharding
    • about / Indices, Shards
  • shards
    • about / Indices, Shards
    • selecting / Choosing the right amount of shards and replicas
  • shared filesystem repository
    • about / Shared filesystem repository
  • Simple Analyzer
    • about / Built-in analyzers
  • simple filesystem store
    • about / Simple filesystem store
  • snapshot
    • about / Snapshot
    • process / How does the snapshot process work?
  • snapshot repository
    • about / Snapshot repository
    • types / Repository types
    • shared filesystem repository / Shared filesystem repository
    • URL repository / URL repository
    • cloud repository / Cloud repository
    • HDFS filesystem repository / HDFS filesystem repository
  • Standard Analyzer
    • about / Analysis, Built-in analyzers
  • Standard Tokenizer
    • about / Tokenizer
  • Stop Analyzer
    • about / Built-in analyzers
  • Stop Token Filter
    • about / Token filters
  • storable field
    • versus searchable field / The difference between the storable and searchable field
  • store module
    • about / Store module
    • store types / Store types
  • store types
    • about / Store types
    • simple filesystem store / Simple filesystem store
    • new IO filesystem store / New IO filesystem store
    • MMap filesystem store / MMap filesystem store
    • hybrid filesystem store / Hybrid filesystem store
  • Suggest API
    • about / Correction of users' spelling mistakes
  • Suggesters
    • used, for correcting users' spelling mistakes / Suggesters
  • suggestions
    • obtaining / Get suggestions
  • swapping
    • avoiding / Avoiding swapping
    • mlockall property / Mlockall property
  • synonyms
    • using / Synonyms

T

  • term suggester
    • used, for correcting users' spelling mistakes / Term suggester
    • configuring / Configuring the term suggester
    • configuration options / Common suggest options
    • additional options / Other and additional term suggester options
  • text normalization
    • about / What's text normalization?
  • tiered policy
    • about / Tiered policy
    • settings / Tiered policy
  • token filters
    • about / Token filters
    • ASCII Folding Token Filter / Token filters
    • Length Token Filter / Token filters
    • Lowercase Token Filter / Token filters
    • Uppercase Token Filter / Token filters
    • Stop Token Filter / Token filters
    • Reverse Token Filter / Token filters
    • Trim Token Filter / Token filters
    • Normalization Token Filters / Token filters
  • tokenizer
    • about / Tokenizer
    • Standard Tokenizer / Tokenizer
    • Letter Tokenizer / Tokenizer
    • Whitespace Tokenizer / Tokenizer
    • Pattern Tokenizer / Tokenizer
    • UAX URL Email Tokenizer / Tokenizer
    • Path Hierarchy Tokenizer / Tokenizer
  • tokenizing
    • about / Process of analysis
  • tribe node
    • about / Tribe node
  • Trim Token Filter
    • about / Token filters
  • types
    • about / Types
    • object type / Object type
    • attachment type / Attachment type
  • types, indices
    • about / Types

U

  • UAX URL Email Tokenizer
    • about / Tokenizer
  • Unicode Consortium
    • about / Analysis
    • URL / Analysis
  • unicode normalization forms
    • URL / What's text normalization?
  • Unicode Standard Annex #29
    • URL / Tokenizer
  • Unix systems
    • FD limit, increasing on / Increasing FD limit on Unix systems
  • Uppercase Token Filter
    • about / Token filters
  • URL repository
    • about / URL repository
  • users' spelling mistakes
    • correcting / Correction of users' spelling mistakes
    • correcting, Suggesters used / Suggesters
    • correcting, _suggest REST endpoint used / Using the _suggest REST endpoint
    • correcting, term suggester used / Term suggester
    • correcting, phrase suggester used / The phrase suggester
    • correcting, completion suggester used / The completion suggester

V

  • VirtualLock
    • URL / Mlockall property
  • Visual GC plugin
    • URL / VisualVM
  • VisualVM
    • about / VisualVM
    • URL / VisualVM
  • VM parameter
    • -Xms / Memory configuration
    • -Xmx / Memory configuration
    • -Xmn / Memory configuration
    • -XX:PermSize / Memory configuration
    • -XX:MaxPermSize / Memory configuration
    • -XX:InitialSurvivorRatio / Memory configuration

W

  • Whitespace Analyzer
    • about / Built-in analyzers
  • Whitespace Tokenizer
    • about / Tokenizer

Y

  • Young Generation
    • about / The structure of JVM memory