Elasticsearch Indexing

By: Huseyin Akdogan

Overview of this book

Starting with an overview of the way Elasticsearch stores data, you’ll extend your knowledge to tackle indexing and mapping, and learn how to configure Elasticsearch to meet your users’ needs. You’ll then find out how to use analysis and analyzers to organize and retrieve search results more intelligently, so that search queries are met with relevant results. You’ll explore the anatomy of an Elasticsearch cluster and learn how to set up configurations that balance availability and scalability. Once you’ve learned how these elements work, you’ll find real-world solutions to help you improve indexing performance, as well as tips and guidance on backing up and restoring data. By the end of the book, you will be able to help deliver the improved search experience that modern users expect.

Elasticsearch Indexing

Credits

About the Author

About the Reviewer

www.PacktPub.com

Preface

Introduction to Efficient Indexing

Getting started

Understanding the document storage strategy

Analysis

Summary

What is an Elasticsearch Index

Nature of the Elasticsearch index

Document

Summary

Basic Concepts of Mapping

Basic concepts and definitions

Types

The relationship between mapping and relevant search results

Understanding the schema-less

Summary

Analysis and Analyzers

Introducing analysis

Process of analysis

Built-in analyzers

What's text normalization?

ICU analysis plugin

An Analyzer Pipeline

Specifying the analyzer for a field in the mapping

Summary

Anatomy of an Elasticsearch Cluster

Basic concepts

Node

Shards

Replicas

Explaining the architecture of distribution

Correctly configuring the cluster

Choosing the right amount of shards and replicas

Summary

Improving Indexing Performance

Configuration

Optimization of mapping definition

Segments and merging policies

Store module

Bulk API

Notes

Summary

Snapshot and Restore

Snapshot repository

Snapshot

Restore

How does the snapshot process work?

Summary

Improving the User Search Experience

Correction of users' spelling mistakes

Get suggestions

Improving the relevancy of search results

Summary

Index

Index

A

analysis
- about / Analysis, Introducing analysis
- examining / Analysis
- process / Process of analysis
- tokenizing / Process of analysis
- normalizing / Process of analysis
analyzer
- specifying, for field in mapping / Specifying the analyzer for a field in the mapping
- custom analyzer, creating / Creating a custom analyzer
analyzer pipeline
- about / An Analyzer Pipeline
Apache Lucene
- about / Understanding the document storage strategy, Indices
- URL / Understanding the document storage strategy
ASCII Folding Token Filter
- about / Token filters
ASCII Folding token filter
- about / ASCII Folding Token filter
attachment type
- about / Attachment type
- reference link / Attachment type
AWS Cloud Plugin
- about / Cloud repository
- URL / Cloud repository
Azure Cloud Plugin
- URL / Cloud repository
- about / Cloud repository

B

bool query
- using / Bool query
built-in analyzers
- about / Built-in analyzers
- Standard Analyzer / Built-in analyzers
- Simple Analyzer / Built-in analyzers
- Whitespace Analyzer / Built-in analyzers
- Stop Analyzer / Built-in analyzers
- Pattern Analyzer / Built-in analyzers
- Language Analyzer / Built-in analyzers
- building blocks / Building blocks of Analyzer
- character filters / Character filters
- tokenizer / Tokenizer
- token filters / Token filters
bulk API
- about / Bulk API
bulk sizing
- about / Bulk sizing

C

character filters
- about / Character filters
- HTML Strip Char Filter / HTML Strip Char filter
- Pattern Replace Char Filter / Pattern Replace Char filter
client nodes
- about / Client nodes
cloud repository
- about / Cloud repository
completion suggester
- used, for correcting users' spelling mistakes / The completion suggester
- configuration, mapping / Mapping the configuration for the completion suggester
- completion field, indexing / Indexing on completion field
Concurrent Mark Sweep garbage collector
- about / Concurrent Mark Sweep garbage collector
configuration, for high performance indexing
- performing / Configuration
- memory configuration / Memory configuration
- swapping, avoiding / Avoiding swapping
- garbage collector / Garbage collector
- JVM memory / The structure of JVM memory
- file descriptors / File descriptors
custom analyzer
- creating / Creating a custom analyzer

D

database
- about / Understanding the document storage strategy
dedicated master nodes
- about / Dedicated master nodes
denormalization
- about / Denormalization
document
- about / Document
- inverted index / Inverted index
document-oriented search engine
- about / Understanding the document storage strategy
document storage
- about / Understanding the document storage strategy
- _source field / The _source field
- storable field, versus searchable field / The difference between the storable and searchable field

E

Elasticsearch cluster
- about / Basic concepts
- architecture, of distribution / Explaining the architecture of distribution
- configuring / Correctly configuring the cluster
ES_HEAP_SIZE environment variable
- about / The ES_HEAP_SIZE environment variable

F

file descriptors
- about / File descriptors
- FD limit, increasing on Unix systems / Increasing FD limit on Unix systems
Finite State Transducer (FST) data structure
- about / The completion suggester
- URL / The completion suggester

G

G1 garbage collector
- about / G1 garbage collector
garbage collection
- monitoring / Monitoring garbage collection
- tuning / Tuning the garbage collection
garbage collector
- about / Garbage collector
- strategies / Different strategies among garbage collectors
- serial garbage collector / Serial garbage collector
- parallel garbage collector / Parallel garbage collector
- Concurrent Mark Sweep garbage collector / Concurrent Mark Sweep garbage collector
- G1 garbage collector / G1 garbage collector

H

HDFS filesystem repository
- about / HDFS filesystem repository
- URL / HDFS filesystem repository
HTML Strip Char Filter
- about / HTML Strip Char filter
hybrid filesystem store
- about / Hybrid filesystem store

I

I/O operations
- throttling / Throttling I/O operations
- throttling type, configuring / Throttling type
ICU Analysis
- about / ICU analysis plugin
- reference link / ICU analysis plugin
- ASCII Folding token filter / ASCII Folding Token filter
indices
- about / Indices
- mapping / Mapping
- types / Types
inverted index
- about / Understanding the document storage strategy, Indices, Inverted index

J

Java FileChannel Class
- URL / New IO filesystem store
Java garbage collection
- about / Garbage collector
Java RandomAccessFile Class
- URL / Simple filesystem store
JavaScript Object Notation (JSON)
- about / Indices
- reference link / Indices
JConsole
- URL / Monitoring garbage collection
jstat command
- URL / Monitoring garbage collection
JVM memory
- structure / The structure of JVM memory
- Eden Space / The structure of JVM memory
- Survivor Space / The structure of JVM memory
- Tenured Generation / The structure of JVM memory
- Permanent Generation / The structure of JVM memory
- Code Cache / The structure of JVM memory
- problem / What is the problem?
- garbage collection, monitoring / Monitoring garbage collection
- VisualVM / VisualVM
- garbage collectors, strategies / Different strategies among garbage collectors
- deallocating / Process of deallocating memory
- garbage collector / Types of garbage collector

L

Language Analyzer
- about / Built-in analyzers
Length Token Filter
- about / Token filters
Letter Tokenizer
- about / Tokenizer
log_byte_size policy
- about / log_byte_size policy
- settings / log_byte_size policy
log_doc policy
- about / Log_doc policy
- settings / Log_doc policy
Lowercase Token Filter
- about / Token filters
Lucene MMapDirectory
- URL / MMap filesystem store
Lucene NIOFSDirectory
- URL / New IO filesystem store
Lucene SimpleFSDirectory
- URL / Simple filesystem store

M

major GC
- about / The structure of JVM memory
mapping
- about / Mapping, Basic concepts and definitions
- metadata fields / Metadata fields
- and search results, relationship between / The relationship between mapping and relevant search results
- analyzer, specifying for field / Specifying the analyzer for a field in the mapping
mapping definition
- optimization / Optimization of mapping definition
- norms / Norms
- index_option of string type / Feature index_option of string type
- unnecessary fields, excluding / Exclude unnecessary fields
- automatic index refresh time, setting / Extension of the automatic index refresh time
memory configuration
- about / Memory configuration
- ES_HEAP_SIZE environment variable / The ES_HEAP_SIZE environment variable
merging policies
- about / Segments and merging policies
- selecting / Choosing the right merge policy
- tiered policy / Tiered policy
- log_byte_size policy / log_byte_size policy
- log_doc policy / Log_doc policy
metadata fields
- about / Metadata fields
- _source / _source
- _all / _all
- _timestamp / _timestamp
- _ttl / _ttl
minor GC
- about / The structure of JVM memory
mlockall property
- about / Mlockall property
MMap filesystem store
- about / MMap filesystem store

N

n-gram language models
- URL / The phrase suggester
new IO filesystem store
- about / New IO filesystem store
NFC
- about / What's text normalization?
NFD
- about / What's text normalization?
NFKC
- about / What's text normalization?
NFKD
- about / What's text normalization?
node
- about / Node
- non-data nodes / Non-data nodes
- tribe node / Tribe node
non-data nodes
- dedicated master nodes / Dedicated master nodes
- client nodes / Client nodes
Normalization Token Filters
- about / Token filters
normalizing
- about / Process of analysis

O

object type
- about / Object type
- root object type / Root object type
Old Generation
- about / The structure of JVM memory
optimize API
- about / The optimize API

P

parallel garbage collector
- about / Parallel garbage collector
Path Hierarchy Tokenizer
- about / Tokenizer
Pattern Analyzer
- about / Built-in analyzers
Pattern Replace Char Filter
- about / Pattern Replace Char filter
Pattern Tokenizer
- about / Tokenizer
phrase suggester
- used, for correcting users' spelling mistakes / The phrase suggester
- configuring / Configuring the phrase suggester

R

relevancy, of search results
- improving / Improving the relevancy of search results
- query, boosting / Boosting the query
- bool query, using / Bool query
- synonyms, using / Synonyms
- _all field, using / Be careful about the _all field
replicas
- about / Replicas
- selecting / Choosing the right amount of shards and replicas
restore
- about / Restore
- index settings, overriding / Overriding index settings during restore
Reverse Token Filter
- about / Token filters
root object type
- about / Root object type

S

schema-less
- about / Understanding the schema-less
search results
- and mapping, relationship between / The relationship between mapping and relevant search results
- relevancy, improving / Improving the relevancy of search results
segments
- about / Segments and merging policies
- optimize API / The optimize API
serial garbage collector
- about / Serial garbage collector
sharding
- about / Indices, Shards
shards
- about / Indices, Shards
- selecting / Choosing the right amount of shards and replicas
shared filesystem repository
- about / Shared filesystem repository
Simple Analyzer
- about / Built-in analyzers
simple filesystem store
- about / Simple filesystem store
snapshot
- about / Snapshot
- process / How does the snapshot process work?
snapshot repository
- about / Snapshot repository
- types / Repository types
- shared filesystem repository / Shared filesystem repository
- URL repository / URL repository
- cloud repository / Cloud repository
- HDFS filesystem repository / HDFS filesystem repository
Standard Analyzer
- about / Built-in analyzers
standard analyzer
- about / Analysis
Standard Tokenizer
- about / Tokenizer
Stop Analyzer
- about / Built-in analyzers
Stop Token Filter
- about / Token filters
storable field
- versus searchable field / The difference between the storable and searchable field
store module
- about / Store module
- store types / Store types
store types
- about / Store types
- simple filesystem store / Simple filesystem store
- new IO filesystem store / New IO filesystem store
- MMap filesystem store / MMap filesystem store
- hybrid filesystem store / Hybrid filesystem store
Suggest API
- about / Correction of users' spelling mistakes
Suggesters
- used, for correcting users' spelling mistakes / Suggesters
suggestions
- obtaining / Get suggestions
swapping
- avoiding / Avoiding swapping
- mlockall property / Mlockall property
synonyms
- using / Synonyms

T

term suggester
- used, for correcting users' spelling mistakes / Term suggester
- configuring / Configuring the term suggester
- configuration options / Common suggest options
- additional options / Other and additional term suggester options
text normalization
- about / What's text normalization?
tiered policy
- about / Tiered policy
- settings / Tiered policy
token filters
- about / Token filters
- ASCII Folding Token Filter / Token filters
- Length Token Filter / Token filters
- Lowercase Token Filter / Token filters
- Uppercase Token Filter / Token filters
- Stop Token Filter / Token filters
- Reverse Token Filter / Token filters
- Trim Token Filter / Token filters
- Normalization Token Filters / Token filters
tokenizer
- about / Tokenizer
- Standard Tokenizer / Tokenizer
- Letter Tokenizer / Tokenizer
- Whitespace Tokenizer / Tokenizer
- Pattern Tokenizer / Tokenizer
- UAX Email URL Tokenizer / Tokenizer
- Path Hierarchy Tokenizer / Tokenizer
tokenizing
- about / Process of analysis
tribe node
- about / Tribe node
Trim Token Filter
- about / Token filters
types
- about / Types
- object type / Object type
- attachment type / Attachment type
types, indices
- about / Types

U

UAX Email URL Tokenizer
- about / Tokenizer
Unicode Consortium
- about / Analysis
- URL / Analysis
Unicode normalization forms
- URL / What's text normalization?
Unicode Standard Annex #29
- URL / Tokenizer
Unix systems
- FD limit, increasing on / Increasing FD limit on Unix systems
Uppercase Token Filter
- about / Token filters
URL repository
- about / URL repository
users' spelling mistakes
- correcting / Correction of users' spelling mistakes
- correcting, Suggesters used / Suggesters
- correcting, _suggest REST endpoint used / Using the _suggest REST endpoint
- correcting, term suggester used / Term suggester
- correcting, phrase suggester used / The phrase suggester
- correcting, completion suggester used / The completion suggester

V

VirtualLock
- URL / Mlockall property
Visual GC plugin
- URL / VisualVM
VisualVM
- about / VisualVM
- URL / VisualVM
VM parameter
- -Xms / Memory configuration
- -Xmx / Memory configuration
- -Xmn / Memory configuration
- -XX:PermSize / Memory configuration
- -XX:MaxPermSize / Memory configuration
- -XX:InitialSurvivorRatio / Memory configuration

W

Whitespace Analyzer
- about / Built-in analyzers
Whitespace Tokenizer
- about / Tokenizer

Y

Young Generation
- about / The structure of JVM memory
