Elasticsearch Essentials

Overview of this book

With datasets constantly evolving and growing, organizations need to find actionable insights for their business. Elasticsearch, a distributed search and analytics engine, makes massive amounts of data usable in a matter of milliseconds. It not only gives you the power to build fast search solutions over large volumes of data, but can also serve as a NoSQL data store. This guide will help you quickly become a competent developer with a solid understanding of the core Elasticsearch concepts. Starting from the beginning, the book covers these core concepts, setting up Elasticsearch and various plugins, working with analyzers, and creating mappings. It also covers working with Elasticsearch using Python, performing CRUD operations and aggregation-based analytics, handling document relationships in the NoSQL world, working with geospatial data, and taking data backups. Finally, we’ll show you how to set up and scale Elasticsearch clusters in production environments, along with some best practices.
Table of Contents (18 chapters)
Elasticsearch Essentials
Credits
About the Author
Acknowledgments
About the Reviewer
www.PacktPub.com
Preface
Index

Index

A

  • aggregation
    • syntax / Aggregation syntax
    • values, extracting / Extracting values
  • aggregation framework
    • about / Introducing the aggregation framework
  • aggregation results
    • returning / Returning only aggregation results
  • aggregations
    • metrics / Introducing the aggregation framework
    • buckets / Introducing the aggregation framework
    • combining / Combining search, buckets, and metrics
  • aggregation types
    • value_count / Computing stats separately
    • min / Computing stats separately
    • max / Computing stats separately
    • avg / Computing stats separately
    • sum / Computing stats separately
  • arrays
    • about / Arrays
  • attributes, core data types
    • index / Configuring data types
    • store / Configuring data types
    • boost / Configuring data types
    • null_value / Configuring data types
  • attributes, string-based fields
    • term_vector / String
    • omit_norms / String
    • analyzer / String
    • index_analyzer / String
    • search_analyzer / String
    • ignore_above / String
  • avg aggregation / Computing stats separately

B

  • backup
    • performing, snapshots API used / Backup using snapshot API
  • backup mechanism
    • implementing / Introducing backup and restore mechanisms
  • basic HTTP authentication
    • setting up / Setting up basic HTTP authentication
  • basic operations, with Elasticsearch
    • about / Basic operations with Elasticsearch
    • index, creating / Creating an Index
    • document, indexing / Indexing a document in Elasticsearch
    • document, fetching / Fetching documents
    • document, updating / Updating documents
    • document, deleting / Deleting documents
    • document existence, checking / Checking documents' existence
  • basic parameters
    • configuring / Configuring basic parameters
  • basic queries
    • about / Understanding Query-DSL parameters
  • Boolean model
    • about / The Elasticsearch out-of-the-box tools
  • Boolean type field
    • about / Boolean
  • bool query
    • about / Bool queries
    • must / Bool queries
    • should / Bool queries
    • must_not / Bool queries
    • filter / Bool queries
    • boost / Bool queries
    • minimum_should_match / Bool queries
    • disable_coord / Bool queries
  • bounding boxes
    • about / Understanding bounding boxes
    • using, with geo distance aggregation / Using bounding boxes with geo distance aggregation
  • bucket aggregations
    • about / Bucket aggregations
  • buckets
    • about / Introducing the aggregation framework
  • bulk create
    • about / Bulk create
    • Python example / Bulk create
    • Java example / Bulk create
  • bulk delete
    • about / Bulk deleting
    • Python example / Bulk deleting
    • Java example / Bulk deleting
  • bulk index
    • about / Bulk indexing
  • bulk processing
    • practical considerations / Practical considerations for bulk processing
  • BulkProcessor
    • parameters / Scrolling and re-indexing documents using scan-scroll
  • bulk update
    • about / Bulk updating
    • Python example / Bulk updating

C

  • CentOS
    • Elasticsearch, installing on / Installing Elasticsearch on Centos through the RPM package
  • character filters
    • about / Document analysis
  • cheaper bulk operations
    • about / Cheaper bulk operations
    • bulk create / Bulk create
    • bulk index / Bulk indexing
    • bulk update / Bulk updating
    • bulk delete / Bulk deleting
  • circle
    • about / Circles
  • client node
    • about / Client node
  • cloud
    • references / Backup using snapshot API
  • cluster
    • about / Elasticsearch common terms
    • node, adding in / Adding another node to the cluster
    • creating / Creating a cluster
    • scaling / Scaling your clusters, When to scale, How to scale
    • snapshot, restoring to / Restoring to a different cluster
  • common terms, Elasticsearch
    • node / Elasticsearch common terms
    • cluster / Elasticsearch common terms
    • document / Elasticsearch common terms
    • index / Elasticsearch common terms
    • doc type / Elasticsearch common terms
    • shard / Elasticsearch common terms
    • replica / Elasticsearch common terms
  • complete document
    • fetching / Get a complete document
  • compound queries
    • about / Understanding Query-DSL parameters, Compound queries
    • bool query / Bool queries
    • not query / Not queries
  • core data types
    • attributes / Configuring data types
  • critical access
    • securing / Securing critical access
  • CRUD operations, with elasticsearch-py
    • about / CRUD operations using elasticsearch-py, Performing CRUD operations
    • request timeouts / Request timeouts
    • global timeout / Request timeouts
    • per-request timeout / Request timeouts
    • indexes, creating with settings / Creating indexes with settings and mappings
    • indexes, creating with mappings / Creating indexes with settings and mappings
    • documents, indexing / Indexing documents
    • documents, retrieving / Retrieving documents
    • documents, updating / Updating documents
    • value of field, replacing / Replacing the value of a field completely
    • value, appending in array / Appending a value in an array
    • updates, with docs / Updates using doc
    • document existence, checking / Checking document existence
    • documents, deleting / Deleting a document
  • CRUD operations, with Java
    • about / CRUD operations using Java
    • Elasticsearch connection / Connecting with Elasticsearch
    • document, indexing / Indexing a document
    • document, fetching / Fetching a document
    • document, updating / Updating a document
    • document, updating with doc / Updating a document using doc
    • document, updating with script / Updating a document using script
    • document, deleting / Deleting documents
  • custom analyzers
    • creating / Creating custom analyzers
    • working / Putting custom analyzers into action
  • custom scoring
    • relevancy, controlling with / Controlling relevancy with custom scoring

D

  • data
    • sorting / Sorting your data
  • data node
    • about / Data node
  • data pagination
    • about / Data pagination
    • with scoring / Pagination with scoring
    • without scoring / Pagination without scoring
  • data types
    • about / Data types and index analysis options
    • configuring / Configuring data types
    • core types / Configuring data types
    • complex data types / Configuring data types
  • data types, Elasticsearch
    • geo_point / Introducing geo-spatial data
  • Date data type
    • about / Date
  • date histogram aggregation
    • about / Date histogram aggregation
    • building / Date histogram aggregation
  • date histogram aggregation response
    • parsing / Date histogram aggregation
  • date range aggregation
    • about / Date range aggregation
    • building / Date range aggregation
  • date range aggregation response
    • parsing / Date range aggregation
  • Debian package
    • Elasticsearch, installing on Ubuntu / Installing Elasticsearch on Ubuntu through Debian package
  • decay functions, function_score query
    • linear / Decay functions - linear, exp, and gauss
    • exp / Decay functions - linear, exp, and gauss
    • gauss / Decay functions - linear, exp, and gauss
  • default analyzer
    • changing / Changing a default analyzer
  • DELETE requests
    • restricting / Restricting DELETE requests
  • Divergence from Randomness (DFR)
    • about / The Elasticsearch out-of-the-box tools
  • doc type
    • about / Elasticsearch common terms
  • document
    • about / Elasticsearch common terms
    • indexing, in Elasticsearch / Indexing a document in Elasticsearch
    • fetching / Fetching documents
    • complete document, fetching / Get a complete document
    • part of document, fetching / Getting part of a document
    • updating / Updating documents
    • whole document, updating / Updating a whole document
    • updating, partially / Updating documents partially
    • deleting / Deleting documents
    • existence, checking / Checking documents' existence
  • document analysis
    • about / Document analysis
  • document metadata fields
    • about / Document metadata fields
    • _id / Document metadata fields
    • _source / Document metadata fields
    • _all / Document metadata fields
    • _ttl / Document metadata fields
    • dynamic / Document metadata fields
  • document relationships
    • considerations / Considerations for using document relationships
  • document routing
    • about / Document routing
  • documents
    • sorting, by field values / Sorting documents by field values
    • scrolling, scan-scroll used / Scrolling and re-indexing documents using scan-scroll
    • re-indexing, scan-scroll used / Scrolling and re-indexing documents using scan-scroll
  • doc_values
    • advantages / Memory pressure and implications

E

  • Elasticsearch
    • node types / Node types in Elasticsearch
    • best practices, in production / Best Elasticsearch practices in production
    • about / Introducing Elasticsearch
    • features / The primary features of Elasticsearch
    • installing / Installing and configuring Elasticsearch
    • configuring / Installing and configuring Elasticsearch
    • installing, on Ubuntu / Installing Elasticsearch on Ubuntu through Debian package
    • installing, on Centos / Installing Elasticsearch on Centos through the RPM package
    • installation directory layout / Understanding the Elasticsearch installation directory layout
    • Head plugin, installing for / Installing the Head plugin for Elasticsearch
    • Sense, installing for / Installing Sense for Elasticsearch
    • document, indexing in / Indexing a document in Elasticsearch
    • relational data, managing in / Managing relational data in Elasticsearch
    • search types / Introducing search types in Elasticsearch
    • out-of-the-box tools / The Elasticsearch out-of-the-box tools
    • securing / Securing Elasticsearch
  • elasticsearch-py
    • reference link / CRUD operations using elasticsearch-py
    • installing / Installing elasticsearch-py
  • Elasticsearch mapping
    • about / Elasticsearch mapping
  • Elasticsearch plugins
    • installing / Installing Elasticsearch plugins
    • site plugins / Installing Elasticsearch plugins
    • Java plugins / Installing Elasticsearch plugins
    • installed plugins, checking / Checking for installed plugins
  • Elasticsearch queries
    • basic queries / Understanding Query-DSL parameters
    • compound queries / Understanding Query-DSL parameters
  • Elasticsearch structure
    • relational databases / Understanding Elasticsearch structure with respect to relational databases
  • Elasticsearch version
    • upgrading / Upgrading Elasticsearch version
  • endpoints
    • restricting / Restricting endpoints
  • envelope
    • about / Envelops
  • exact term search
    • about / Text search
  • exists queries
    • about / Exists queries

F

  • features, Elasticsearch
    • distributed / The primary features of Elasticsearch
    • High Availability / The primary features of Elasticsearch
    • REST-based / The primary features of Elasticsearch
    • powerful Query DSL / The primary features of Elasticsearch
    • schemaless / The primary features of Elasticsearch
  • field
    • indexing / Indexing the same field in different ways
  • fields
    • sorting / Sorting on more than one field
  • field values
    • documents, sorting by / Sorting documents by field values
  • filter-based aggregation response
    • parsing / Filter-based aggregation
  • filter-based aggregation
    • about / Filter-based aggregation
    • building / Filter-based aggregation
  • filters
    • about / Missing queries
  • Full-Text Search Queries
    • about / Query types, Full-text search queries
    • match_all / match_all
    • match / match query
    • multi match / multi match
    • query_string / query_string
  • full text search
    • about / Text search
    • examples / Text search
  • function_score query
    • about / The function_score query
    • parameters / The function_score query
    • weight function / weight
    • field_value_factor / field_value_factor
    • script_score / script_score
    • decay functions / Decay functions - linear, exp, and gauss

G

  • geo-aggregations
    • about / Geo-aggregations
  • geo-point data
    • working with / Working with geo-point data
    • indexing / Indexing geo-point data
    • querying / Querying geo-point data
    • sorting, by distance / Sorting by distance
  • geo-point fields
    • mapping / Mapping geo-point fields
  • geo-shape data
    • indexing / Indexing geo-shape data
    • querying / Querying geo-shape data
  • geo-shape fields
    • mapping / Mappings geo-shape fields
  • geo-shapes
    • about / Geo-shapes
    • point / Point
    • linestring / Linestring
    • circle / Circles
    • polygon / Polygons
    • envelope / Envelops
  • geo-spatial data
    • about / Introducing geo-spatial data
  • geo bounding box query
    • about / Geo bounding box query
  • geo distance aggregation
    • about / Geo distance aggregation
    • bounding boxes, using with / Using bounding boxes with geo distance aggregation
  • geo distance query
    • about / Geo distance query
  • geo distance range query
    • about / Geo distance range query
  • Geohashes
    • versus Quadtree / Mappings geo-shape fields

H

  • Hadoop plugins
    • references / Backup using snapshot API
  • has_child query
    • about / has_child query
  • has_parent query
    • about / has_parent query
  • Head plugin
    • reference link / Installing the Head plugin for Elasticsearch
    • installing, for Elasticsearch / Installing the Head plugin for Elasticsearch
  • histogram aggregation
    • about / Histogram aggregation
    • building / Histogram aggregation
  • histogram aggregation response
    • parsing / Histogram aggregation

I

  • IDF (term)
    • about / TF-IDF
  • implications
    • about / Memory pressure and implications
  • index
    • about / Elasticsearch common terms
    • creating / Creating an Index
    • mappings, inserting in / Putting mappings in an index
    • building, with sample documents / An example: why defaults are not enough
  • index analysis options
    • about / Data types and index analysis options
  • index settings
    • modifying, during restore / Changing index settings during restore
  • indices
    • renaming / Renaming indices
  • Information Based (IB)
    • about / The Elasticsearch out-of-the-box tools
  • installation directory layout, Elasticsearch / Understanding the Elasticsearch installation directory layout
  • installing
    • Elasticsearch, on Ubuntu / Installing Elasticsearch on Ubuntu through Debian package
    • Elasticsearch, on Centos / Installing Elasticsearch on Centos through the RPM package
    • Elasticsearch plugins / Installing Elasticsearch plugins
    • Head plugin, for Elasticsearch / Installing the Head plugin for Elasticsearch
    • Sense, for Elasticsearch / Installing Sense for Elasticsearch
    • Pip / Installing Pip
    • virtualenv / Installing virtualenv
    • elasticsearch-py / Installing elasticsearch-py
  • inverted indexes
    • about / Inverted indexes

J

  • Java plugins / Installing Elasticsearch plugins
  • JSON
    • about / What is JSON?
  • JSON object
    • example / What is JSON?
  • JVM and OS dependencies, of Elasticsearch
    • reference link / Understanding Elasticsearch structure with respect to relational databases

K

  • keyword analyzer
    • about / Introducing Lucene analyzers

L

  • language analyzer
    • about / Introducing Lucene analyzers
    • reference link / Introducing Lucene analyzers
  • language plugin
    • reference link / script_score
  • linestring
    • about / Linestring
  • load balancing, Nginx
    • about / Load balancing using Nginx
    • reference link / Load balancing using Nginx
  • Lucene
    • about / The primary features of Elasticsearch
  • Lucene analyzers
    • about / Introducing Lucene analyzers
    • standard analyzer / Introducing Lucene analyzers
    • simple analyzer / Introducing Lucene analyzers
    • whitespace analyzer / Introducing Lucene analyzers
    • keyword analyzer / Introducing Lucene analyzers
    • language analyzer / Introducing Lucene analyzers

M

  • manual backups
    • about / Manual backups
  • manual restoration
    • about / Manual restoration
  • mappings
    • inserting, in index / Putting mappings in an index
    • viewing / Viewing mappings
    • updating / Updating mappings
  • master node
    • about / Master node
  • match query
    • about / match query
    • phrase search option / Phrase search
  • match_all query / match_all
  • max aggregation / Computing stats separately
  • memory pressure
    • about / Memory pressure and implications
  • metric aggregations
    • about / Metric aggregations
    • single-value metric / Metric aggregations
    • multi-value metric / Metric aggregations
    • basic stats, computing / Computing basic stats
    • combined stats, computing / Combined stats
    • stats, computing separately / Computing stats separately
    • extended stats, computing / Computing extended stats
    • distinct counts, finding / Finding distinct counts
  • metrics
    • about / Introducing the aggregation framework
  • Metrics to Watch
    • about / Metrics to watch
    • CPU utilization / CPU utilization
    • memory utilization / Memory utilization
    • disk I/O utilization / Disk I/O utilization
    • disk low watermark / Disk low watermark
  • min aggregation / Computing stats separately
  • missing queries
    • about / Missing queries
  • multi buckets
    • about / Bucket aggregations
  • multicasting discovery / Multicasting discovery
  • multi get
    • about / Multi get and multi search APIs, Multi get
    • Python example / Multi get
    • Java example / Multi get
  • multilevel aggregation response
    • parsing / Combining search, buckets, and metrics
  • multi match query
    • about / multi match
  • multiple indices
    • restoring / Restoring multiple indices
  • multi search APIs
    • about / Multi get and multi search APIs, Multi searches
    • Python example / Multi searches
    • Java example / Multi searches
  • multivalued fields
    • sorting / Sorting multivalued fields

N

  • nested aggregations
    • about / Nested aggregations
    • syntax / Understanding nested aggregation syntax
  • nested data
    • indexing / Indexing nested data
  • nested field
    • querying / Querying nested type data
  • nested mappings
    • creating / Creating nested mappings
  • nested objects
    • working with / Working with nested objects
  • NFS drive
    • about / Backup using snapshot API
    • creating / Creating an NFS drive
    • client machines, configuring / Configuring client machines
  • NFS Exports
    • about / Configuring the NFS host server
  • NFS host server
    • configuring / Configuring the NFS host server
  • Nginx
    • setting up / Setting up Nginx
  • node
    • about / Elasticsearch common terms
    • adding, in cluster / Adding another node to the cluster
  • node types, Elasticsearch
    • about / Node types in Elasticsearch
    • client node / Client node
    • data node / Data node
    • master node / Master node
  • node upgrades
    • without downtime / Node upgrades without downtime
  • not query
    • about / Not queries
  • number data types
    • about / Number

O

  • objects
    • about / Objects
  • OpenStreetMap
    • URL / Understanding bounding boxes

P

  • parameters, function_score query
    • boost / The function_score query
    • max_boost / The function_score query
    • boost_mode / The function_score query
    • score_mode / The function_score query
    • min_score / The function_score query
  • parameters, Query-DSL
    • query / Understanding Query-DSL parameters
    • from / Understanding Query-DSL parameters
    • size / Understanding Query-DSL parameters
    • _source / Understanding Query-DSL parameters
  • parent-child documents
    • indexing / Indexing parent-child documents
    • querying / Querying parent-child documents
    • has_child query / has_child query
    • has_parent query / has_parent query
  • parent-child mappings
    • creating / Creating parent-child mappings
  • parent-child relationships
    • about / Parent-child relationships
  • partial restore
    • about / Partial restore
  • Pip
    • installing / Installing Pip
  • point
    • about / Point
  • polygon
    • about / Polygons
  • practical considerations, for bulk processing
    • about / Practical considerations for bulk processing
    • multisearch / Practical considerations for bulk processing
    • scan-scroll / Practical considerations for bulk processing
    • bulk indexing / Practical considerations for bulk processing
    • bulk updates / Practical considerations for bulk processing
  • Python environments
    • setting up / Setting up the environment

Q

  • Quadtree
    • versus Geohash / Mappings geo-shape fields
  • queries
    • about / Missing queries
  • Query-DSL
    • about / Elasticsearch Query-DSL
    • syntax / Elasticsearch Query-DSL
    • parameters / Understanding Query-DSL parameters
  • Query DSL / The primary features of Elasticsearch
  • query_string query
    • about / query_string

R

  • range aggregation
    • about / Range aggregation
    • building / Range aggregation
  • range aggregation response
    • parsing / Range aggregation
  • range query
    • about / Range queries
  • relational data
    • managing, in Elasticsearch / Managing relational data in Elasticsearch
  • relational data, in document-oriented NoSQL world
    • about / Relational data in the document-oriented NoSQL world
  • relevancy / Introducing relevant searches
    • controlling, with custom scoring / Controlling relevancy with custom scoring
  • relevant search
    • about / Introducing relevant searches
  • replica
    • about / Elasticsearch common terms
  • REST
    • about / What is REST?
  • restore
    • index settings, modifying during / Changing index settings during restore
  • restore mechanism
    • implementing / Introducing backup and restore mechanisms
  • reverse nested aggregation
    • about / Reverse nested aggregation
  • RPM package
    • Elasticsearch, installing on Centos / Installing Elasticsearch on Centos through the RPM package

S

  • scan-scroll
    • used, for scrolling documents / Scrolling and re-indexing documents using scan-scroll
    • used, for re-indexing documents / Scrolling and re-indexing documents using scan-scroll
  • scripting
    • reference link / Updating a document using script
  • search
    • about / Text search
  • search database
    • creating / Creating a search database
  • search requests
    • reference link / Parsing search responses
  • search requests, with Java
    • about / Search requests using Java
    • search responses, parsing / Parsing search responses
  • search requests, with Python
    • about / Search requests using Python
  • search types, Elasticsearch
    • query_then_fetch / Introducing search types in Elasticsearch
    • dfs_query_then_fetch / Introducing search types in Elasticsearch
    • scan / Introducing search types in Elasticsearch
  • Sense
    • installing, for Elasticsearch / Installing Sense for Elasticsearch
  • shard
    • about / Elasticsearch common terms
  • simple analyzer
    • about / Introducing Lucene analyzers
  • single buckets
    • about / Bucket aggregations
  • site plugins / Installing Elasticsearch plugins
  • snapshot
    • information, obtaining of / Getting snapshot information
    • deleting / Deleting snapshots
    • restoring / Restoring snapshots
    • restoring, to cluster / Restoring to a different cluster
  • snapshot, creating
    • about / Creating a snapshot, Create your first snapshot
    • repository path, registering / Registering the repository path
    • shared file system repository, registering in Elasticsearch / Registering the shared file system repository in Elasticsearch
  • snapshots API
    • used, for performing backup / Backup using snapshot API
  • split-brain
    • about / Minimum number of master nodes: preventing split-brain
    • avoiding / Minimum number of master nodes: preventing split-brain
  • standard analyzer
    • about / Introducing Lucene analyzers
  • string
    • about / String
  • string-based fields
    • attributes / String
  • string fields
    • sorting / Sorting on string fields
  • sum aggregation / Computing stats separately

T

  • Term-Based Search Queries
    • about / Query types, Term-based search queries
    • term query / Term query
    • terms query / Terms query
    • range query / Range queries
    • exists query / Exists queries
    • missing queries / Missing queries
  • term query
    • about / Term query
  • terms aggregation / Terms aggregation
  • terms query
    • about / Terms query
  • term vectors
    • reference link / String
  • text search
    • about / Text search
    • exact term search / Text search
    • full text search / Text search
  • TF (term)
    • about / TF-IDF
  • TF-IDF
    • about / TF-IDF
  • token filters
    • about / Document analysis
    • reference link / Document analysis
  • tokenizers
    • about / Document analysis
    • reference link / Document analysis
  • Twitter API access token keys
    • reference link / Creating a search database

U

  • Ubuntu
    • Elasticsearch, installing on / Installing Elasticsearch on Ubuntu through Debian package
  • unicasting discovery
    • about / Unicasting discovery
    • configuring / Configuring unicasting discovery

V

  • Vector Space Model (VSM)
    • about / The Elasticsearch out-of-the-box tools, TF-IDF
  • virtualenv
    • installing / Installing virtualenv

W

  • whitespace analyzer
    • about / Introducing Lucene analyzers
  • whole document
    • updating / Updating a whole document

Z

  • Zen-Discovery
    • about / Introducing Zen-Discovery
    • multicasting discovery / Multicasting discovery
    • unicasting discovery / Unicasting discovery
  • Zookeeper
    • about / Introducing Zen-Discovery