Book Image

Elasticsearch Essentials

Book Image

Elasticsearch Essentials

Overview of this book

With constantly evolving and growing datasets, organizations have the need to find actionable insights for their business. ElasticSearch, which is the world's most advanced search and analytics engine, brings the ability to make massive amounts of data usable in a matter of milliseconds. It not only gives you the power to build blazing fast search solutions over a massive amount of data, but can also serve as a NoSQL data store. This guide will take you on a tour to become a competent developer quickly with a solid knowledge level and understanding of the ElasticSearch core concepts. Starting from the beginning, this book will cover these core concepts, setting up ElasticSearch and various plugins, working with analyzers, and creating mappings. This book provides complete coverage of working with ElasticSearch using Python and performing CRUD operations and aggregation-based analytics, handling document relationships in the NoSQL world, working with geospatial data, and taking data backups. Finally, we’ll show you how to set up and scale ElasticSearch clusters in production environments as well as providing some best practices.
Table of Contents (18 chapters)
Elasticsearch Essentials
Credits
About the Author
Acknowledgments
About the Reviewer
www.PacktPub.com
Preface
Index

Memory pressure and implications


Aggregations are awesome! However, they bring a lot of memory pressure on Elasticsearch. They work on an in-memory data structure called fielddata, which is the biggest consumer of HEAP memory in a Elasticsearch cluster. Fielddata is not only used for aggregations, but also used for sorting and scripts. The in-memory fielddata is slow to load, as it has to read the whole inverted index and un-invert it. If the fielddata cache fills up, old data is evicted causing heap churn and bad performance (as fielddata is reloaded and evicted again.)

The more unique terms exist in the index, the more terms will be loaded into memory and the more pressure it will have. If you are using an Elasticsearch version below 2.0.0 and above 1.0.0, then you can use the doc_vlaues parameter inside the mapping while creating the index to avoid the use of fielddata using the following syntax:

PUT /index_name/_mapping/index_type
{
  "properties": {
    "field_name": {
      "type": ...