Elasticsearch Server - Third Edition

By: Rafal Kuc

Overview of this book

Elasticsearch is a very fast and scalable open source search engine, designed with distribution and the cloud in mind, complete with all the goodies that Apache Lucene has to offer. Elasticsearch’s schema-free architecture allows developers to index and search unstructured content, making it well suited both for small projects and for large big data warehouses, even those holding petabytes of unstructured data. This book will guide you through the most commonly used Elasticsearch server functionalities. You’ll start by getting an understanding of the basics of Elasticsearch and its data indexing functionality. Next, you will see the querying capabilities of Elasticsearch, followed by a thorough explanation of scoring and search relevance. After this, you will explore the aggregation and data analysis capabilities of Elasticsearch and learn how cluster administration and scaling can be used to boost your application’s performance. You’ll find out how to use the friendly REST APIs and how to tune Elasticsearch to make the most of it. By the end of this book, you will be able to create amazing search solutions that meet your project’s specifications.
Table of Contents (18 chapters)
Elasticsearch Server Third Edition
Credits
About the Authors
About the Reviewer
www.PacktPub.com
Preface
Index

Controlling the shard and replica allocation


The indices that live inside your Elasticsearch cluster can be built from many shards, and each shard can have many replicas. Dividing a single index into multiple shards lets us spread its data across multiple physical instances. We may want to do this for different reasons: to parallelize indexing and get more throughput, or to have smaller shards so that our queries are faster. Of course, we may also have too many documents to fit on a single machine, in which case we need multiple shards just to store them. With replicas, we can parallelize the query load by having multiple physical copies of each shard. Using shards and replicas, we can scale Elasticsearch out. However, Elasticsearch has to figure out where in the cluster it should place shards and replicas, that is, on which nodes each shard and each replica should live.
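To make the placement problem concrete, here is a minimal Python sketch. The `number_of_shards` and `number_of_replicas` keys are real Elasticsearch index settings, and a body like the one `index_settings` returns can be sent when creating an index. Everything else is hypothetical and only for illustration: the `index_settings` and `allocate` helpers, the node names, and the `0P`/`0R` labels (shard number plus primary or replica). `allocate` is a toy round-robin placement, not Elasticsearch's actual allocator. It mimics only one rule the real allocator follows: two copies of the same shard never share a node, so copies that cannot get a node of their own stay unassigned.

```python
import json


def index_settings(shards: int, replicas: int) -> dict:
    """Request body for creating an index; the setting names are Elasticsearch's."""
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


def allocate(shards: int, replicas: int,
             nodes: list[str]) -> tuple[dict[str, list[str]], list[str]]:
    """Toy round-robin placement (hypothetical, NOT Elasticsearch's allocator).

    Copy 0 of each shard is the primary (label 'nP'); copies 1..replicas are
    replicas (label 'nR'). Copies of one shard go to distinct nodes; copies
    that cannot get a node of their own are returned as unassigned.
    """
    placement: dict[str, list[str]] = {node: [] for node in nodes}
    unassigned: list[str] = []
    for shard in range(shards):
        for copy in range(replicas + 1):
            label = f"{shard}{'P' if copy == 0 else 'R'}"
            if copy >= len(nodes):
                # Every node already holds a copy of this shard.
                unassigned.append(label)
                continue
            # Offset by the shard number so primaries spread across nodes.
            node = nodes[(shard + copy) % len(nodes)]
            placement[node].append(label)
    return placement, unassigned


if __name__ == "__main__":
    print(json.dumps(index_settings(3, 1)))
    placement, unassigned = allocate(3, 1, ["node-1", "node-2", "node-3"])
    for node, copies in placement.items():
        print(node, copies)
    print("unassigned:", unassigned)
```

With three shards, one replica, and three nodes, each node ends up holding one primary and one replica. Calling `allocate(2, 1, ["node-1"])` instead leaves both replicas unassigned, which is roughly what a single-node cluster with replicas enabled looks like.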

Explicitly controlling allocation

One of...