6. The Index Distribution Architecture | Mastering Elasticsearch 5.x

Mastering Elasticsearch 5.x - Third Edition

By: Bharvi Dixit

Overview of this book

Elasticsearch is a modern, fast, distributed, scalable, fault-tolerant, and open-source search and analytics engine. Elasticsearch leverages the capabilities of Apache Lucene and provides a new level of control over how you can index and search even huge sets of data. This book will give you a brief recap of the basics and also introduce you to the new features of Elasticsearch 5. We will guide you through the intermediate and advanced functionalities of Elasticsearch, such as querying, indexing, searching, and modifying data. We’ll also explore advanced concepts, including aggregation, index control, sharding, replication, and clustering. We’ll show you the monitoring and administration modules available in Elasticsearch, and will also cover backup and recovery. You will learn how to scale your Elasticsearch cluster and improve its performance. We’ll also show you how you can create your own analysis plugin in Elasticsearch. By the end of the book, you will have all the knowledge necessary to master Elasticsearch and put it to efficient use.

Query execution preference

Let's set aside shard placement and how to configure it, at least for a moment. Besides all the shard and replica settings Elasticsearch lets us control, it also lets us specify where our queries (and other operations, for example, the real-time GET) should be executed.

Introducing the preference parameter

To control where the queries (and other operations) we send are executed, we can use the preference parameter, which can be set to one of the following values:

_primary: With this value, the operations we send are executed only on primary shards. So, if we send a query against the mastering index with the preference parameter set to _primary, it would be executed on the nodes named node1 and node2. For example, if you know that your primary shards are in one rack and the replicas are in other racks, you may want to execute the operation on primary shards to avoid...
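On the wire, the preference value is simply a URL query parameter on the request, so the _primary case above corresponds to a call such as `GET /mastering/_search?preference=_primary`. The following is a minimal sketch using only the Python standard library. It assumes a node listening on `localhost:9200`, and the helper names `search_url` and `build_search_request` are hypothetical. The sketch only prepares the request and does not send it.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

# Assumption: a node reachable at localhost:9200 (adjust for your cluster).
ES_HOST = "http://localhost:9200"


def search_url(index: str, preference: str, host: str = ES_HOST) -> str:
    """Return the _search URL for `index` with the preference parameter set."""
    params = urlencode({"preference": preference})
    return f"{host}/{quote(index)}/_search?{params}"


def build_search_request(index: str, preference: str, query: dict) -> Request:
    """Prepare (but do not send) a search request restricted by `preference`."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        search_url(index, preference),
        data=body,  # supplying a body makes urllib use POST, which _search accepts
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_search_request("mastering", "_primary", {"match_all": {}})
    print(req.get_method(), req.full_url)
    # To run it against a live cluster (hypothetical host above):
    # from urllib.request import urlopen
    # print(urlopen(req).read().decode())
```

Because preference travels in the URL rather than in the query body, the same query body can be reused unchanged while the target shards are switched by changing a single parameter.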
