Elasticsearch 8.x Cookbook - Fifth Edition

By: Alberto Paro

Overview of this book

Elasticsearch is a Lucene-based distributed search engine at the heart of the Elastic Stack that allows you to index and search unstructured content with petabytes of data. With this updated fifth edition, you'll cover comprehensive recipes relating to what's new in Elasticsearch 8.x and see how to create and run complex queries and analytics. The recipes will guide you through performing index mapping, aggregation, working with queries, and scripting using Elasticsearch. You'll focus on numerous solutions and quick techniques for performing both common and uncommon tasks such as deploying Elasticsearch nodes, using the ingest module, working with X-Pack, and creating different visualizations. As you advance, you'll learn how to manage various clusters, restore data, and install Kibana to monitor a cluster and extend it using a variety of plugins. Furthermore, you'll understand how to integrate your Java, Scala, Python, and big data applications such as Apache Spark and Pig with Elasticsearch and create efficient data applications powered by enhanced functionalities and custom plugins. By the end of this Elasticsearch cookbook, you'll have gained in-depth knowledge of implementing the Elasticsearch architecture and be able to manage, search, and store data efficiently and effectively using Elasticsearch.

Specifying different analyzers

In the previous recipes, we learned how to map different fields and objects in Elasticsearch, and we described how easy it is to change the standard analyzer with the analyzer and search_analyzer properties.

In this recipe, we will look at several analyzers and learn how to use them to improve indexing and searching quality.

Getting ready

You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.

How to do it…

Text fields allow you to specify, as field parameters, one analyzer for indexing and another for searching.

For example, if we want the name field to use the standard analyzer for indexing and the simple analyzer for searching, the mapping will be as follows:

{ "name": {
    "type": "text",
    "analyzer": "standard",
    "search_analyzer": "simple"
  } }
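
To see where such a field definition sits in a complete request, the following is a minimal sketch in Python that builds the body of a create-index call (PUT on the index name). It assumes Elasticsearch 8.x, where the field is of type text and the parameters are analyzer and search_analyzer. The helper name analyzed_text_field is our own, and the script only prints JSON without contacting a cluster:

```python
import json


def analyzed_text_field(index_analyzer: str, search_analyzer: str) -> dict:
    """Build a text field mapping with separate index-time and search-time analyzers."""
    return {
        "type": "text",
        "analyzer": index_analyzer,          # used when documents are indexed
        "search_analyzer": search_analyzer,  # used when query text is analyzed
    }


# Body of a create-index request; field definitions live under mappings.properties.
create_index_body = {
    "mappings": {
        "properties": {
            "name": analyzed_text_field("standard", "simple"),
        }
    }
}

print(json.dumps(create_index_body, indent=2))
```

The printed JSON can be pasted into Kibana Dev Tools or sent with any HTTP client.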

How it works…

The concept of the analyzer comes from Lucene (the core of Elasticsearch). An analyzer is a Lucene element that is composed of a tokenizer, which splits text into tokens, and zero or more token filters. These filters carry out token manipulation such as lowercasing, normalization, removing stop words, stemming, and so on.
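
The tokenizer-plus-filters structure can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not Lucene's implementation: all of the function names are our own, and the simple-like and whitespace-like analyzers only approximate the built-in analyzers of the same names:

```python
import re
from typing import Callable, Iterable, List


def whitespace_tokenizer(text: str) -> List[str]:
    """Split text on runs of whitespace."""
    return text.split()


def letter_tokenizer(text: str) -> List[str]:
    """Split text wherever a non-letter character occurs."""
    return re.findall(r"[^\W\d_]+", text)


def lowercase_filter(tokens: Iterable[str]) -> Iterable[str]:
    """Token filter: lowercase every token."""
    return (token.lower() for token in tokens)


def stop_filter(stop_words: Iterable[str]) -> Callable[[Iterable[str]], Iterable[str]]:
    """Build a token filter that drops the given stop words."""
    stop = frozenset(stop_words)
    return lambda tokens: (token for token in tokens if token not in stop)


def make_analyzer(tokenizer: Callable[[str], List[str]], *filters) -> Callable[[str], List[str]]:
    """Compose one tokenizer with zero or more token filters, applied in order."""
    def analyze(text: str) -> List[str]:
        tokens: Iterable[str] = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return list(tokens)
    return analyze


simple_like = make_analyzer(letter_tokenizer, lowercase_filter)
whitespace_like = make_analyzer(whitespace_tokenizer)
stop_like = make_analyzer(letter_tokenizer, lowercase_filter, stop_filter({"the", "a", "of"}))

text = "The Quick-Brown fox's 2 tails"
print(simple_like(text))      # ['the', 'quick', 'brown', 'fox', 's', 'tails']
print(whitespace_like(text))  # ['The', 'Quick-Brown', "fox's", '2', 'tails']
print(stop_like(text))        # ['quick', 'brown', 'fox', 's', 'tails']
```

The same input yields different tokens depending on the tokenizer and filters chosen, which is why the analyzer used at index time and at search time must produce compatible tokens.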

During the indexing phase, when Elasticsearch processes a field that must be indexed, it chooses an analyzer. It first checks the analyzer parameter in the field mapping, then the default analyzer defined in the index settings (analysis.analyzer.default), and finally falls back to the standard analyzer.
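
As a rough model of this lookup in Elasticsearch 8.x, the index-time analyzer is taken from the field's analyzer parameter, then from the index's analysis.analyzer.default setting, and otherwise it is the standard analyzer. The following sketch mimics that order over plain dictionaries. The function resolve_index_analyzer is hypothetical and is not part of any Elasticsearch client:

```python
def resolve_index_analyzer(field_mapping: dict, index_settings: dict) -> str:
    """Return the name or type of the analyzer that would be used at index time."""
    # 1. An analyzer set directly on the field wins.
    field_level = field_mapping.get("analyzer")
    if field_level:
        return field_level

    # 2. Otherwise use the index-wide default analyzer, if one is defined.
    default_definition = (
        index_settings.get("analysis", {}).get("analyzer", {}).get("default")
    )
    if default_definition:
        # A definition without an explicit type is treated here as a custom analyzer.
        return default_definition.get("type", "custom")

    # 3. Fall back to the built-in standard analyzer.
    return "standard"


settings_with_default = {"analysis": {"analyzer": {"default": {"type": "simple"}}}}

print(resolve_index_analyzer({"type": "text", "analyzer": "whitespace"}, settings_with_default))
print(resolve_index_analyzer({"type": "text"}, settings_with_default))
print(resolve_index_analyzer({"type": "text"}, {}))
```

The three calls print whitespace, simple, and standard, one for each step of the lookup.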

Choosing the correct analyzer is essential to getting good results during the query phase.
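
One practical way to compare candidates is the _analyze API, which returns the tokens that an analyzer produces for a given text. The following sketch only builds and prints the request bodies for POST /_analyze. The helper analyze_request and the sample text are our own, and sending the requests to a running node is left to the reader:

```python
import json


def analyze_request(analyzer: str, text: str) -> dict:
    """Build the body of a POST /_analyze request for one analyzer."""
    return {"analyzer": analyzer, "text": text}


sample_text = "The Quick-Brown fox"

# Print one request per analyzer so that the outputs can be compared side by side.
for analyzer_name in ("standard", "simple", "whitespace"):
    print("POST /_analyze")
    print(json.dumps(analyze_request(analyzer_name, sample_text)))
```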

Elasticsearch provides several analyzers in its standard installation. The following table shows the most common ones:

Figure 2.4 – List of the most common general-purpose analyzers

For special language purposes, Elasticsearch supports a set of analyzers aimed at analyzing text in a specific language, such as Arabic, Armenian, Basque, Brazilian, Bulgarian, Catalan, Chinese, CJK, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Hindi, Hungarian, Indonesian, Italian, Norwegian, Persian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, and Thai.
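
Language analyzers are referenced by their lowercase name, exactly like the general-purpose ones. As a small sketch, the following Python snippet builds a mapping in which two hypothetical fields, title_en and title_it, use the english and italian analyzers. The field names and the helper function are assumptions made for illustration:

```python
import json


def language_text_field(language_analyzer: str) -> dict:
    """Build a text field mapping that is analyzed with a language analyzer."""
    return {"type": "text", "analyzer": language_analyzer}


# Hypothetical fields holding the same title in two languages.
mappings = {
    "properties": {
        "title_en": language_text_field("english"),
        "title_it": language_text_field("italian"),
    }
}

print(json.dumps({"mappings": mappings}, indent=2))
```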

See also

Several Elasticsearch plugins extend the list of available analyzers. The most famous ones are as follows: