Learning Elasticsearch

By : Abhishek Andhavarapu

Overview of this book

Elasticsearch is a modern, fast, distributed, scalable, fault-tolerant, and open source search and analytics engine. You can use Elasticsearch for small or large applications with billions of documents. It is built to scale horizontally and can handle both structured and unstructured data. Packed with easy-to-follow examples, this book gives you a firm understanding of the basics of Elasticsearch and shows you how to use its capabilities efficiently. You will install and set up Elasticsearch and Kibana, and handle documents using the Distributed Document Store. You will see how to query, search, and index your data, and perform aggregation-based analytics with ease. You will see how to use Kibana to explore and visualize your data. Further on, you will learn to handle document relationships, work with geospatial data, and much more. Finally, you will see how you can set up and scale your Elasticsearch clusters in production environments.

Interacting with Elasticsearch

The primary way of interacting with Elasticsearch is through its REST API. Elasticsearch provides a JSON-based REST API over HTTP, which listens on port 9200 by default. Anything from creating an index to checking the health of the cluster is a simple REST call. The APIs are broadly classified into the following:

Document APIs: CRUD (create, retrieve, update, delete) operations on documents
Search APIs: For all the search operations
Indices APIs: For managing indices (creating an index, deleting an index, and so on)
Cat APIs: Instead of JSON, the data is returned in tabular form
Cluster APIs: For managing the cluster
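As a rough orientation, each category corresponds to a recognizable URL pattern. The following sketch pairs every category with one example endpoint. The chapter1 index is the one used later in this section; the endpoints are illustrative picks rather than a complete list, and the example_url helper is a hypothetical name introduced only for this sketch.

```python
# One example endpoint per API category (illustrative, not exhaustive).
# Each entry is (HTTP method, path relative to the base URL).
API_EXAMPLES = {
    "document": ("GET", "/chapter1/product/1"),  # fetch one document by identifier
    "search": ("GET", "/chapter1/_search"),      # search within an index
    "indices": ("PUT", "/chapter1"),             # create an index
    "cat": ("GET", "/_cat/indices?v"),           # tabular list of indices
    "cluster": ("GET", "/_cluster/health"),      # cluster health summary
}


def example_url(category, base_url="http://localhost:9200"):
    """Return the HTTP method and full URL of the example for a category."""
    method, path = API_EXAMPLES[category]
    return method, base_url + path


for name in API_EXAMPLES:
    print(name, *example_url(name))
```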

Later chapters discuss these APIs in more detail: for example, indexing documents is covered in Chapter 4, Indexing and Updating Your Data, and search in Chapter 6, All About Search. This section is only a brief introduction to basic CRUD operations using the Document APIs. To use Elasticsearch in your application, clients are also provided for all major languages, such as Java and Python. Most of these clients act as wrappers around the REST API.
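To make the idea of a client as a thin wrapper concrete, here is a minimal sketch that uses only the Python standard library. It is not the official Elasticsearch client; the helper names build_request and send are hypothetical, and the base URL assumes a node running locally on the default port.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumes a local node on the default port


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the REST API."""
    data = None
    headers = {}
    if body is not None:
        # The request body is the JSON-encoded document or command.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send a prepared request and decode the JSON response.

    Requires a running Elasticsearch node, so it is not called here.
    """
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request("GET", "/chapter1/product/1")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it possible to inspect the method, URL, and body without a running cluster.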

To better explain the CRUD operations, imagine we are building an e-commerce site and want to use Elasticsearch to power its search functionality. We will use an index named chapter1 and store all the products in a type called product. Each product we want to index is represented by a JSON document. We will start by creating a new product document, then retrieve a product by its identifier, update a product's category, and finally delete a product using its identifier.

Creating a document

A new document can be added using the Document APIs. For the e-commerce example, we add a new product by executing the following command. The body of the request is the product document we want to index.

PUT http://localhost:9200/chapter1/product/1
{
  "title": "Learning Elasticsearch",
  "author": "Abhishek Andhavarapu",
  "category": "books"
}

Let's inspect the request:

INDEX	chapter1
TYPE	product
IDENTIFIER	1
DOCUMENT	JSON
HTTP METHOD	PUT

The document's properties, such as title, author, and category, are also known as fields, which are similar to SQL columns.

Elasticsearch will automatically create the index chapter1 and type product if they don't exist already. It will create the index with the default settings.

When we execute the preceding request, Elasticsearch responds with a JSON response, shown as follows:

{
   "_index": "chapter1",
   "_type": "product",
   "_id": "1",
   "_version": 1,
   "_shards": {
     "total": 1,
     "successful": 1,
     "failed": 0
   },
   "created": true
}

In the response, you can see that Elasticsearch created the document and the version of the document is 1. Since you are creating the document using the HTTP PUT method, you are required to specify the document identifier. If you don’t specify the identifier, Elasticsearch will respond with the following error message:

No handler found for uri [/chapter1/product/] and method [PUT]

If you don’t have a unique identifier, you can let Elasticsearch assign an identifier for you, but you should use the POST HTTP method. For example, if you are indexing log messages, you will not have a unique identifier for each log message, and you can let Elasticsearch assign the identifier for you.

In general, we use the HTTP POST method for creating an object. The HTTP PUT method can also be used for object creation, where the client provides the unique identifier instead of the server assigning the identifier.
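The choice between the two methods can be sketched as a small helper: with an identifier the request is a PUT to the document's URL, and without one it is a POST to the type's URL. The function name index_request_line and its parameters are hypothetical and only illustrate the rule described above.

```python
def index_request_line(index, doc_type, doc_id=None,
                       base_url="http://localhost:9200"):
    """Return (method, url) for indexing a document.

    With an identifier, the client names the document: PUT .../<id>.
    Without one, Elasticsearch assigns the identifier: POST .../
    """
    if doc_id is None:
        return "POST", f"{base_url}/{index}/{doc_type}/"
    return "PUT", f"{base_url}/{index}/{doc_type}/{doc_id}"


# A product with a known identifier, and a document without one.
print(index_request_line("chapter1", "product", 1))
print(index_request_line("chapter1", "product"))
```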

We can index a document without specifying a unique identifier as shown here:

POST http://localhost:9200/chapter1/product/
{
  "title": "Learning Elasticsearch",
  "author": "Abhishek Andhavarapu",
  "category": "books"
}

In the preceding request, the URL doesn't contain the unique identifier, and we are using the HTTP POST method. Let's inspect the request:

INDEX	chapter1
TYPE	product
DOCUMENT	JSON
HTTP METHOD	POST

The response from Elasticsearch is shown as follows:

{
   "_index": "chapter1",
   "_type": "product",
   "_id": "AVmKvtPwWuEuqke_aRsm",
   "_version": 1,
   "_shards": {
     "total": 1,
     "successful": 1,
     "failed": 0
   },
   "created": true
}

You can see from the response that Elasticsearch assigned the unique identifier AVmKvtPwWuEuqke_aRsm to the document and that the created flag is set to true. If a document with the same unique identifier already exists, Elasticsearch replaces the existing document and increments the document version. If you run the same PUT request from the beginning of the section again, the response from Elasticsearch would be this:

{
   "_index": "chapter1",
   "_type": "product",
   "_id": "1",
   "_version": 2,
   "_shards": {
     "total": 1,
     "successful": 1,
     "failed": 0
   },
   "created": false
}

In the response, you can see that the created flag is false since the document with id: 1 already exists. Also, observe that the version is now 2.
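An application can tell the two cases apart by reading the created flag and the version from the response. The following sketch parses the two PUT responses shown in this section; the function name describe_index_result is hypothetical, and the JSON is copied from the responses above rather than fetched from a live cluster.

```python
import json

# Responses to the first and second PUT of /chapter1/product/1, as shown above.
FIRST_PUT_RESPONSE = """
{
  "_index": "chapter1", "_type": "product", "_id": "1", "_version": 1,
  "_shards": {"total": 1, "successful": 1, "failed": 0},
  "created": true
}
"""

SECOND_PUT_RESPONSE = """
{
  "_index": "chapter1", "_type": "product", "_id": "1", "_version": 2,
  "_shards": {"total": 1, "successful": 1, "failed": 0},
  "created": false
}
"""


def describe_index_result(raw_json):
    """Summarize whether an index request created or replaced a document."""
    response = json.loads(raw_json)
    action = "created" if response["created"] else "replaced"
    return f"{action} document {response['_id']} (version {response['_version']})"


print(describe_index_result(FIRST_PUT_RESPONSE))
print(describe_index_result(SECOND_PUT_RESPONSE))
```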

Retrieving an existing document

To retrieve an existing document, we need the index, the type, and the unique identifier of the document. Let's try to retrieve the document we just indexed. To retrieve a document, we use the HTTP GET method, as shown here:

GET http://localhost:9200/chapter1/product/1

Let’s inspect the request:

INDEX	chapter1
TYPE	product
IDENTIFIER	1
HTTP METHOD	GET

The response from Elasticsearch, shown here, contains the product document we indexed in the previous section:

{
   "_index": "chapter1",
   "_type": "product",
   "_id": "1",
   "_version": 2,
   "found": true,
   "_source": {
     "title": "Learning Elasticsearch",
     "author": "Abhishek Andhavarapu",
     "category": "books"
   }
}

The original JSON document is returned in the _source field. Also note the version in the response; every time the document is updated, the version is incremented.
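In application code, the usual step after a GET is to pull the document out of _source. Here is a minimal sketch that works on the response shown above; the function name extract_source is hypothetical, and returning None when found is false is a design choice of this sketch, not something the API prescribes.

```python
import json

# The GET response shown above, copied as a string for illustration.
GET_RESPONSE = """
{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "title": "Learning Elasticsearch",
    "author": "Abhishek Andhavarapu",
    "category": "books"
  }
}
"""


def extract_source(raw_json):
    """Return the stored document from a GET response, or None if not found."""
    response = json.loads(raw_json)
    if not response.get("found", False):
        return None
    return response["_source"]


product = extract_source(GET_RESPONSE)
print(product["title"], "-", product["category"])
```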

Updating an existing document

Updating a document in Elasticsearch is more involved than in a traditional SQL database. Internally, Elasticsearch retrieves the old document, applies the changes, and re-inserts the result as a new document. Because the whole document is indexed again, an update is expensive even when only a single field changes. There are different ways of updating a document. We will talk about updating a partial document here and cover updates in more detail in the Updating your data section in Chapter 4, Indexing and Updating Your Data.
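The retrieve, apply, and re-insert cycle can be pictured with a toy in-memory model. This is only an illustration of the three steps, not how Elasticsearch is implemented: the store dictionary and the partial_update function are hypothetical, and the toy merges top-level fields only.

```python
import copy

# Toy in-memory "index": identifier -> {"_version": int, "_source": dict}.
store = {
    "1": {
        "_version": 2,
        "_source": {
            "title": "Learning Elasticsearch",
            "author": "Abhishek Andhavarapu",
            "category": "books",
        },
    }
}


def partial_update(store, doc_id, changes):
    """Mimic the three steps: retrieve the old document, apply, re-insert."""
    old = store[doc_id]                         # 1. retrieve the old document
    new_source = copy.deepcopy(old["_source"])
    new_source.update(changes)                  # 2. apply the changes
    store[doc_id] = {                           # 3. re-insert as a new document
        "_version": old["_version"] + 1,
        "_source": new_source,
    }
    return store[doc_id]


updated = partial_update(store, "1", {"category": "technical books"})
print(updated["_version"], updated["_source"]["category"])
```

Even though only one field changes, the whole document is rebuilt and stored again under a new version, which is why updates are more costly than they first appear.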

Updating a partial document

We already indexed the document with the unique identifier 1, and now we need to update the category of the product from just books to technical books. We can update the document as shown here:

POST http://localhost:9200/chapter1/product/1/_update
{
  "doc": {
    "category": "technical books"
  }
}

The body of the request contains only the fields we want to change, wrapped in a doc object; the unique identifier is passed in the URL.

Please note the _update endpoint at the end of the URL.

The response from Elasticsearch is shown here:

{
   "_index": "chapter1",
   "_type": "product",
   "_id": "1",
   "_version": 3,
   "_shards": {
     "total": 1,
     "successful": 1,
     "failed": 0
   }
}

As you can see in the response, the operation is successful, and the version of the document is now 3. More complicated update operations are possible using scripts and upserts.
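The partial update request from this section can also be assembled programmatically. The following sketch builds the method, URL, and body without sending anything; the function name partial_update_request is hypothetical, and the URL layout simply mirrors the request shown above.

```python
import json


def partial_update_request(index, doc_type, doc_id, changes,
                           base_url="http://localhost:9200"):
    """Return (method, url, body) for a partial update of one document."""
    # The _update endpoint is appended after the document identifier.
    url = f"{base_url}/{index}/{doc_type}/{doc_id}/_update"
    # Only the changed fields are sent, nested under the "doc" key.
    body = json.dumps({"doc": changes})
    return "POST", url, body


method, url, body = partial_update_request(
    "chapter1", "product", 1, {"category": "technical books"}
)
print(method, url)
print(body)
```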

Deleting an existing document

For creating and retrieving a document, we used the PUT, POST, and GET methods. For deleting an existing document, we need to use the HTTP DELETE method and pass the unique identifier of the document in the URL as shown here:

DELETE http://localhost:9200/chapter1/product/1

Let's inspect the request:

INDEX	chapter1
TYPE	product
IDENTIFIER	1
HTTP METHOD	DELETE

The response from Elasticsearch is shown here:

{
   "found": true,
   "_index": "chapter1",
   "_type": "product",
   "_id": "1",
   "_version": 4,
   "_shards": {
     "total": 1,
     "successful": 1,
     "failed": 0
   }
}

In the response, you can see that Elasticsearch was able to find the document with the unique identifier 1 and was successful in deleting the document.
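As a final sketch, the delete request and the check on its response can be written with the Python standard library. The names delete_request and was_found are hypothetical, and the response string is copied from the one shown above rather than obtained from a live cluster.

```python
import json
import urllib.request


def delete_request(index, doc_type, doc_id, base_url="http://localhost:9200"):
    """Build (but do not send) a DELETE request for one document."""
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(url, method="DELETE")


def was_found(raw_json):
    """Return True if the delete response reports that the document existed."""
    return bool(json.loads(raw_json).get("found", False))


# The delete response shown above, copied as a string for illustration.
DELETE_RESPONSE = """
{
  "found": true,
  "_index": "chapter1", "_type": "product", "_id": "1", "_version": 4,
  "_shards": {"total": 1, "successful": 1, "failed": 0}
}
"""

request = delete_request("chapter1", "product", 1)
print(request.get_method(), request.full_url)
print("document found and deleted:", was_found(DELETE_RESPONSE))
```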
