Learning Elasticsearch

By: Abhishek Andhavarapu

Overview of this book

Elasticsearch is a modern, fast, distributed, scalable, fault-tolerant, and open source search and analytics engine. You can use Elasticsearch for small or large applications with billions of documents. It is built to scale horizontally and can handle both structured and unstructured data. Packed with easy-to-follow examples, this book will ensure you have a firm understanding of the basics of Elasticsearch and know how to use its capabilities efficiently. You will install and set up Elasticsearch and Kibana, and handle documents using the Distributed Document Store. You will see how to query, search, and index your data, and perform aggregation-based analytics with ease. You will see how to use Kibana to explore and visualize your data. Further on, you will learn to handle document relationships, work with geospatial data, and much more. Finally, you will see how you can set up and scale your Elasticsearch clusters in production environments.
Table of Contents (11 chapters)
10. Exploring Elastic Stack (Elastic Cloud, Security, Graph, and Alerting)

Interacting with Elasticsearch

The primary way of interacting with Elasticsearch is via its REST API. Elasticsearch provides a JSON-based REST API over HTTP, which by default listens on port 9200. Anything from creating an index to shutting down a node is a simple REST call. The APIs are broadly classified as follows:

  • Document APIs: CRUD (Create, Retrieve, Update, Delete) operations on documents
  • Search APIs: For all the search operations
  • Indices APIs: For managing indices (creating an index, deleting an index, and so on)
  • Cat APIs: Return data in a human-readable tabular form instead of JSON
  • Cluster APIs: For managing the cluster

Each of these API groups is discussed in detail in its own chapter: for example, indexing documents in Chapter 4, Indexing and Updating Your Data, and search in Chapter 6, All About Search. This section is only a brief introduction to manipulating data with the Document APIs, walking through the basic CRUD operations. To use Elasticsearch from your application, clients are also provided for all major languages, such as Java and Python. Most of these clients act as wrappers around the REST API.
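Since every operation is an HTTP request with an optional JSON body, a client can be a thin wrapper over the REST API. The following minimal Python sketch, using only the standard library, illustrates that idea. It assumes a node on localhost at the default port 9200, and build_request is a hypothetical helper, not part of any official client. It only constructs the request; actually sending it with urlopen requires a running node.

```python
import json
import urllib.request

# Assumption: a local node listening on the default REST port 9200.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Hypothetical helper: wrap one REST call in a urllib Request object."""
    data = None
    headers = {}
    if body is not None:
        # Request bodies are sent as JSON.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


req = build_request("GET", "/chapter1/product/1")
print(req.get_method(), req.full_url)
# prints: GET http://localhost:9200/chapter1/product/1

# Sending it requires a running node, for example:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

The official clients add connection pooling, retries, and typed helpers on top of this, but the underlying calls are the same HTTP requests.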

To better explain the CRUD operations, imagine we are building an e-commerce site and want to use Elasticsearch to power its search functionality. We will use an index named chapter1 and store all the products in the type called product. Each product we want to index is represented by a JSON document. We will start by creating a new product document, and then we will retrieve a product by its identifier, followed by updating a product's category and deleting a product using its identifier.

Creating a document

A new document can be added using the Document APIs. For the e-commerce example, to add a new product, we execute the following command. The body of the request is the product document we want to index.

PUT http://localhost:9200/chapter1/product/1
{
  "title": "Learning Elasticsearch",
  "author": "Abhishek Andhavarapu",
  "category": "books"
}

Let's inspect the request:

INDEX chapter1
TYPE product
IDENTIFIER 1
DOCUMENT JSON
HTTP METHOD PUT

The document's properties, such as title, author, and category, are known as fields, which are similar to columns in SQL.

If the index chapter1 and the type product don't already exist, Elasticsearch creates them automatically, using the default index settings.

When we execute the preceding request, Elasticsearch returns the following JSON response:

{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 1,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

In the response, you can see that Elasticsearch created the document and the version of the document is 1. Since you are creating the document using the HTTP PUT method, you are required to specify the document identifier. If you don’t specify the identifier, Elasticsearch will respond with the following error message:

No handler found for uri [/chapter1/product/] and method [PUT]

If you don’t have a unique identifier, you can let Elasticsearch assign one for you, but you must then use the HTTP POST method. For example, log messages usually have no natural unique identifier, so when indexing them you can let Elasticsearch assign the identifier.

In general, the HTTP POST method is used to create an object, with the server assigning its identifier. The HTTP PUT method can also be used for creation when the client provides the unique identifier instead.
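The PUT-versus-POST rule above can be sketched as a small Python function. index_call is a hypothetical helper that only chooses the HTTP method and path (using the chapter1 index and product type from this section) and serializes the body; it does not contact Elasticsearch.

```python
import json


def index_call(index, doc_type, document, doc_id=None):
    """Hypothetical helper: choose the HTTP method and path for indexing.

    With a client-supplied identifier, PUT /index/type/id is used;
    without one, POST /index/type/ lets Elasticsearch assign the id.
    """
    if doc_id is None:
        method, path = "POST", f"/{index}/{doc_type}/"
    else:
        method, path = "PUT", f"/{index}/{doc_type}/{doc_id}"
    return method, path, json.dumps(document)


product = {"title": "Learning Elasticsearch",
           "author": "Abhishek Andhavarapu",
           "category": "books"}
print(index_call("chapter1", "product", product, doc_id=1)[:2])
# prints: ('PUT', '/chapter1/product/1')
print(index_call("chapter1", "product", product)[:2])
# prints: ('POST', '/chapter1/product/')
```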

We can index a document without specifying a unique identifier as shown here:

POST http://localhost:9200/chapter1/product/
{
  "title": "Learning Elasticsearch",
  "author": "Abhishek Andhavarapu",
  "category": "books"
}

In the preceding request, the URL doesn't contain a unique identifier, and we use the HTTP POST method. Let's inspect the request:

INDEX chapter1
TYPE product
DOCUMENT JSON
HTTP METHOD POST

The response from Elasticsearch is shown as follows:

{
  "_index": "chapter1",
  "_type": "product",
  "_id": "AVmKvtPwWuEuqke_aRsm",
  "_version": 1,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

You can see from the response that Elasticsearch assigned the unique identifier AVmKvtPwWuEuqke_aRsm to the document and that the created flag is set to true. If a document with the same unique identifier already exists, Elasticsearch replaces the existing document and increments its version. If you run the PUT request from the beginning of this section again, Elasticsearch responds with the following:

{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 2,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "created": false
}

In the response, you can see that the created flag is false, since a document with the identifier 1 already exists. Also, observe that the version is now 2.
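To make the create-versus-replace behavior concrete, here is a toy in-memory model in Python. ToyDocumentStore is hypothetical and only mimics the created and _version fields of the responses above; it is not how Elasticsearch stores documents, and its generated identifiers (UUID hex strings) do not match the format Elasticsearch uses.

```python
import uuid


class ToyDocumentStore:
    """Hypothetical in-memory model of the versioning behavior above.

    Not how Elasticsearch stores data; it only mimics the created and
    _version fields seen in the index responses.
    """

    def __init__(self):
        self.docs = {}  # doc_id -> {"_version": int, "_source": dict}

    def index(self, source, doc_id=None):
        if doc_id is None:
            # Like POST without an id: the store generates one.
            doc_id = uuid.uuid4().hex
        existing = self.docs.get(doc_id)
        version = 1 if existing is None else existing["_version"] + 1
        # Indexing with an existing id replaces the whole document.
        self.docs[doc_id] = {"_version": version, "_source": dict(source)}
        return {"_id": doc_id, "_version": version, "created": existing is None}


store = ToyDocumentStore()
book = {"title": "Learning Elasticsearch", "category": "books"}
print(store.index(book, doc_id="1"))  # {'_id': '1', '_version': 1, 'created': True}
print(store.index(book, doc_id="1"))  # {'_id': '1', '_version': 2, 'created': False}
```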

Retrieving an existing document

To retrieve an existing document, we need the index, the type, and the unique identifier of the document. Let’s retrieve the document we just indexed, using the HTTP GET method as shown here:

GET http://localhost:9200/chapter1/product/1

Let’s inspect the request:

INDEX chapter1
TYPE product
IDENTIFIER 1
HTTP METHOD GET

The response from Elasticsearch, shown here, contains the product document we indexed in the previous section:

{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "title": "Learning Elasticsearch",
    "author": "Abhishek Andhavarapu",
    "category": "books"
  }
}

The original JSON document is returned in the _source field. Also note the version in the response; every time the document is updated, its version is incremented.
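In application code, you typically parse the response and read the document from _source. A minimal Python sketch, assuming the response body is already available as a string (here, the response shown above); source_of is a hypothetical helper that returns None when found is false, as it is when no document matches the identifier.

```python
import json

# The GET response shown above, as a raw string.
raw = """{
  "_index": "chapter1", "_type": "product", "_id": "1",
  "_version": 2, "found": true,
  "_source": {"title": "Learning Elasticsearch",
              "author": "Abhishek Andhavarapu",
              "category": "books"}
}"""


def source_of(response_text):
    """Hypothetical helper: return the stored document, or None if not found."""
    response = json.loads(response_text)
    if not response.get("found", False):
        return None
    return response["_source"]


doc = source_of(raw)
print(doc["title"], json.loads(raw)["_version"])  # Learning Elasticsearch 2
```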

Updating an existing document

Updating a document in Elasticsearch is more complicated than in a traditional SQL database. Internally, Elasticsearch retrieves the old document, applies the changes, and reindexes it as a new document, which makes an update an expensive operation. There are several ways of updating a document. Here we cover updating a partial document; updates are discussed in more detail in the Updating your data section of Chapter 4, Indexing and Updating Your Data.
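The retrieve, apply, and re-insert cycle can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Elasticsearch's implementation: a plain dict stands in for a stored document and its version, and partial_update (a hypothetical name) performs only a shallow, top-level merge of the changed fields.

```python
def partial_update(stored, changes):
    """Hypothetical sketch of the update cycle described above.

    `stored` is a dict with "_version" and "_source" keys. Only a
    shallow, top-level merge of `changes` is performed.
    """
    # 1. Retrieve the old document (copy it so the original is untouched).
    new_source = dict(stored["_source"])
    # 2. Apply the changes: listed fields are overwritten or added.
    new_source.update(changes)
    # 3. Re-insert it as a new document with an incremented version.
    return {"_version": stored["_version"] + 1, "_source": new_source}


stored = {"_version": 2,
          "_source": {"title": "Learning Elasticsearch",
                      "author": "Abhishek Andhavarapu",
                      "category": "books"}}
updated = partial_update(stored, {"category": "technical books"})
print(updated["_version"], updated["_source"]["category"])  # 3 technical books
```

Because the whole document is rewritten even when one field changes, frequent small updates cost roughly as much as re-indexing.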

Updating a partial document

We already indexed the document with the unique identifier 1, and now we want to change the product's category from books to technical books. We can update the document as shown here:

POST http://localhost:9200/chapter1/product/1/_update
{
  "doc": {
    "category": "technical books"
  }
}

The body of the request contains the fields we want to update, wrapped in a doc object, and the unique identifier is passed in the URL.

Note the _update endpoint at the end of the URL.
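The partial-update call can also be assembled programmatically. A sketch in Python, where update_call is a hypothetical helper that wraps the changed fields under the doc key and appends _update to the document path; it builds the method, path, and body only and sends nothing.

```python
import json


def update_call(index, doc_type, doc_id, fields):
    """Hypothetical helper: build the partial-update call shown above.

    The changed fields are wrapped under the "doc" key, and the
    _update endpoint is appended to the document's path.
    """
    path = f"/{index}/{doc_type}/{doc_id}/_update"
    return "POST", path, json.dumps({"doc": fields})


method, path, body = update_call("chapter1", "product", 1,
                                 {"category": "technical books"})
print(method, path)  # POST /chapter1/product/1/_update
print(body)          # {"doc": {"category": "technical books"}}
```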

The response from Elasticsearch is shown here:

{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 3,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  }
}

As you can see in the response, the operation is successful, and the version of the document is now 3. More complicated update operations are possible using scripts and upserts.

Deleting an existing document

To create and retrieve documents, we used the PUT, POST, and GET methods. To delete an existing document, we use the HTTP DELETE method and pass the unique identifier of the document in the URL, as shown here:

DELETE http://localhost:9200/chapter1/product/1

Let's inspect the request:

INDEX chapter1
TYPE product
IDENTIFIER 1
HTTP METHOD DELETE

The response from Elasticsearch is shown here:

{
  "found": true,
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 4,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  }
}

The response shows that Elasticsearch found the document with the unique identifier 1 and deleted it successfully. Note that the delete also incremented the version, to 4.
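The delete behavior can be modeled with a toy Python function as well. delete_doc is hypothetical and only mimics the found and _version fields: an existing document is removed and its version bumped, while a missing identifier yields found set to false, which is also what Elasticsearch reports when you delete a document that does not exist. It is not how Elasticsearch deletes documents internally.

```python
def delete_doc(docs, doc_id):
    """Hypothetical toy model of DELETE /index/type/id.

    `docs` maps ids to {"_version": int, "_source": dict}.
    """
    existing = docs.pop(doc_id, None)
    if existing is None:
        return {"_id": doc_id, "found": False}
    # As in the response above, the delete itself bumps the version.
    return {"_id": doc_id, "found": True, "_version": existing["_version"] + 1}


docs = {"1": {"_version": 3, "_source": {"category": "technical books"}}}
print(delete_doc(docs, "1"))  # {'_id': '1', 'found': True, '_version': 4}
print(delete_doc(docs, "1"))  # {'_id': '1', 'found': False}
```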