Book Image

ElasticSearch Server

Book Image

ElasticSearch Server

Overview of this book

ElasticSearch is an open source search server built on Apache Lucene. It was built to provide a scalable search solution with built-in support for near real-time search and multi-tenancy.Jumping into the world of ElasticSearch by setting up your own custom cluster, this book will show you how to create a fast, scalable, and flexible search solution. By learning the ins-and-outs of data indexing and analysis, "ElasticSearch Server" will start you on your journey to mastering the powerful capabilities of ElasticSearch. With practical chapters covering how to search data, extend your search, and go deep into cluster administration and search analysis, this book is perfect for those new and experienced with search servers.In "ElasticSearch Server" you will learn how to revolutionize your website or application with faster, more accurate, and flexible search functionality. Starting with chapters on setting up your own ElasticSearch cluster and searching and extending your search parameters you will quickly be able to create a fast, scalable, and completely custom search solution.Building on your knowledge further you will learn about ElasticSearch's query API and become confident using powerful filtering and faceting capabilities. You will develop practical knowledge on how to make use of ElasticSearch's near real-time capabilities and support for multi-tenancy.Your journey then concludes with chapters that help you monitor and tune your ElasticSearch cluster as well as advanced topics such as shard allocation, gateway configuration, and the discovery module.
Table of Contents (17 chapters)
ElasticSearch Server
Credits
About the Authors
Acknowledgement
Acknowledgement
About the Reviewers
www.PacktPub.com
Preface
Index

Data manipulation with REST API


ElasticSearch REST API can be used for various tasks. Thanks to it, we can manage indexes, change instance parameters, check nodes and cluster status, index data, and search it. But for now, we will concentrate on using the CRUD (create-retrieve-update-delete ) part of the API, which allows us to use ElasticSearch in a similar way to how you would use a NoSQL database.

What is REST?

Before moving on to a description of various operations, a few words about REST itself. In a REST-like architecture, every request is directed to a concrete object indicated by the path part of the address. For example, if /books/ is a reference to a list of books in our library, /books/1 is a reference to the book with the identifier 1. Note that these objects can be nested. /books/1/chapter/6 is the sixth chapter in the first book in the library, and so on. We have the subject of our API call. What about an operation that we would like to execute, such as GET or POST? To indicate that, request types are used. An HTTP protocol gives us quite a long list of request types to use as verbs in the API calls. Logical choices are GET in order to obtain the current state of the requested object, POST for changing the object state, PUT for object creation, and DELETE for destroying an object. There is also a HEAD request that is only used for fetching the base information about an object.

If we look at the examples of the operations discussed in the Shutting down ElasticSearch section, everything should make more sense:

  • GET http://localhost:9000/: Retrieves information about an instance as a whole

  • GET http://localhost:9200/_cluster/nodes/: Retrieves information about the nodes in an ElasticSearch cluster

  • POST http://localhost:9200/_cluster/nodes/_shutdown: Sends information to shut down an object in the nodes in a cluster of ElasticSearch

Now we will check how these operations can be used to store, fetch, alter, and delete data from ElasticSearch.

Storing data in ElasticSearch

In ElasticSearch, every piece of data has a defined index and type. You can think about an index as a collection of documents or a table in a database. In contrast to database records, documents added to an index have no defined structure and field types. More precisely, a single field has its type defined, but ElasticSearch can do some magic and guess the corresponding type.

Creating a new document

Now we will try to index some documents. For our example, let's imagine that we are building some kind of CMS for our blog. One of the entities in this blog is (surprise!) articles. Using the JSON notation, a document can be presented as shown in the following example:

{
  "id": "1",
  "title": "New version of Elastic Search released!",
  "content": "…",
  "priority": 10,
  "tags": ["announce", "elasticsearch", "release"]
}

As we can see, the JSON document contains a set of fields, where each field can have a different form. In our example, we have a number (priority), text (title), and an array of strings (tags). In the next examples, we will show you the other types. As mentioned earlier in this chapter, ElasticSearch can guess these type (because JSON is semi-typed; that is, the numbers are not in quotation marks) and automatically customize the way of storing this data in its internal structures.

Now we want to store this record in the index and make it available for searching. Choosing the index name as blog and type as article, we can do this by executing the following command:

curl -XPUT http://localhost:9200/blog/article/1 -d '{"title": "New version of Elastic Search released!", "content": "...", "tags": ["announce", "elasticsearch", "release"] }'

You can notice a new option to cURL, -d. The parameter value of this option is the text that should be used as a request payload—a request body. This way we can send additional information such as a document definition.

Note that the unique identifier is placed in the URL, not in the body. If you omit this identifier, the search returns an error, similar to the following:

No handler found for uri [/blog/article/] and method [PUT] 

If everything is correct, the server will answer with a JSON response similar to this:

{
  "ok":true,
  "_index":"blog",
  "_type":"article",
  "_id":"1",
  "_version":1
}

In the preceding reply, ElasticSearch includes information about the status of the operation and shows where the new document was placed. There is information about the document's unique identifier and current version, which will be incremented automatically by ElasticSearch every time the document changes.

In the above example, we've specified the document identifier ourselves. But ElasticSearch can generate this automatically. This seems very handy, but only when an index is the only source of data. If we use a database for storing data and ElasticSearch for full text searching, synchronization of this data will be hindered unless the generated identifier is stored in the database as well. Generation of a unique key can be achieved by using the following command:

curl -XPOST http://localhost:9200/blog/article/ -d '{"title": "New version of Elastic Search released!", "content": "...", "tags": ["announce", "elasticsearch", "release"] }'

Notice POST instead of PUT. Referring to the previous description of the REST verbs, we wanted to change the list of documents in an index rather than create a new entity, and that's why we used POST instead of PUT. The server should respond with a response similar to the following:

{
  "ok" : true,
  "_index" : "blog",
  "_type" : "article",
  "_id" : "XQmdeSe_RVamFgRHMqcZQg",
  "_version" : 1
}

Note the highlighted line, which has an automatically generated unique identifier.

Retrieving documents

We already have documents stored in our instance. Now let's try to retrieve them:

curl -XGET http://localhost:9200/blog/article/1

Then the server returns the following response:

{
  "_index" : "blog",
  "_type" : "article",
  "_id" : "1",
  "_version" : 1,
  "exists" : true, 
  "_source" : {
  "title": "New version of Elastic Search released!", 
  "content": "...", 
  "tags": ["announce", "elasticsearch", "release"] 
}

In the response, besides index, type, identifier, and version, we also see the information saying that the document was found and the source of this document. If the document is not found, we get a reply as follows:

{
  "_index" : "blog",
  "_type" : "article",
  "_id" : "9999",
  "exists" : false
}

Of course, there is no information about the version and source.

Updating documents

Updating documents in an index is a more complicated task. Internally, ElasticSearch must fetch the document, take its data from the _source field, remove the old document, apply changes, and index it as a new document. ElasticSearch implements this through a script given as a parameter. This allows us to do more sophisticated document transformation than simple field changes. Let's see how it works in a simple case.

After executing the following command:

curl -XPOST http://localhost:9200/blog/article/1/_update -d '{
  "script": "ctx._source.content = \"new content\""
}'

The server replies with the following:

{"ok":true,"_index":"blog","_type":"article","_id":"1","_version":2}

It works! To be sure, let's retrieve the current document:

curl -XGET http://localhost:9200/blog/article/1

{
  "_index" : "blog",
  "_type" : "article",
  "_id" : "1",
  "_version" : 2,
  "exists" : true, 
  "_source" : {
  "title":"New version of Elastic Search released!",
  "content":"new content",
  "tags":["announce","elasticsearch","release"]}
}

The server changed the contents of our article and the version number for this document. Notice that we didn't have to send the whole document, only the changed parts. But remember that to use the update functionality, we need to use the _source field—we will describe how to use the _source field in the Extending your index structure with additional internal information section in Chapter 3, Extending Your Structure and Search.

There is one more thing about document updates—if your script uses a field value from a document that is to be updated, you can set a value that will be used if the document doesn't have that value present. For example, if you would like to increment the counter field of the document and it is not present, you can use the upsert section in your request to provide the default value that is going to be used. For example:

curl -XPOST http://localhost:9200/blog/article/1/_update -d '{
  "script": "ctx._source.counter += 1",
  "upsert": {
    "counter" : 0
  }
}'

In the preceding example, if the document we are updating doesn't have a value in the counter field, the value of 0 will be used.

Deleting documents

We have already seen how to create (PUT) and retrieve (GET) documents. A document can be removed in the similar way but the only difference is in the verb used. Let's execute the following delete command:

curl -XDELETE http://localhost:9200/blog/article/1
{"ok":true,"found":true,"_index":"blog","_type":"article","_id":"1","_version":3}

Now we are able to use the CRUD operations. This lets us create applications using ElasticSearch as a simple key-value store. But this is only the beginning!