Elasticsearch 5.x Cookbook - Third Edition

By: Alberto Paro
Overview of this book

Elasticsearch is a Lucene-based distributed search server that allows users to index and search petabytes of unstructured content. This book is your one-stop guide to mastering the complete Elasticsearch ecosystem. We’ll guide you through comprehensive recipes on what’s new in Elasticsearch 5.x, showing you how to create complex queries and analytics, and perform index mapping, aggregation, and scripting. Further on, you will explore the modules for cluster and node monitoring and see ways to back up and restore a snapshot of an index. You will understand how to install Kibana to monitor a cluster and how to extend Kibana with plugins. Finally, you will see how you can integrate your Java, Scala, Python, and Big Data applications such as Apache Spark and Pig with Elasticsearch, and add enhanced functionality with custom plugins. By the end of this book, you will have in-depth knowledge of the implementation of the Elasticsearch architecture and will be able to manage data efficiently and effectively with Elasticsearch.

Using the native protocol


Elasticsearch provides a native protocol, used mainly for low-level communication between nodes, that is also very useful for fast importing of huge data blocks. This protocol is available only for JVM languages and is commonly used in Java, Groovy, and Scala.

Getting ready

You need a working Elasticsearch cluster; the standard port for the native protocol is 9300.
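Before wiring up a client, it can help to confirm that the transport port is actually reachable from your machine. The following is a minimal sketch using only the JDK; the class and method names (`TransportPortCheck`, `isReachable`) are hypothetical and not part of Elasticsearch, and the host and port are the local defaults mentioned above. It only verifies that something accepts TCP connections on the port, not that it speaks the Elasticsearch native protocol.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper (not part of Elasticsearch): checks that a TCP port accepts connections. */
final class TransportPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // connection refused, timed out, or the host could not be resolved
            return false;
        }
    }

    public static void main(String[] args) {
        // 127.0.0.1:9300 is assumed here; adjust it for your own cluster
        boolean up = isReachable("127.0.0.1", 9300, 1000);
        System.out.println("Transport port 9300 reachable: " + up);
    }
}
```

If the check fails, the usual suspects are a node that is not running, a firewall rule, or a node bound to a different network interface.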

How to do it...

The steps required to use the native protocol in a Java environment are as follows (we'll discuss it in detail in Chapter 14, Java Integration):

  1. Before starting, make sure that Maven loads the Elasticsearch transport client JAR (which pulls in the core Elasticsearch JAR as a dependency) by adding the following lines to pom.xml:

            <dependency>
               <groupId>org.elasticsearch.client</groupId>
               <artifactId>transport</artifactId>
               <version>5.0.0</version>
            </dependency>
    
  2. With this dependency in place, creating a Java client is quite easy:

            import java.net.InetAddress;
            import org.elasticsearch.client.Client;
            import org.elasticsearch.common.settings.Settings;
            import org.elasticsearch.common.transport.InetSocketTransportAddress;
            import org.elasticsearch.transport.client.PreBuiltTransportClient;
            ...
            // define the client settings;
            // transport sniffing lets the client autodetect other nodes
            Settings settings = Settings.builder()
                .put("client.transport.sniff", true).build();
            // create a client with these settings and one seed node
            // (InetAddress.getByName can throw UnknownHostException)
            Client client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(
                    InetAddress.getByName("127.0.0.1"), 9300));
    

How it works...

To initialize a native client, some settings are required to properly configure it. The important ones are:

  • cluster.name: It is the name of the cluster

  • client.transport.sniff: It lets the client sniff the rest of the cluster topology and add the discovered nodes to its list of machines to use

With these settings, it's possible to initialize a new client giving an IP address and port (default 9300).
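Since every seed node is just a host plus a port that defaults to 9300, the addresses are often kept in configuration as a comma-separated string. The sketch below shows one way to parse such a string with the JDK only. The class name `SeedNodes`, the `host[:port]` input format, and the helper itself are assumptions made for illustration and are not part of the Elasticsearch API; IPv6 literals are deliberately not handled.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Elasticsearch): parses "host[:port]" seed node entries. */
final class SeedNodes {

    /** Default port of the native protocol, as stated in the text. */
    static final int DEFAULT_TRANSPORT_PORT = 9300;

    /** Parses a comma-separated list such as "10.0.0.1,10.0.0.2:9301". */
    static List<InetSocketAddress> parse(String commaSeparated) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : commaSeparated.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries such as "a,,b"
            }
            int colon = trimmed.lastIndexOf(':');
            String host = colon < 0 ? trimmed : trimmed.substring(0, colon);
            int port = colon < 0
                    ? DEFAULT_TRANSPORT_PORT
                    : Integer.parseInt(trimmed.substring(colon + 1));
            // createUnresolved avoids a DNS lookup at parse time
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    public static void main(String[] args) {
        for (InetSocketAddress node : parse("127.0.0.1, 127.0.0.1:9301")) {
            System.out.println(node.getHostString() + " -> " + node.getPort());
        }
    }
}
```

Each parsed pair can then be resolved with InetAddress.getByName(node.getHostString()) and handed to the client together with node.getPort() as one more transport address.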

There's more...

This is the internal protocol used in Elasticsearch: it's the fastest protocol available to talk with Elasticsearch.

The native protocol is an optimized binary one and works only for JVM languages. To use this protocol, you need to include elasticsearch.jar in your JVM project. Because the client depends on the Elasticsearch implementation, the JAR must be the same version as the Elasticsearch cluster.

Note

Every time you update Elasticsearch, you need to update the elasticsearch.jar on which your project depends, and if there are internal API changes, you need to update your code.
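One way to catch a version mismatch early is a fail-fast check at application startup. The sketch below is a plain-JDK guard; the class `VersionGuard` and its method are hypothetical names, and how the two version strings are obtained is left open. As an assumption, the client version could come from the JAR itself (for example, org.elasticsearch.Version.CURRENT) and the cluster version from the version.number field returned by the root REST endpoint.

```java
/** Hypothetical startup guard (not part of Elasticsearch): enforces matching versions. */
final class VersionGuard {

    /**
     * Throws IllegalStateException unless both version strings are identical
     * after trimming surrounding whitespace.
     */
    static void requireSameVersion(String clientVersion, String clusterVersion) {
        String client = clientVersion.trim();
        String cluster = clusterVersion.trim();
        if (!client.equals(cluster)) {
            throw new IllegalStateException(
                "Client JAR version " + client
                + " does not match cluster version " + cluster);
        }
    }

    public static void main(String[] args) {
        // both values are illustrative; supply the real ones from your environment
        requireSameVersion("5.0.0", "5.0.0");
        System.out.println("Versions match");
    }
}
```

Comparing the full version string is intentionally strict, which mirrors the requirement that the JAR and the cluster be the same version.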

To use this protocol, you also need to study the internals of Elasticsearch, so it's not as easy to use as the HTTP protocol.

The native protocol is very useful for massive data imports. However, because Elasticsearch is mainly designed to be used as a REST HTTP server, the native protocol lacks support for everything that is not standard in the Elasticsearch core, such as plugin entry points. Using this protocol, you cannot easily call entry points provided by external plugins.

Note

The native protocol seems easier to integrate in a Java/JVM project, but because it follows the fast release cycles of Elasticsearch, its API can change often, even in minor release upgrades, and break your code.

See also

The native protocol is the most used in the Java world, and it is discussed in depth in Chapter 14, Java Integration, Chapter 15, Scala Integration, and Chapter 17, Plugin Development.

Further details on the Elasticsearch Java API are available on the Elasticsearch site at https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/index.html.