Mastering ElasticSearch

By: Rafał Kuć, Marek Rogoziński

Overview of this book

ElasticSearch is a fast, distributed, scalable search engine written in Java that leverages Apache Lucene capabilities, providing a new level of control over how you index and search even the largest sets of data. "Mastering ElasticSearch" covers the intermediate and advanced functionalities of ElasticSearch. It will not only help you understand how ElasticSearch works, but will also guide you through its internals, such as caches, the Apache Lucene library, monitoring capabilities, and the Java API. In addition, you'll see the practical usage of ElasticSearch configuration parameters and the monitoring API, along with easy-to-use examples of how to extend ElasticSearch by writing your own plugins.

"Mastering ElasticSearch" starts by showing you how Apache Lucene works and what the ElasticSearch architecture looks like. It covers advanced querying capabilities, index configuration control, index distribution, and ElasticSearch administration and troubleshooting. Finally, you'll see how to improve the user’s search experience, use the provided Java API, and develop your own custom plugins.

The book will help you learn how Apache Lucene works in terms of both querying and indexing. You'll also learn how to use different scoring models, rescore documents using other queries, and alter how the index is written by using custom postings. You'll learn what segment merging is and how to configure it to your needs. You'll optimize your queries by modifying them to use filters, and you'll see why this is important. The book describes in detail how to use the shard allocation mechanisms present in ElasticSearch, such as forced awareness.

"Mastering ElasticSearch" will open your eyes to the practical use of the statistics and information APIs available at the index, node, and cluster levels, so you are not surprised by what your ElasticSearch does while you are not looking. You'll also see how to troubleshoot by understanding how the Java garbage collector works, how to control I/O throttling, and how to see what threads are being executed at any given moment. If user spelling mistakes are making you lose sleep at night, don't worry anymore: the book will show you how to configure and use the ElasticSearch spell checker and improve the relevance of your queries. Last but not least, you'll see how to use the ElasticSearch Java API to access the ElasticSearch cluster from your JVM-based application, and you'll extend ElasticSearch by writing your own custom plugins.

If you are looking for a book that will allow you to easily extend your basic knowledge of ElasticSearch, or you want to go deeper into the world of full-text search using ElasticSearch, then this book is for you.

Mastering ElasticSearch

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Preface

Introduction to ElasticSearch

Introducing Apache Lucene

Introducing ElasticSearch

Summary

Power User Query DSL

Default Apache Lucene scoring explained

Query rewrite explained

Rescore

Bulk Operations

Sorting data

Update API

Using filters to optimize your queries

Filter and scopes in ElasticSearch faceting mechanism

Summary

Low-level Index Control

Altering Apache Lucene scoring

Similarity model configuration

Using codecs

NRT, flush, refresh, and transaction log

Looking deeper into data handling

Segment merging under control

Summary

Index Distribution Architecture

Choosing the right amount of shards and replicas

Routing explained

Altering the default shard allocation behavior

Adjusting shard allocation

Query execution preference

Using our knowledge

Summary

ElasticSearch Administration

Choosing the right directory implementation – the store module

Discovery configuration

Segments statistics

Understanding ElasticSearch caching

Summary

Fighting with Fire

Knowing the garbage collector

When it is too much for I/O – throttling explained

Speeding up queries using warmers

Very hot threads

Real-life scenarios

Summary

Improving the User Search Experience

Correcting user spelling mistakes

Improving query relevance

Summary

ElasticSearch Java APIs

Introducing the ElasticSearch Java API

The code

Connecting to your cluster

Anatomy of the API

CRUD operations

Querying ElasticSearch

Performing multiple actions

Percolator

The explain API

Building JSON queries and documents

The administration API

Summary

Developing ElasticSearch Plugins

Creating the Apache Maven project structure

Creating a custom river plugin

Creating custom analysis plugin

Summary

Index

About the Authors

Rafał Kuć is a born team leader and a Software Developer. Working as a Consultant and a Software Engineer at Sematext Group, Inc., he concentrates on open source technologies such as Apache Lucene, Solr, ElasticSearch, and the Hadoop stack. He has more than 11 years of experience in various software branches, from banking software to e-commerce products. He is mainly focused on Java, but open to every tool and programming language that will make the achievement of his goal easier and faster. He is also one of the founders of the solr.pl site, where he tries to share his knowledge and help people resolve their problems with Solr and Lucene. He is also a speaker at various conferences around the world, such as Lucene Eurocon, Berlin Buzzwords, ApacheCon, and Lucene Revolution.

Rafał began his journey with Lucene in 2002 and it wasn't love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then Solr came and this was it. He started working with ElasticSearch in the middle of 2010. Currently, Lucene, Solr, ElasticSearch, and information retrieval are his main points of interest.

Rafał is also the author of Solr 3.1 Cookbook and its update, Solr 4.0 Cookbook, and a co-author of ElasticSearch Server, all published by Packt Publishing.

The book you are holding in your hands was something that I wanted to write after finishing the ElasticSearch Server book, and I got the opportunity. I did not want to jump from topic to topic, but to concentrate on a few of them, write about what I know, and share the knowledge. Again, just like with the ElasticSearch Server book, I couldn't include all the topics I wanted, and some small details that are more or less important, depending on the use case, had to be left aside. Nevertheless, I hope that by reading this book you'll be able to easily get into all the details about ElasticSearch and the underlying Apache Lucene, and I also hope that it will let you get the desired knowledge easier and faster.

I would like to thank my family for their support and patience during all those days and evenings when I was sitting in front of a screen instead of being fully with them.

I would also like to thank all the people I'm working with at Sematext, especially Otis, who took his time and convinced me that Sematext is the right company for me.

Finally, I would like to thank all the people involved in creating, developing, and maintaining the ElasticSearch and Lucene projects for their work and passion. Without them this book wouldn't have been written and open source search would have been less powerful.

Once again, thank you.

Marek Rogoziński is a Software Architect and a Consultant with more than 10 years of experience. He specializes in solutions based on open source search engines such as Solr and ElasticSearch, and in software stacks for big data analytics, including Hadoop, HBase, and Twitter Storm.

He is also a co-founder of the solr.pl site, which publishes information and tutorials about Solr and the Lucene library, and a co-author of the ElasticSearch Server book published by Packt Publishing.

He currently holds the position of Chief Technology Officer in a company building products based on the processing and analysis of large streams of input data.

Just like the previous book, writing Mastering ElasticSearch was a difficult task. To tell the truth, it was much harder, not only because of the more advanced topics covered in this book, but also because of the constant changes being introduced in the ElasticSearch codebase. Its development is not going to slow down and, quite literally, every day brings something new. Please remember that this book should be treated as a continuation of the previous book. This means that we have tried to omit all the topics that we had covered before, and we wanted to add everything that was omitted. You can judge for yourself whether we have succeeded. Now it's time to thank everyone.
