Advanced Elasticsearch 7.0

By: Wai Tak Wong

Overview of this book

Building enterprise-grade distributed applications and executing systematic search operations call for a strong understanding of Elasticsearch and expertise in using its core APIs and latest features. This book will help you master the advanced functionalities of Elasticsearch and understand how to develop a sophisticated, real-time search engine with confidence. You'll also learn to run machine learning jobs in Elasticsearch to speed up routine tasks. You'll get started by learning to use Elasticsearch features on Hadoop and Spark to return query results faster and enhance the customer experience. You'll then get up to speed with performing analytics by building a metrics pipeline, defining queries, and using Kibana for intuitive visualizations that give decision-makers better insights. The book will later guide you through using Logstash, with examples, to collect, parse, and enrich logs before indexing them in Elasticsearch. By the end of this book, you will have comprehensive knowledge of advanced topics such as Apache Spark support, machine learning using Elasticsearch and scikit-learn, and real-time analytics, along with the expertise you need to increase business productivity, perform analytics, and get the very best out of Elasticsearch.
Table of Contents (25 chapters)
Section 1: Fundamentals and Core APIs
Section 2: Data Modeling, Aggregations Framework, Pipeline, and Data Analytics
Section 3: Programming with the Elasticsearch Client
Section 4: Elastic Stack
Section 5: Advanced Features

What this book covers

Chapter 1, Overview of Elasticsearch 7, takes beginners through some basic features in minutes. Launching the new version of the Elasticsearch server takes only a few steps. An architectural overview and an introduction to the core concepts make it easy to understand the workflow in Elasticsearch.

Chapter 2, Index APIs, discusses how to use index APIs to manage individual indices, index settings, aliases, and templates. It also covers monitoring statistics for operations that occur on an index. Index management operations, including refreshing, flushing, and clearing the cache, are also discussed.

Chapter 3, Document APIs, begins with the basic information about a document and its life cycle. Then we learn how to access it. After that, we look at accessing multiple documents with the bulk API. Finally, we discuss migrating indices from the old version to version 7.0.

Chapter 4, Mapping APIs, introduces the schema in Elasticsearch. The mapping rules for both dynamic mappings and explicit static mappings are discussed. The chapter also explains the idea behind static mapping and the details of creating one for an index. We also step through the meta fields and field data types in an index mapping.

Chapter 5, Anatomy of an Analyzer, drills down into the anatomy of an analyzer and practices using different analyzers in depth. We will discuss different character filters, tokenizers, and token filters in order to understand the building blocks of an analyzer. We also practice creating a custom analyzer and using it in the analyze API.
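
To make the three building blocks concrete, here is a minimal sketch in plain Python of how a character filter, a tokenizer, and token filters chain together. It is a toy model for illustration only, not Elasticsearch's implementation: the function names are hypothetical, and they only loosely mirror the behavior of the built-in html_strip character filter, whitespace tokenizer, and lowercase and stop token filters. The stopword list is also an assumption.

```python
import re

# Hypothetical, tiny stopword list chosen only for this illustration.
STOPWORDS = frozenset({"the", "a", "an"})


def strip_html(text):
    """Character filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text):
    """Tokenizer: split the filtered character stream on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def stop_filter(tokens, stopwords=STOPWORDS):
    """Token filter: drop tokens that appear in the stopword set."""
    return [token for token in tokens if token not in stopwords]


def analyze(text):
    """Run the stages in analyzer order: char filter, tokenizer, token filters."""
    filtered = strip_html(text)
    tokens = whitespace_tokenizer(filtered)
    tokens = lowercase_filter(tokens)
    return stop_filter(tokens)


tokens = analyze("<b>The</b> Quick Brown Fox")
print(tokens)
```

The order matters in this sketch: the stop filter runs after lowercasing, so "The" is matched against the lowercase stopword list.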

Chapter 6, Search APIs, covers different types of searches, from terms-based to full-text, from exact search to fuzzy search, from single-field search to multi-search, and then to compound search. Additional information about Query DSL and search-related APIs such as tuning, validating, and troubleshooting will be discussed.

Chapter 7, Modeling Your Data in the Real World, discusses data modeling with Elasticsearch. It focuses on some common issues users may encounter when working with different techniques. It helps you understand some of the conventions and contains insights from real-world examples involving denormalizing complex objects and using nested objects to handle relationships.

Chapter 8, Aggregation Framework, discusses data analytics using the aggregation framework. We learn how to perform aggregations with examples and delve into most of the types of aggregations. We also use IEX ETF historical data to plot a graph for different types of moving averages, including forecasted data supported by the model.

Chapter 9, Preprocessing Documents in Ingest Pipelines, discusses the preprocessing of a document through predefined pipeline processors before the actual indexing operation begins. We also learn how to access document data from within a pipeline processor. Finally, we cover exception handling when an error occurs during pipeline processing.

Chapter 10, Using Elasticsearch for Exploratory Data Analysis, uses the aggregation framework to perform data analysis. We first discuss exploratory data analysis in general and a simple financial analysis of business strategies. In addition, we provide step-by-step instructions for calculating Bollinger Bands using daily operational data. Finally, we conduct a brief survey of sentiment analysis using Elasticsearch.
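
As a reference for the Bollinger Bands calculation, the following is a minimal sketch in plain Python rather than the aggregation-based approach the chapter describes. It assumes the conventional definition: a simple moving average as the middle band, with the upper and lower bands a fixed number of population standard deviations away. The function name, the parameter defaults (a 20-period window and 2 standard deviations), and the sample prices are assumptions made for illustration.

```python
from statistics import mean, pstdev


def bollinger_bands(prices, window=20, num_std=2.0):
    """Return one (middle, upper, lower) tuple per full window of prices.

    Assumes a simple moving average and the population standard deviation.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    bands = []
    for end in range(window, len(prices) + 1):
        chunk = prices[end - window:end]   # the most recent `window` values
        middle = mean(chunk)               # simple moving average
        spread = num_std * pstdev(chunk)   # band half-width
        bands.append((middle, middle + spread, middle - spread))
    return bands


# Made-up closing prices, used only to exercise the function.
closes = [10.0, 12.0, 14.0, 12.0, 10.0]
bands = bollinger_bands(closes, window=3, num_std=2.0)
for middle, upper, lower in bands:
    print(f"middle={middle:.3f} upper={upper:.3f} lower={lower:.3f}")
```

With five prices and a window of 3, the sketch yields three band tuples, one for each position where a full window is available.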

Chapter 11, Elasticsearch from Java Programming, focuses on the basics of two supported Java REST clients. We explore the main features and operations of each approach. A sample project is provided to demonstrate the high-level and low-level REST clients integrated with Spring Boot programming.

Chapter 12, Elasticsearch from Python Programming, introduces the Python Elasticsearch client. We learn about two Elasticsearch client packages, elasticsearch-py and elasticsearch-dsl-py. We learn how the clients work and incorporate them into a Python application. We implement Bollinger Bands by using elasticsearch-dsl-py.

Chapter 13, Using Kibana, Logstash, and Beats, outlines the components of the Elastic Stack, including Kibana, Logstash, and Beats. We learn how to use Logstash to collect and parse log data from system log files. In addition, we use Filebeat to extend the use of Logstash to a central log processing center. All work is run on officially supported Elastic Stack Docker images.

Chapter 14, Working with Elasticsearch SQL, introduces Elasticsearch SQL. With Elasticsearch SQL, we can access full-text search using familiar SQL syntax. We can even obtain results in tabular view format. We perform search and aggregation using different approaches, such as using the SQL REST API interface, the command-line interface, and JDBC.

Chapter 15, Working with Elasticsearch Analysis Plugins, introduces built-in Analysis plugins. We practice using the ICU Analysis plugin, the Smart Chinese Analysis plugin, and the IK Analysis plugin to analyze Chinese texts. We also add a new custom dictionary to improve word segmentation and produce better results.

Chapter 16, Machine Learning with Elasticsearch, discusses the machine learning feature supported by Elasticsearch. This feature automatically analyzes time series data by running a metric job. This type of job contains one or more detectors (the analyzed fields). We also introduce the Python scikit-learn library and the unsupervised learning algorithm K-means clustering and use it for comparison.

Chapter 17, Spark and Elasticsearch for Real-Time Analytics, focuses on ES-Hadoop's Apache Spark support. We practice reading data from the Elasticsearch index, performing some computations using Spark, and then writing the results back to Elasticsearch through ES-Hadoop. We build a real-time anomaly detection routine based on the K-means model created from past data by using the Spark ML library.

Chapter 18, Building Analytics RESTful Services, explains how to construct a project providing a search analytics REST service powered by Elasticsearch. We combine lots of material and source code from different chapters to build a real-world end-to-end project and present the result on a Kibana Visualize page.