The Elasticsearch ecosystem has several important components; the ones most relevant to us are detailed in this section.
Elasticsearch stores data in a systematic, easily accessible, and searchable fashion. To make the data easier to analyze and search, the following steps are performed when data is ingested into Elasticsearch:
- Initial tidying (sanitizing) of the incoming string. This is done by a character filter in Elasticsearch, which cleans up the string before actual tokenization: it can strip out unnecessary characters or transform certain characters as needed.
- Tokenizing the string into terms for building an inverted index. This is done by tokenizers in Elasticsearch; various types of tokenizers exist that split the string into terms/tokens in different ways.
- Normalizing the terms and the search input to make searches easier and more relevant (further filtering and sanitizing). This is done by token filters in Elasticsearch, which can, for example, lowercase terms, remove stopwords, or apply stemming.
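The three stages above can be sketched in plain Python. This is purely an illustration of the pipeline, not Elasticsearch's actual implementation; the specific character mappings, token pattern, and stopword list below are assumptions chosen for the example:

```python
import re

STOPWORDS = {"the", "a", "an"}  # assumed stopword list for illustration

def char_filter(text):
    # Character filter: sanitize the raw string before tokenization,
    # e.g. strip HTML-like tags and map '&' to ' and '
    text = re.sub(r"<[^>]+>", "", text)
    return text.replace("&", " and ")

def tokenizer(text):
    # Tokenizer: split the sanitized string into terms/tokens
    return re.findall(r"\w+", text)

def token_filters(tokens):
    # Token filters: normalize terms (lowercase, drop stopwords)
    return [t.lower() for t in tokens if t.lower() not in STOPWORDS]

def analyze(text):
    # Full analysis chain: character filter -> tokenizer -> token filters
    return token_filters(tokenizer(char_filter(text)))

def build_inverted_index(docs):
    # Inverted index: map each normalized term to the set of
    # document ids that contain it
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

print(analyze("<p>The Quick & Brown Fox</p>"))
# -> ['quick', 'and', 'brown', 'fox']
```

At query time the same analysis is applied to the search terms, so that a query like "FOX" normalizes to "fox" and matches the index entry produced at ingest time.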