Mapping base types
Dynamic mapping makes it possible to start ingesting data quickly, using a schemaless approach, without being concerned about field types. However, to achieve better results and better indexing performance, you need to define a mapping manually.
Fine-tuning mapping brings some advantages, such as the following:
- Reducing the index size on disk (by disabling functionalities for custom fields)
- Indexing only the interesting fields (a general speed-up)
- Precooking data for fast searches or real-time analytics (such as aggregations)
- Correctly defining whether a field must be analyzed into multiple tokens or considered as a single token
- Defining mapping types such as geo point, suggester, vectors, and so on
Elasticsearch allows you to use base fields with a wide range of configurations.
Getting ready
You will need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe of Chapter 1, Getting Started.
To execute the commands in this recipe, you can use any HTTP client, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. I suggest using the Kibana console, which provides code completion and better character escaping for Elasticsearch.
To execute this recipe's examples, you will need to create an index named test, where you can put mappings, as explained in the Using explicit mapping creation recipe.
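For example, assuming a local Elasticsearch instance with default settings, the index can be created from the Kibana console with a single command:

```
PUT test
```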
How to do it...
Let's use a semi-real-world example of a shop order for our eBay-like shop:

- First, we must define an order, which consists of the following fields:

| Name | Type | Description |
|---|---|---|
| id | identifier | Order identifier |
| date | date (time) | Date of the order |
| customer_id | identifier | Customer identifier |
| sent | boolean | Set to true if the order has been sent |
| name | string | Name of the item |
| quantity | integer | Number of items |
| price | double | Price of the item |
| vat | double | VAT amount for the item |

- Our order record must be converted into an Elasticsearch mapping definition, as follows:

```
PUT test/_mapping
{
  "properties": {
    "id": { "type": "keyword" },
    "date": { "type": "date" },
    "customer_id": { "type": "keyword" },
    "sent": { "type": "boolean" },
    "name": { "type": "keyword" },
    "quantity": { "type": "integer" },
    "price": { "type": "double" },
    "vat": { "type": "double", "index": false }
  }
}
```
Now, the mapping is ready to be put in the index. We will learn how to do this in the Putting a mapping in an index recipe of Chapter 3, Basic Operations.
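To see the mapping in action, here is a hypothetical order document that conforms to this schema (the document ID and all field values are illustrative):

```
PUT test/_doc/1
{
  "id": "1234",
  "date": "2024-11-16T12:00:00",
  "customer_id": "customer-42",
  "sent": true,
  "name": "tshirt",
  "quantity": 2,
  "price": 4.53,
  "vat": 0.95
}
```

Note that because vat is mapped with index set to false, it is still kept in the _source and returned with the document, but it cannot be searched on.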
How it works...
Field types must be mapped to one of the Elasticsearch base types, and options defining how the field should be indexed need to be added.
The following table is a reference for the mapping types:

| Type | ES type | Description |
|---|---|---|
| String | text, keyword | text for analyzed, full-text fields; keyword for exact-match, sortable, and aggregatable fields |
| Integer | byte, short, integer, long | Signed integers of 8, 16, 32, and 64 bits |
| Floating point | half_float, float, double, scaled_float | Floating-point numbers of various precisions |
| Boolean | boolean | true or false values |
| Date | date | Dates and datetimes, in a configurable format |
| Binary | binary | Base64-encoded binary data |
Depending on the data type, it's possible to give Elasticsearch explicit directives on how to process a field for better management. The most used options are as follows:

- `store` (default `false`): This marks the field to be stored in a separate index fragment for fast retrieval. Storing a field consumes disk space, but it reduces computation if you need to extract the field from a document (that is, in scripting and aggregations). The possible values are `true` and `false`; stored fields are always returned as an array of values for consistency. Stored fields are also faster than others in aggregations.
- `index`: This defines whether or not the field should be indexed. The possible values are `true` and `false`. Fields that are not indexed are not searchable (the default is `true`).
- `null_value`: This defines a default value if the field is null.
- `boost`: This is used to change the importance of a field (the default is `1.0`). `boost` works on a term level only, so it's mainly used in term, terms, and match queries.
- `search_analyzer`: This defines an analyzer to be used during the search. If it's not defined, the analyzer of the parent object is used (the default is `null`).
- `analyzer`: This sets the default analyzer to be used (the default is `null`).
- `norms`: This controls the Lucene norms, which are used to better score queries. If the field is only used for filtering, it's best practice to disable it to reduce resource usage (the default is `true` for analyzed fields and `false` for `not_analyzed` ones).
- `copy_to`: This allows you to copy the content of a field into another one to achieve functionality similar to the `_all` field.
- `ignore_above`: This allows you to skip indexing a string if it's longer than this value. It's useful for processing fields for exact filtering, aggregations, and sorting. It also prevents a single term token from becoming too big and prevents errors due to the Lucene term byte-length limit of 32,766. The suggested maximum value is `8191` (https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html).
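As an illustration, the following sketch combines several of these options in one mapping (the test-options index and all field names are hypothetical):

```
PUT test-options
{
  "mappings": {
    "properties": {
      "code":          { "type": "keyword", "store": true, "ignore_above": 256 },
      "status":        { "type": "keyword", "null_value": "unknown" },
      "internal_note": { "type": "text", "index": false },
      "title":         { "type": "text", "copy_to": "full_text" },
      "body":          { "type": "text", "copy_to": "full_text" },
      "full_text":     { "type": "text" }
    }
  }
}
```

Here, code is retrievable without loading the _source and skips values longer than 256 characters, status defaults to unknown for null values, internal_note is kept in the document but is not searchable, and both title and body are copied into full_text so that they can be queried together.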
There's more...
From Elasticsearch version 6.x onward, as shown in the Using explicit mapping creation recipe, the type inferred for a string is a multifield mapping:

- The default processing is `text`. This mapping allows textual queries (that is, term, match, and span queries). In the example provided in the Using explicit mapping creation recipe, this was `name`.
- The `keyword` subfield is used for `keyword` mapping. This field can be used for exact term matching, aggregations, and sorting. In the example provided in the Using explicit mapping creation recipe, the referred field was `name.keyword`.
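To illustrate the difference between the two subfields, here are two hypothetical searches against an index (called mytest here) where name was dynamically mapped this way:

```
GET mytest/_search
{
  "query": { "match": { "name": "blue tshirt" } }
}

GET mytest/_search
{
  "query": { "term": { "name.keyword": "blue tshirt" } },
  "sort": [ { "name.keyword": "asc" } ]
}
```

The first query matches documents containing any of the analyzed tokens (blue or tshirt), while the second only matches documents whose name is exactly the string blue tshirt, and it sorts the results by the raw value.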
Another important parameter, available only for `text` mapping, is `term_vector` (the vector of terms that compose a string). Please refer to the Lucene documentation for further details at https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/index/Terms.html.

`term_vector` can accept the following values:

- `no`: This is the default value; no term vector is stored.
- `yes`: This stores the term vector.
- `with_offsets`: This stores the term vector with token offsets (the start and end positions in a block of characters).
- `with_positions`: This stores the positions of the tokens in the term vector.
- `with_positions_offsets`: This stores the term vector with both positions and offsets.
- `with_positions_payloads`: This stores the positions and payloads of the tokens in the term vector.
- `with_positions_offsets_payloads`: This stores all the term vector data, including payloads.
Term vectors allow fast highlighting, but they consume disk space due to the additional text information that's stored. It's best practice to activate them only on fields that require highlighting, such as the title or the document content.
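As a sketch, the following hypothetical mapping enables term vectors with positions and offsets on a content field, and a search then requests highlighting on it:

```
PUT test-highlight
{
  "mappings": {
    "properties": {
      "content": { "type": "text", "term_vector": "with_positions_offsets" }
    }
  }
}

GET test-highlight/_search
{
  "query": { "match": { "content": "elasticsearch" } },
  "highlight": { "fields": { "content": {} } }
}
```

Because the positions and offsets are stored at index time, the highlighter can locate the matched terms without re-analyzing the field content.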
See also
You can refer to the following sources for further details on the concepts of this chapter:
- The online documentation on Elasticsearch provides a full description of all the properties for the different mapping fields at https://www.elastic.co/guide/en/elasticsearch/reference/master/mapping-params.html.
- The Specifying a different analyzer recipe at the end of this chapter shows alternative analyzers to the standard one.
- For newcomers who want to explore the concepts of tokenization, I would suggest reading the official Elasticsearch documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html.