Here are the write and query configurations:
es.query: Specifies the Elasticsearch query that is used when you read data from Elasticsearch. This defaults to none; that is, all the data under the Elasticsearch index and type is returned. The query can be in one of three forms:
uri: This specifies the query string parameter, for example, ?q=category:InformationTechnology
query dsl: This specifies any Elasticsearch query, for example, { "query": { "match": { "category": "InformationTechnology" } } }
external resource: This points to a file that contains the uri or the query DSL, for example, /path/to/query.json
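The three forms can be told apart by how the value starts. The following Python sketch illustrates this, assuming the setting is named es.query and that a URI query begins with ?. The helper query_form is hypothetical and is not the connector's own parsing code.

```python
import json

def query_form(es_query: str) -> str:
    """Classify an es.query value (hypothetical helper, for illustration only)."""
    value = es_query.strip()
    if value.startswith("?"):
        return "uri"
    if value.startswith("{"):
        json.loads(value)  # raises ValueError if the query DSL is not valid JSON
        return "query dsl"
    return "external resource"

# One configuration dictionary per form.
uri_conf = {"es.query": "?q=category:InformationTechnology"}
dsl_conf = {"es.query": json.dumps({"query": {"match": {"category": "InformationTechnology"}}})}
file_conf = {"es.query": "/path/to/query.json"}

forms = [query_form(c["es.query"]) for c in (uri_conf, dsl_conf, file_conf)]
```

Leaving es.query out of the configuration altogether corresponds to the default, where everything under the index and type is returned.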
es.input.json: Specifies whether the input is already in JSON format. The JSON should look similar to the following code:
[
  {
    "id": 10178221,
    "caseNumber": "HY366678",
    "eventDate": "08/02/15 23:58",
    "block": "042XX W MADISON ST",
    "iucr": 1811,
    "primaryType": "NARCOTICS",
    "description": "POSS: CANNABIS 30GMS OR LESS",
    "location": "SIDEWALK",
    "arrest": "TRUE",
    "domestic": "FALSE",
    "lat": 41.88076873,
    "lon": -87.73136165
  },
  { .. .. }
]
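As a rough illustration of preparing such input, the sketch below serializes each record to its own JSON string before it is handed to the connector. It assumes the setting is named es.input.json and that each document is passed as a separate pre-serialized string. The record reuses a subset of the fields shown above.

```python
import json

# A single record, using a subset of the fields from the sample above.
records = [
    {
        "id": 10178221,
        "caseNumber": "HY366678",
        "primaryType": "NARCOTICS",
        "arrest": "TRUE",
    },
]

# Serialize every record to a JSON string, one document per string.
json_docs = [json.dumps(record) for record in records]

# Tell the connector that the documents are already JSON (assumed setting name).
conf = {"es.input.json": "true"}
```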
es.write.operation: Specifies how the write to Elasticsearch is performed, depending on whether the ID of the incoming document already exists in the Elasticsearch index. This defaults to index. It can take four different values:
index: This specifies that a new document is added and an existing document with the same ID is replaced (reindexed)
create: This indicates that a new document is added and an exception is thrown if a document with the same ID already exists
update: This updates the existing document and throws an exception if the document doesn't already exist
upsert: This denotes that a new document is added if it doesn't exist and is merged with the old document if it does
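The difference between the four values is easiest to see on a toy model. The sketch below mimics them against a plain Python dictionary that stands in for the index. It is an illustration of the semantics only, not how the connector or Elasticsearch is implemented, and the write function is hypothetical.

```python
def write(index: dict, doc_id: str, doc: dict, operation: str = "index") -> None:
    """Apply one write to a toy in-memory index, mimicking the four operations."""
    exists = doc_id in index
    if operation == "index":
        index[doc_id] = dict(doc)  # add, or overwrite the existing document
    elif operation == "create":
        if exists:
            raise KeyError(f"document {doc_id} already exists")
        index[doc_id] = dict(doc)
    elif operation == "update":
        if not exists:
            raise KeyError(f"document {doc_id} does not exist")
        index[doc_id].update(doc)
    elif operation == "upsert":
        index.setdefault(doc_id, {}).update(doc)  # add if missing, merge otherwise
    else:
        raise ValueError(f"unknown write operation: {operation}")

store = {}
write(store, "1", {"primaryType": "NARCOTICS"}, "create")
write(store, "1", {"arrest": "TRUE"}, "upsert")
merged = dict(store["1"])  # both fields are present after the upsert
write(store, "1", {"arrest": "FALSE"}, "index")
replaced = dict(store["1"])  # only the newly indexed fields remain
```

In this model, create and update are the two operations that can fail: create when the ID is already present, update when it is not.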
If an update or upsert write operation is used, the following additional configurations can be applied:
es.update.script: Specifies the script that is used to update the document.
es.update.script.params: Specifies the script parameters in the paramName:fieldname or paramName:<CONSTANT> format. It may be a comma-separated list.
es.update.script.params.json: If all parameters are constant, they can be specified in JSON format. Consider the following example:
{ "param1":1, "param2":2 }
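To make the two parameter formats concrete, here is a small sketch that splits a comma-separated parameter list into field references and constants, and builds the JSON alternative for the all-constant case. The helper parse_script_params and the parameter names count and location are hypothetical; the connector does its own parsing internally.

```python
import json

def parse_script_params(spec: str) -> dict:
    """Split 'name:value,name:value' into a dict (hypothetical helper).

    Values wrapped in <...> are treated as constants; anything else is
    taken to be the name of a field in the incoming document.
    """
    params = {}
    for pair in spec.split(","):
        name, _, value = pair.strip().partition(":")
        if value.startswith("<") and value.endswith(">"):
            params[name] = ("constant", value[1:-1])
        else:
            params[name] = ("field", value)
    return params

# One constant parameter and one parameter read from the document.
parsed = parse_script_params("count:<1>, location:location")

# When every parameter is a constant, a JSON string can be used instead.
json_params = json.dumps({"param1": 1, "param2": 2})
```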
es.batch.size.bytes: Specifies the size in bytes for batch writes with the Elasticsearch bulk API. The bulk size is allocated per task instance. This means that if you have five tasks that run with a 1mb batch size, you may have up to 5mb of data being indexed in Elasticsearch at the same time.
es.batch.size.entries: Specifies the maximum number of entries in a batch write when you use the Elasticsearch bulk API. When this is used along with es.batch.size.bytes, the batch update is executed as soon as either of the two limits is reached. Again, this setting applies to each task.
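The "whichever limit is reached first" behavior can be sketched as a small generator. This is a toy model of one task's batching, written under the assumption that a batch is flushed as soon as it meets either limit. The batches function is hypothetical and does not talk to Elasticsearch.

```python
def batches(docs, max_bytes, max_entries):
    """Yield lists of documents, flushing when either limit is reached."""
    batch, size = [], 0
    for doc in docs:
        batch.append(doc)
        size += len(doc.encode("utf-8"))
        if size >= max_bytes or len(batch) >= max_entries:
            yield batch
            batch, size = [], 0
    if batch:
        yield batch  # flush whatever is left at the end

# Ten small JSON documents of 9 bytes each.
docs = ['{"id": %d}' % i for i in range(10)]

# The entry limit is hit first: a flush after every 4 documents.
by_entries = [len(b) for b in batches(docs, max_bytes=1024 * 1024, max_entries=4)]

# The byte limit is hit first: a flush once 20 bytes are buffered.
by_bytes = [len(b) for b in batches(docs, max_bytes=20, max_entries=1000)]
```

Because each task batches independently, the amount of data in flight scales with the number of concurrent tasks.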
es.batch.write.refresh: Specifies whether an index refresh should be executed on the completion of a batch write. This defaults to true. This can be very useful when you are interested in analyzing the data being indexed in real time.
es.batch.write.retry.count: Specifies the number of retries for a given batch. The retries are made for rejected data only. A negative value indicates infinite retries.
es.batch.write.retry.wait: Specifies the time to wait between two batch write retries.
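The retry behavior described above can be sketched as a simple loop: only the rejected documents are resent, a wait is inserted between attempts, and a negative retry count means retrying until nothing is rejected. The function write_with_retries and the send callable are hypothetical stand-ins for illustration, not the connector's implementation.

```python
import time

def write_with_retries(send, batch, retry_count, retry_wait_seconds, sleep=time.sleep):
    """Send a batch, resending only the rejected documents (toy model).

    `send` is a hypothetical callable that returns the rejected documents.
    A negative retry_count means retry until nothing is rejected.
    """
    rejected = send(batch)
    attempts = 0
    while rejected and (retry_count < 0 or attempts < retry_count):
        sleep(retry_wait_seconds)
        rejected = send(rejected)
        attempts += 1
    return rejected  # documents still rejected after the final attempt

calls = []

def flaky_send(docs):
    """Reject the last document on the first call only."""
    calls.append(list(docs))
    return docs[-1:] if len(calls) == 1 else []

leftover = write_with_retries(flaky_send, ["a", "b", "c"], retry_count=3,
                              retry_wait_seconds=0, sleep=lambda seconds: None)
```

Here the second call carries only the single rejected document, which matches the statement that retries are made for rejected data only.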
es.ser.writer.value.class: Specifies the ValueWriter implementation that is used to convert objects to JSON. The default depends on whether MapReduce, Cascading, Hive, Pig, Spark, or Storm is used.