How do you bulk index documents into the default mapping of ElasticSearch? - elasticsearch

The documentation for ElasticSearch 5.5 offers no examples of how to use the bulk operation to index documents into the default mapping of an index. It also gives no indication why this is not possible, unless I'm missing that somewhere else in the documentation.
The ES 5.5 documentation gives one explicit example of bulk indexing:
POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
But it also says that
The endpoints are /_bulk, /{index}/_bulk, and {index}/{type}/_bulk.
When the index or the index/type are provided, they will be used by
default on bulk items that don’t provide them explicitly.
So, the middle endpoint is valid, and it implies to me that a) you have to explicitly provide a type in the metadata for each document indexed, or b) that you can index documents into the default mapping ("_default_").
But I can't get this to work.
I've tried the /myindex/bulk endpoint with no type specified in the metadata.
I've tried it with "_type": "_default_" specified.
I've tried /myindex/_default_/bulk.

This has nothing to do with the _default_ mapping. It is about falling back to the index and type that you specify in the URL whenever a bulk item does not provide its own. You can do the following:
POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
However, the following snippet is exactly equivalent:
POST /test/type1/_bulk
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }
And you can mix the two:
POST foo/bar/_bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }
In this example, the first document is indexed into test (type type1), as its metadata says, and the second into foo (type bar), the defaults taken from the URL.
Hope this makes sense.
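The fallback rule can be sketched in Python. This only illustrates the behaviour described above and is not Elasticsearch's own code; the name resolve_targets and its arguments are made up for this example, and it assumes every item is an index action (one metadata line followed by one source line).

```python
import json


def resolve_targets(body, url_index=None, url_type=None):
    """Return the (index, type, id) each bulk item would be indexed into.

    Hypothetical helper: mimics the rule that the index/type from the URL
    are used only when an item's metadata does not provide them.
    Assumes every action is "index" (metadata line + source line).
    """
    lines = [line for line in body.splitlines() if line.strip()]
    targets = []
    # Every second line, starting with the first, is a metadata line.
    for meta_line in lines[0::2]:
        meta = json.loads(meta_line)["index"]
        targets.append((
            meta.get("_index", url_index),
            meta.get("_type", url_type),
            meta.get("_id"),
        ))
    return targets


# Body of the mixed example, as sent to POST foo/bar/_bulk
mixed_body = """\
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "index" : { "_id" : "1" } }
{ "field1" : "value1" }
"""

for target in resolve_targets(mixed_body, url_index="foo", url_type="bar"):
    print(target)
```

The first item keeps its explicit test/type1 target, while the second one picks up foo/bar from the URL.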

Related

Using Delete By Query API and Bulk API together in Elastic

I couldn't find any documentation or example about using the Delete By Query API together with the Bulk API in Elasticsearch.
Simply put, I want to delete all documents that have the same value in field A and insert many documents right after that. If the delete fails, no documents should be inserted.
e.g.
POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
??? { "delete_by_query???" : { "_index" : "test", "_type" : "type1", "query"... } }
Is there any way to use them together?
Thanks.

ElasticSearch Bulk with ingest plugin

I am using the Attachment Processor in a pipeline.
Everything works fine, but I wanted to post multiple documents at once, so I tried to use the bulk API.
Bulk works fine too, but I can't find how to send the URL parameter "pipeline=attachment".
This single-document request works:
POST testindex/type1/1?pipeline=attachment
{
"data": "Y291Y291",
"name" : "Marc",
"age" : 23
}
This bulk request works:
POST _bulk
{ "index" : { "_index" : "testindex", "_type" : "type1", "_id" : "2" } }
{ "name" : "jean", "age" : 22 }
But how can I index Marc with his data field in bulk so that it is processed by the pipeline?
Thanks to Val's comment, I did the following and it works fine:
POST _bulk
{ "index" : { "_index" : "testindex", "_type" : "type1", "_id" : "2", "pipeline": "attachment"} }
{"data": "Y291Y291", "name" : "jean", "age" : 22}
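For more than a handful of documents, the metadata lines are easier to generate than to type. Here is a minimal Python sketch, assuming the same index, type and pipeline names as above; bulk_index_body is a hypothetical helper, not part of any Elasticsearch client library.

```python
import json


def bulk_index_body(docs, index, doc_type, pipeline=None):
    """Build a newline-delimited bulk body made of index actions.

    docs is a list of (doc_id, source) pairs. When pipeline is given, it is
    added to every action's metadata, as in the working request above.
    """
    lines = []
    for doc_id, source in docs:
        meta = {"_index": index, "_type": doc_type, "_id": doc_id}
        if pipeline is not None:
            meta["pipeline"] = pipeline
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(source))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_index_body(
    [
        ("1", {"data": "Y291Y291", "name": "Marc", "age": 23}),
        ("2", {"data": "Y291Y291", "name": "jean", "age": 22}),
    ],
    index="testindex",
    doc_type="type1",
    pipeline="attachment",
)
print(body, end="")
```

The printed text is what would go after the POST _bulk line.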

Is the order of operations guaranteed in a bulk update?

I am sending delete and index requests to elasticsearch in bulk (the example is adapted from the docs):
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
The sequence above is intended to first delete a possible document with _id=1, then index a new document with the same _id=1.
Is the order of the actions guaranteed? In other words, for the example above, can I be sure that the delete will not touch the document indexed afterwards (because the order was not respected for one reason or another)?
The delete operation is unnecessary in this scenario. If you simply index a document with the same ID, it will automatically and implicitly replace the previous document with that ID.
So if a document with ID=1 already exists, simply sending the command below will replace it (that is, delete and re-index it):
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
According to an Elastic Team Member:
Elasticsearch is distributed and concurrent. We do not guarantee that requests are executed in the order they are received.
https://discuss.elastic.co/t/are-bulk-index-operations-serialized/83770/6
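If the bulk body is generated by a program, the redundant deletes can be filtered out before sending. The following Python sketch is only one way to do that; parse_actions, doc_key and drop_redundant_deletes are names invented here, and a delete is dropped only when the very next action is an index of the same document, which is the pattern from the question.

```python
import json

# Bulk actions that are followed by a source line; "delete" is not.
HAS_SOURCE = {"index", "create", "update"}


def parse_actions(body):
    """Split a bulk body into (action, metadata, source) tuples."""
    lines = [line for line in body.splitlines() if line.strip()]
    actions = []
    i = 0
    while i < len(lines):
        # Each metadata line is an object with a single key: the action name.
        action, meta = next(iter(json.loads(lines[i]).items()))
        source = None
        if action in HAS_SOURCE:
            source = json.loads(lines[i + 1])
            i += 1
        actions.append((action, meta, source))
        i += 1
    return actions


def doc_key(meta):
    """Identify the document an action targets."""
    return (meta.get("_index"), meta.get("_type"), meta.get("_id"))


def drop_redundant_deletes(actions):
    """Drop a delete that is immediately followed by an index of the same doc."""
    kept = []
    for pos, (action, meta, source) in enumerate(actions):
        following = actions[pos + 1] if pos + 1 < len(actions) else None
        if (action == "delete" and following is not None
                and following[0] == "index"
                and doc_key(following[1]) == doc_key(meta)):
            continue
        kept.append((action, meta, source))
    return kept


body = """\
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
"""

for action, meta, source in drop_redundant_deletes(parse_actions(body)):
    print(action, meta["_id"], source)
```

Only the index action is left, so the question of ordering between the two actions no longer comes up for this document.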

How to upload mysql data to Elasticsearch

I am new to Elasticsearch.
I am trying to upload my existing MySQL data to Elasticsearch. The Elasticsearch bulk import uses JSON as the data format, so I converted my data to JSON.
employee.json:
[{"EmpId":"101", "Name":"John Doe", "Dept":"IT"}
{"EmpId":"102", "Name":"FooBar", "Dept":"HR"}]
But I am not able to upload my data using the following curl command:
post: curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary #employee.json
I get a parsing exception message.
After reading the documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html), I realized that the data format should be something like this:
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
I am still not sure how to format my data in the above format and perform the upload operation.
Basically I want to know the exact data format that is expected by the Elasticsearch bulk upload. And would also like to know whether my curl command is correct.
Your data should be in this form (specify _id only if you want to use the employee ID as the document ID; otherwise omit the _id part):
{ "index" : { "_index" : "index_name", "_type" : "type_name", "_id" : "101" } }
{"EmpId":"101", "Name":"John Doe", "Dept":"IT"}
{ "index" : { "_index" : "index_name", "_type" : "type_name", "_id" : "102" } }
{"EmpId":"102", "Name":"FooBar", "Dept":"HR"}
....
Or you can use logstash: https://www.elastic.co/blog/logstash-jdbc-input-plugin
From the docs:
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }
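So you would probably want your file to read something like the following. Note that it uses an index action: an update action would need the fields wrapped in a "doc" object, as in the docs example above, and would fail for documents that do not exist yet.
{ "index" : {"_id" : "101", "_type" : "foo", "_index" : "bar"} }
{"EmpId":"101", "Name":"John Doe", "Dept":"IT"}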
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
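Converting the exported rows into that format is mechanical. Here is a rough Python sketch, assuming the rows are already available as a list of dicts; to_bulk_ndjson is a made-up helper, and index_name/type_name are the placeholders used above.

```python
import json

# Rows as they might come out of the MySQL export (values from the question).
employees = [
    {"EmpId": "101", "Name": "John Doe", "Dept": "IT"},
    {"EmpId": "102", "Name": "FooBar", "Dept": "HR"},
]


def to_bulk_ndjson(rows, index, doc_type, id_field=None):
    """Turn a list of dicts into a bulk body: one action line, one source line.

    If id_field is given, that column is used as the document _id;
    otherwise _id is left out and Elasticsearch generates one.
    """
    lines = []
    for row in rows:
        meta = {"_index": index, "_type": doc_type}
        if id_field is not None:
            meta["_id"] = row[id_field]
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(row))
    # Each line ends with \n, including the last one.
    return "\n".join(lines) + "\n"


ndjson = to_bulk_ndjson(employees, "index_name", "type_name", id_field="EmpId")
print(ndjson, end="")
```

The printed text can be saved to a file, say employee_bulk.json (a made-up name), and posted with curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @employee_bulk.json. The @ prefix is what makes curl read the request body from a file. Elasticsearch 6.0 and later additionally expect a Content-Type header such as application/x-ndjson.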

Cannot update path in timestamp value

Here is my problem: I'm trying to insert a bunch of data into Elasticsearch and visualize it using Kibana, but I have an issue with Kibana's timestamp recognition.
My time field is called "dateStart", and I tried to use it as a timestamp with the following command:
curl -XPUT 'localhost:9200/test/type1/_mapping' -d'{ "type1" :{"_timestamp":{"enabled":true, "format":"yyyy-MM-dd HH:mm:ss","path":"dateStart"}}}'
But this command gives me the following error message:
{"error":"MergeMappingException[Merge failed with failures {[Cannot update path in _timestamp value. Value is null path in merged mapping is missing]}]","status":400}
I'm not sure I understand what this command actually does, but what I would like to do is tell Elasticsearch and Kibana to use my "dateStart" field as a timestamp.
Here is a sample of my insert file (I use bulk insert):
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1"} }
{ "dateStart" : "15-03-31 06:00:00", "score":0.9920092243874442}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2"} }
{ "dateStart" : "15-03-23 06:00:00", "score":0.0}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3"} }
{ "dateStart" : "15-03-29 12:00:00", "score":0.0}
