Check which documents do not exist in Elasticsearch

I have millions of indexed documents. After indexing, I noticed a document count mismatch. I want to send an array of hundreds of document IDs to Elasticsearch, check whether those IDs exist, and in the response get back the IDs that have not been indexed.
Example:
These are the indexed documents:
[497499, 497550, 498370, 498476, 498639, 498726, 498826, 500479, 500780, 500918]
I'm sending 4 at a time:
[497499, 88888, 497550, 77777]
The response should be the IDs that are not there:
[88888, 77777]

You should consider using the _mget endpoint and then parsing the result, for instance:
GET someidx/_mget?_source=false
{
"docs" : [
{
"_id" : "c37m5W4BifZmUly9Ni-X"
},
{
"_id" : "2"
}
]
}
Result:
{
"docs" : [
{
"_index" : "someidx",
"_type" : "_doc",
"_id" : "c37m5W4BifZmUly9Ni-X",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true
},
{
"_index" : "someidx",
"_type" : "_doc",
"_id" : "2",
"found" : false
}
]
}
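To turn this into "which of these IDs are missing?", you can send all the IDs in a single request using the ids shorthand of _mget and keep only the entries that come back with "found" : false. A minimal sketch using curl and jq, assuming Elasticsearch listens on localhost:9200 and the index is called someidx:
curl -s -H 'Content-Type: application/json' 'localhost:9200/someidx/_mget?_source=false' -d '
{
  "ids" : ["497499", "88888", "497550", "77777"]
}' | jq '[.docs[] | select(.found == false) | ._id]'
For the sample above this prints ["88888", "77777"]. Since _mget is a single round trip, sending a few hundred IDs per request is perfectly fine.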

Related

How to update value from a document in elasticsearch through Kibana

POST /indexcn/doc/7XYIWHMB6jW2P6mpdcgv/_update
{
"doc" : {
"DELIVERYDATE" : 100
}
}
I am trying to update the DELIVERYDATE from 0 to 100, but I am getting document missing exception.
How to update the document with a new value?
Here is my index:
"hits" : [
{
"_index" : "indexcn",
"_type" : "_doc",
"_id" : "7XYIWHMB6jW2P6mpdcgv",
"_score" : 1.0,
"_source" : {
.......
.......
"DELIVERYDATE" : 0,
}
You actually got the mapping type wrong (doc instead of _doc). Try this instead and it will work:
POST /indexcn/_doc/7XYIWHMB6jW2P6mpdcgv/_update
{
"doc" : {
"DELIVERYDATE" : 100
}
}
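For reference, on Elasticsearch 7 and later, where mapping types are removed, the typeless form of the same call is:
POST /indexcn/_update/7XYIWHMB6jW2P6mpdcgv
{
  "doc" : {
    "DELIVERYDATE" : 100
  }
}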

Kibana - given an index, how to find saved objects relying on it?

In Kibana I have many dozens of indices.
Given one of them, I want a way to find all the saved objects (searches/dashboards/visualizations) that rely on this index.
Thanks
You can retrieve the document ID of your index pattern and then use that to search your .kibana index:
{
"_index" : ".kibana",
"_type" : "index-pattern",
"_id" : "AWBWDmk2MjUJqflLln_o", <---- take this id...
You can use this query on Kibana 5:
GET .kibana/_search?q=AWBWDmk2MjUJqflLln_o <---- ...and use it here
You'll find your visualizations:
{
"_index" : ".kibana",
"_type" : "visualization",
"_id" : "AWBZNJNcMjUJqflLln_s",
"_score" : 6.2450323,
"_source" : {
"title" : "CA groupe",
"visState" : """{"title":"XXX","type":"pie","params":{"addTooltip":true,"addLegend":true,"legendPosition":"right","isDonut":false,"type":"pie"},"aggs":[{"id":"1","enabled":true,"type":"sum","schema":"metric","params":{"field":"XXX","customLabel":"XXX"}},{"id":"2","enabled":true,"type":"terms","schema":"segment","params":{"field":"XXX","size":5,"order":"desc","orderBy":"1","customLabel":"XXX"}}],"listeners":{}}""",
"uiStateJSON" : "{}",
"description" : "",
"version" : 1,
"kibanaSavedObjectMeta" : {
"searchSourceJSON" : """{"index":"AWBWDmk2MjUJqflLln_o","query":{"match_all":{}},"filter":[]}"""
^
|
this is where your index pattern is used
}
}
},
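If you only need the names of the dependent objects rather than the full documents, you can ask for just the title field; a small variation on the query above (the size of 100 is an arbitrary cap, not something the original answer used):
GET .kibana/_search?q=AWBWDmk2MjUJqflLln_o&_source=title&size=100
Each hit's _type (visualization, dashboard, search) then tells you what kind of saved object relies on the index pattern.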

ElasticSearch Bulk with ingest plugin

I am using the Attachment Processor in a pipeline.
Everything works fine, but I wanted to do multiple posts, so I tried to use the bulk API.
Bulk works fine too, but I can't find how to send the URL parameter pipeline=attachment.
This POST works:
POST testindex/type1/1?pipeline=attachment
{
"data": "Y291Y291",
"name" : "Marc",
"age" : 23
}
This bulk works:
POST _bulk
{ "index" : { "_index" : "testindex", "_type" : "type1", "_id" : "2" } }
{ "name" : "jean", "age" : 22 }
But how can I index Marc with his data field in bulk to be understood by the pipeline plugin?
Thanks to Val's comment, I did this and it works fine:
POST _bulk
{ "index" : { "_index" : "testindex", "_type" : "type1", "_id" : "2", "pipeline": "attachment"} } }
{"data": "Y291Y291", "name" : "jean", "age" : 22}

_mget and _search differences on ElasticSearch

I've indexed 2 documents. As you can see, after indexing them, I can see both in a search result:
[root@centos7 ~]# curl 'http://ESNode01:9201/living/fuas/_search?pretty'
{
"took" : 20,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2, <<<<<<<<<<<<<<<<
"max_score" : 1.0,
"hits" : [ {
"_index" : "living",
"_type" : "fuas",
"_id" : "idFuaMerge1", <<<<<<<<<<<<<<<
"_score" : 1.0,
"_source":{"timestamp":"2015-10-14T16:13:49.004Z","matter":"null","comment":"null","status":"open","backlogStatus":"unknown","metainfos":[],"resources":[{"resourceId":"idResourceMerge1","noteId":"null"},{"resourceId":"idResourceMerge2","noteId":null}]}
}, {
"_index" : "living",
"_type" : "fuas",
"_id" : "idFuaMerge2", <<<<<<<<<<<<<<<<<<
"_score" : 1.0,
"_source":{"timestamp":"2015-10-14T16:13:49.004Z","matter":"null","comment":"null","status":"open","backlogStatus":"unknown","metainfos":[],"resources":[{"resourceId":"idResourceMerge3","noteId":null}]}
} ]
}
}
After that, I perform a multiget request setting the document ids:
[root@centos7 ~]# curl 'http://ESNode01:9201/living/fuas/_mget?pretty' -d '
{
"ids": ["idFuaMerge1", "idFuaMerge2"]
}
'
{
"docs" : [ {
"_index" : "living",
"_type" : "fuas",
"_id" : "idFuaMerge1",
"found" : false <<<<<<<<<<<<<<<<<<<<!!!!!!!!!!!!!!
}, {
"_index" : "living",
"_type" : "fuas",
"_id" : "idFuaMerge2",
"_version" : 4,
"found" : true, <<<<<<<<<<<<<<<!!!!!!!!!!!!!!!!!
"_source":{"timestamp":"2015-10-14T16:13:49.004Z","matter":"null","comment":"null","status":"open","backlogStatus":"unknown","metainfos":[],"resources":[{"resourceId":"idResourceMerge3","noteId":null}]}
} ]
}
How on earth, in a multi-get request, is the first document NOT found while the other one is?
This can only happen if you used a routing key when indexing your documents. A parent/child relation can also cause the same behavior, since children are routed by their parent ID.
When a document is given for indexing, it is mapped to a unique shard using the routing mechanism: the routing value (by default the document ID) is converted to a hash, and that hash modulo the number of shards determines which shard the document goes to.
So in short:
For documentA, the default shard might be 1; the default shard is computed from the document ID.
But because you supplied a routing key yourself, the document was mapped to a different shard, say 0.
Now when you try to get the document without the routing key, Elasticsearch expects the document to be in shard 1, not shard 0, so your multi-get fails because it looks directly in shard 1 for the document.
The search works because a search operation goes across all shards.
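If routing is indeed the cause, you can pass the routing key inside the multi-get request itself: each entry under docs accepts a routing field (spelled _routing on very old versions). The value someRoutingKey below is a placeholder for whatever key was used at index time:
curl 'http://ESNode01:9201/living/fuas/_mget?pretty' -d '
{
  "docs" : [
    { "_id" : "idFuaMerge1", "routing" : "someRoutingKey" },
    { "_id" : "idFuaMerge2" }
  ]
}'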

Cannot update path in timestamp value

Here is my problem: I'm trying to insert a bunch of data into Elasticsearch and visualize it using Kibana, but I have an issue with Kibana's timestamp recognition.
My time field is called "dateStart", and I tried to use it as a timestamp using the following command:
curl -XPUT 'localhost:9200/test/type1/_mapping' -d'{ "type1" :{"_timestamp":{"enabled":true, "format":"yyyy-MM-dd HH:mm:ss","path":"dateStart"}}}'
But this command gives me the following error message:
{"error":"MergeMappingException[Merge failed with failures {[Cannot update path in _timestamp value. Value is null path in merged mapping is missing]}]","status":400}
I'm not sure I understand what this command does, but what I would like is to tell Elasticsearch and Kibana to use my "dateStart" field as the timestamp.
Here is a sample of my insert file (I use bulk inserts):
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1"} }
{ "dateStart" : "15-03-31 06:00:00", "score":0.9920092243874442}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "2"} }
{ "dateStart" : "15-03-23 06:00:00", "score":0.0}
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "3"} }
{ "dateStart" : "15-03-29 12:00:00", "score":0.0}
