Entries of ElasticSearch hits

I am performing an ElasticSearch query through
curl -XGET 'http://localhost:9200/_search?pretty' -d '{"_source":["data.js_event","data.timestamp","data.uid","data.element"],"query":{"match":{"event":"first_time_user_event"}}}'
I am ONLY interested in the output of _source, but the retrieval returns something like:
{
  "took" : 46,
  "timed_out" : false,
  "_shards" : {
    "total" : 71,
    "successful" : 71,
    "failed" : 0
  },
  "hits" : {
    "total" : 2062326,
    "max_score" : 4.8918204,
    "hits" : [ {
      "_index" : "logstash-2015.11.22",
      "_type" : "fluentd",
      "_id" : "AVEv_blDT1yShMIEDDmv",
      "_score" : 4.8918204,
      "_source":{"data":{"js_event":"leave_page","timestamp":"2015-11-22T16:18:47.088Z","uid":"l4Eys1T/rAETpysn7E/Jog==","element":"body"}}
    }, {
      "_index" : "logstash-2015.11.21",
      "_type" : "fluentd",
      "_id" : "AVEnZa5nT1yShMIEDDW8",
      "_score" : 4.0081544,
      "_source":{"data":{"js_event":"hover","timestamp":"2015-11-21T00:15:15.097Z","uid":"E/4Fdl5uETvhQeX/FZIWfQ==","element":"infographic-new-format-selector"}}
    },
    ...
Is there a way to get rid of _index, _type, _id and _score? The reason is that the query is performed on a remote server and I would like to limit the size of downloaded data.
Thanks.

Yes, you can use response filtering (only available since ES 1.7) by specifying what you want in the response (i.e. filter_path=hits.hits._source), and ES will filter everything else out for you:
curl -XGET 'http://localhost:9200/_search?filter_path=hits.hits._source&pretty' -d '{"_source":["data.js_event","data.timestamp","data.uid","data.element"],"query":{"match":{"event":"first_time_user_event"}}}'
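On the client side, the filtered response then contains only the hits.hits._source entries. A minimal Python sketch of consuming it, assuming the JSON has already been parsed into a dict (the sample data mirrors the hits from the question):

```python
# With filter_path=hits.hits._source, only the _source objects remain
# in each hit; everything else (_index, _type, _id, _score, took,
# _shards, ...) has been stripped by Elasticsearch server-side.

filtered = {
    "hits": {
        "hits": [
            {"_source": {"data": {"js_event": "leave_page", "element": "body"}}},
            {"_source": {"data": {"js_event": "hover", "element": "infographic-new-format-selector"}}},
        ]
    }
}

# Collect the _source payloads directly; no per-hit metadata to skip over.
sources = [hit["_source"] for hit in filtered["hits"]["hits"]]
print([s["data"]["js_event"] for s in sources])
# ['leave_page', 'hover']
```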

Related

Detect changes during bulk indexing

We are using Elasticsearch v5.6.12 for our database, and we update it frequently using the bulk REST API. Some of the time an individual request won't change anything (i.e. the document already stored in Elasticsearch is up to date). How can I detect these instances?
I saw this (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html) but I'm not sure it's applicable in our situation.
You can use noop detection by checking the result of your bulk queries.
When the bulk query returns, iterate over each update result and check whether its result field has the value noop (vs updated).
# Say the document is indexed
PUT test/doc/1
{
  "test": "123"
}

# Now you want to bulk update it
POST test/doc/_bulk
{"update":{"_id": "1"}}
{"doc":{"test":"123"}}                           <-- this will yield `result: noop`
{"update":{"_id": "1"}}
{"doc":{"test":"1234"}}                          <-- this will yield `result: updated`
{"update":{"_id": "2"}}
{"doc":{"test":"3456"}, "doc_as_upsert": true}   <-- this will yield `result: created`
Result:
{
  "took" : 6,
  "errors" : false,
  "items" : [
    {
      "update" : {
        "_index" : "test",
        "_type" : "doc",
        "_id" : "1",
        "_version" : 2,
        "result" : "noop",           <-- see "noop"
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "status" : 200
      }
    },
    {
      "update" : {
        "_index" : "test",
        "_type" : "doc",
        "_id" : "1",
        "_version" : 3,
        "result" : "updated",        <-- see "updated"
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 200
      }
    },
    {
      "update" : {
        "_index" : "test",
        "_type" : "doc",
        "_id" : "2",
        "_version" : 1,
        "result" : "created",        <-- see "created"
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}
As you can see, when specifying doc_as_upsert: true for the document with id 2, the document is created and the result field has the value created.
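The client-side check described above can be sketched in a few lines of Python. This assumes the bulk response has already been parsed into a dict; the helper name partition_bulk_results is mine, not an Elasticsearch API:

```python
# Partition a parsed Elasticsearch bulk response by its per-item
# "result" field ("noop", "updated", "created", ...). The response
# shape matches the bulk result shown above.

def partition_bulk_results(bulk_response):
    buckets = {}
    for item in bulk_response.get("items", []):
        # Each item is wrapped by its operation type ("update", "index", ...)
        op_result = next(iter(item.values()))
        buckets.setdefault(op_result.get("result"), []).append(op_result.get("_id"))
    return buckets

# Trimmed-down version of the response above
response = {
    "errors": False,
    "items": [
        {"update": {"_id": "1", "result": "noop"}},
        {"update": {"_id": "1", "result": "updated"}},
        {"update": {"_id": "2", "result": "created"}},
    ],
}
print(partition_bulk_results(response))
# {'noop': ['1'], 'updated': ['1'], 'created': ['2']}
```

The documents whose ids land in the "noop" bucket are the ones your bulk request did not change.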

Elasticsearch has_child not returning back all parent documents

Here is the mapping data for both customer and customer_query documents, where customer is the parent and customer_query the child document.
When I run a generic search against all customer_query documents, I get back 127 documents.
However, when I run the following query against the parent
curl -XGET "http://localhost:9200/fts_index/customer/_search" -d'
{
  "query": {
    "has_child" : {
      "type" : "customer_query",
      "query" : { "match_all": {} }
    }
  }
}'
I get back only 23 documents. There should be 127 documents returned, since each customer_query document has a unique parent id assigned to it that matches up to the customer type.
When I retry creating my customer_query documents, I get a different number of documents back each time leading me to think it is some kind of shard issue. I have 5 shards assigned to the index.
{
  "took" : 59,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 23,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "fts_index",
      "_type" : "customer",
      "_id" : "7579f2c0-e4e4-4374-82d7-bf4c508fc51d",
      "_score" : 1.0,
      "_routing" : "8754248f-1c51-46bf-970a-493c349c70a7",
      "_parent" : "8754248f-1c51-46bf-970a-493c349c70a7",
      ....
I can't wrap my head around this issue. Any thoughts on what could be the issue? Is this a routing issue? If so, how do I rectify that with my search?

_mget and _search differences on ElasticSearch

I've indexed 2 documents.
After indexing them, I can see both in a search result:
[root@centos7 ~]# curl 'http://ESNode01:9201/living/fuas/_search?pretty'
{
  "took" : 20,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,                 <<<<<<<<<<<<<<<<
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "living",
      "_type" : "fuas",
      "_id" : "idFuaMerge1",     <<<<<<<<<<<<<<<
      "_score" : 1.0,
      "_source":{"timestamp":"2015-10-14T16:13:49.004Z","matter":"null","comment":"null","status":"open","backlogStatus":"unknown","metainfos":[],"resources":[{"resourceId":"idResourceMerge1","noteId":"null"},{"resourceId":"idResourceMerge2","noteId":null}]}
    }, {
      "_index" : "living",
      "_type" : "fuas",
      "_id" : "idFuaMerge2",     <<<<<<<<<<<<<<<<<<
      "_score" : 1.0,
      "_source":{"timestamp":"2015-10-14T16:13:49.004Z","matter":"null","comment":"null","status":"open","backlogStatus":"unknown","metainfos":[],"resources":[{"resourceId":"idResourceMerge3","noteId":null}]}
    } ]
  }
}
After that, I perform a multiget request setting the document ids:
[root@centos7 ~]# curl 'http://ESNode01:9201/living/fuas/_mget?pretty' -d '
{
  "ids": ["idFuaMerge1", "idFuaMerge2"]
}
'
{
  "docs" : [ {
    "_index" : "living",
    "_type" : "fuas",
    "_id" : "idFuaMerge1",
    "found" : false              <<<<<<<<<<<<<<<<<<<<!!!!!!!!!!!!!!
  }, {
    "_index" : "living",
    "_type" : "fuas",
    "_id" : "idFuaMerge2",
    "_version" : 4,
    "found" : true,              <<<<<<<<<<<<<<<!!!!!!!!!!!!!!!!!
    "_source":{"timestamp":"2015-10-14T16:13:49.004Z","matter":"null","comment":"null","status":"open","backlogStatus":"unknown","metainfos":[],"resources":[{"resourceId":"idResourceMerge3","noteId":null}]}
  } ]
}
How on earth, on a multiget request, is the first document NOT found while the other one is?
This can only happen if you used a routing key when indexing your documents. A parent/child relation implies the same thing, since a child document is routed by its parent's id.
When a document is indexed, it is mapped to a unique shard by the routing mechanism: the routing key (by default the document id) is hashed, and that hash modulo the number of primary shards determines which shard the document goes to.
So in short:
For documentA, the default shard might be 1, because the default routing key is the document id.
But because you supplied your own routing key at index time, the document was mapped to a different shard, say 0.
Now when you try to get the document without the routing key, ES expects the document to be on shard 1 and not shard 0, so your multi-get fails because it looks only on shard 1.
The search works because a search operation fans out across all shards.

Elasticsearch CouchDB River no hit

I have a problem with CouchDB and Elasticsearch. I use Docker to set this up. I have a working CouchDB container on the default port. Now I use this container:
registry.hub.docker.com/u/jeko/elasticsearch-river-couchdb/
And I insert a new CouchDB connection with this:
curl -X PUT '127.0.0.1:9200/_river/testdb/_meta' -d ' { "type" : "couchdb", "couchdb" : { "host" : "couchdb", "port" : 5984, "db" : "articles", "filter" : null }, "index" : { "index" : "articles", "type" : "articles", "bulk_size" : "100", "bulk_timeout" : "10ms" } }'
to get a working Elasticsearch with the CouchDB river. I then checked the documents with curl host/articles/articles/_search?pretty=true, but the hits are empty:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
I turned debug logging on and checked the log file. The output is here: http://pastebin.com/ETkNmJzT
The only conspicuous thing I found is this line: [2015-02-20 14:04:24,554][DEBUG][plugins ] [Arc] [/elasticsearch/plugins/river-couchdb/_site] directory does not exist.
But I don't understand why it doesn't work. I can curl the IP.

Unable to search attachment type field in an ElasticSearch indexed document

Search does not return any results although I do have a document that should match the query.
I do have the ElasticSearch mapper-attachments plugin installed per https://github.com/elasticsearch/elasticsearch-mapper-attachments. I have also googled the topic and browsed similar questions on Stack Overflow, but have not found an answer.
Here's what I typed into a Windows 7 command prompt:
c:\Java\elasticsearch-1.3.4>curl -XDELETE localhost:9200/tce
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce/contact/_mapping -d{\"contact\":{\"properties\":{\"my_attachment\":{\"type\":\"attachment\"}}}}
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce/contact/1 -d{\"my_attachment\":\"SGVsbG8=\"}
{"_index":"tce","_type":"contact","_id":"1","_version":1,"created":true}
c:\Java\elasticsearch-1.3.4>curl localhost:9200/tce/contact/_search?pretty
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "tce",
      "_type" : "contact",
      "_id" : "1",
      "_score" : 1.0,
      "_source":{"my_attachment":"SGVsbG8="}
    } ]
  }
}
c:\Java\elasticsearch-1.3.4>curl localhost:9200/tce/contact/_search?pretty -d{\"query\":{\"term\":{\"my_attachment\":\"Hello\"}}}
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
Note that the base64 encoded value of "Hello" is "SGVsbG8=", which is the value I have inserted into the "my_attachment" field of the document.
I am assuming that the mapper-attachments plugin has been deployed correctly because I don't get an error executing the mapping command above.
Any help would be greatly appreciated.
What analyzer is running against the my_attachment field?
If it's the standard analyzer (I can't see any listed), then the Hello in the text will have been lowercased in the index.
So when doing a term search (which is not analyzed), try searching for hello instead:
curl localhost:9200/tce/contact/_search?pretty -d '
{
  "query": {
    "term": {
      "my_attachment": "hello"
    }
  }
}'
You can also see which terms have been added to the index:
curl 'http://localhost:9200/tce/contact/_search?pretty=true' -d '{
  "query" : {
    "match_all" : { }
  },
  "script_fields": {
    "terms" : {
      "script": "doc[field].values",
      "params": {
        "field": "my_attachment"
      }
    }
  }
}'
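The mismatch boils down to analysis happening at index time but not for term queries. A toy simulation (this is a greatly simplified stand-in for the real standard analyzer, just to show the mechanism):

```python
# Toy model: the standard analyzer lowercases tokens at index time,
# but a term query is NOT analyzed, so it looks up the query string
# verbatim against the indexed tokens.

def standard_analyze(text):
    # Greatly simplified: split on whitespace and lowercase each token.
    return [tok.lower() for tok in text.split()]

indexed_tokens = standard_analyze("Hello")  # what ends up in the index

print("Hello" in indexed_tokens)  # False: term query for "Hello" misses
print("hello" in indexed_tokens)  # True:  term query for "hello" matches
```

A match query, by contrast, would analyze the query string with the same analyzer and therefore find the document either way.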
