How to use _timestamp in a scripted update - elasticsearch

I was trying to come up with an elegant answer to this question and ran into an unexpected problem. The basic idea is to update a document based on its current timestamp. It seems straightforward enough, but I'm missing something. At the bottom of the Update API page, the ES docs say:
It also allows to update the ttl of a document using ctx._ttl and timestamp using ctx._timestamp. Note that if the timestamp is not updated and not extracted from the _source it will be set to the update date.
The ES documentation is often enigmatic at best, especially when it comes to scripting, but I took this to mean that I could use the _timestamp field in an update script.
So I set up a simple index with a timestamp:
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"doc": {
"_timestamp": {
"enabled": true,
"store": true,
"path": "doc_date",
"format" : "YYYY-MM-dd"
},
"properties": {
"doc_date": {
"type": "date",
"format" : "YYYY-MM-dd"
},
"doc_text": {
"type": "string"
}
}
}
}
}
and added some docs:
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"doc_text":"doc1", "doc_date":"2015-2-5"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"doc_text":"doc2", "doc_date":"2015-2-10"}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"doc_text":"doc3", "doc_date":"2015-2-15"}
If I query for the first doc, I get back what I expect:
POST /test_index/_search
{
"query": {
"match": {
"doc_text": "doc1"
}
},
"fields": [
"_timestamp",
"_source"
]
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.4054651,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1.4054651,
"_source": {
"doc_text": "doc1",
"doc_date": "2015-2-5"
},
"fields": {
"_timestamp": 1423094400000
}
}
]
}
}
So far so good. Now I want to conditionally update the first doc, based on its timestamp. First I tried this, and got an error:
POST /test_index/doc/1/_update
{
"script": "if(ctx._timestamp < new_ts){ctx._source.doc_date=new_date;ctx._source.doc_text=new_text}",
"params": {
"new_ts": 1423526400000,
"new_date": "2015-2-10",
"new_text": "doc1-updated"
}
}
...
{
"error": "ElasticsearchIllegalArgumentException[failed to execute script]; nested: PropertyAccessException[[Error: could not access: _timestamp; in class: java.util.HashMap]\n[Near : {... if(ctx._timestamp < new_ts){ctx._ ....}]\n ^\n[Line: 1, Column: 4]]; ",
"status": 400
}
Then I tried this:
POST /test_index/doc/1/_update
{
"script": "if(ctx[\"_timestamp\"] < new_ts){ctx._source.doc_date=new_date;ctx._source.doc_text=new_text}",
"params": {
"new_ts": 1423526400000,
"new_date": "2015-2-10",
"new_text": "doc1-updated"
}
}
...
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_version": 2
}
I didn't get an error, but the update didn't happen:
POST /test_index/_search
{
"query": {
"match": {
"doc_text": "doc1"
}
},
"fields": [
"_timestamp",
"_source"
]
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.287682,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1.287682,
"_source": {
"doc_text": "doc1",
"doc_date": "2015-2-5"
},
"fields": {
"_timestamp": 1423094400000
}
}
]
}
}
Just out of curiosity, I inverted the conditional:
POST /test_index/doc/1/_update
{
"script": "if(ctx[\"_timestamp\"] > new_ts){ctx._source.doc_date=new_date;ctx._source.doc_text=new_text}",
"params": {
"new_ts": 1423526400000,
"new_date": "2015-2-10",
"new_text": "doc1-updated"
}
}
with the same result: no update.
Okay, so as a sanity check I tried to set the timestamp, and got an error:
POST /test_index/doc/1/_update
{
"script": "ctx._source.doc_date=new_date;ctx._source.doc_text=new_text;ctx._timestamp=new_ts",
"params": {
"new_ts": 1423526400000,
"new_date": "2015-2-10",
"new_text": "doc1-updated"
}
}
...
{
"error": "ClassCastException[java.lang.Long cannot be cast to java.lang.String]",
"status": 500
}
I also tried it with "ctx[\"_timestamp\"]=new_ts;", and got the same error.
So it seems that the _timestamp field is not available to the script, even though the documentation says it is. What am I doing wrong?
I also tried updating without the conditional or the timestamp assignment, and that worked as expected.
I used Elasticsearch version 1.3.4 (with dynamic scripting enabled, obviously), running on an Ubuntu 12 VM.
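(For reference, on the 1.x line that means having the following in elasticsearch.yml, since dynamic scripting is disabled by default there; later releases moved to finer-grained script settings.)

# elasticsearch.yml - allow dynamic/inline scripts (1.x setting)
script.disable_dynamic: false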
Here is the code I used to set this up:
http://sense.qbox.io/gist/ca2b3c6b84572e5f87d57d22f8c38252fa4ee216
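Edit: while waiting for an answer, the fallback I'm experimenting with (just a sketch, not the in-script solution I'm after) skips the script condition entirely: read the stored _timestamp first, compare it client-side, and only then send a plain partial update (optionally passing the version returned by the GET to guard against concurrent writes):

# 1) read the stored timestamp (it is store: true in the mapping above)
GET /test_index/doc/1?fields=_timestamp

# 2) if the returned _timestamp is older than the cutoff, send a partial update
POST /test_index/doc/1/_update
{
  "doc": {
    "doc_date": "2015-2-10",
    "doc_text": "doc1-updated"
  }
}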

Related

How to make a proper query to select by ID and later update using Elasticsearch?

I am very new to ES and I am trying to figure some things out.
I did a basic query this way:
GET _search
{
"query": {
"match_all": {}
}
}
and I got this...
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 768,
"successful": 768,
"failed": 0
},
"hits": {
"total": 456,
"max_score": 1,
"hits": [
{
"_index": "sometype_1",
"_type": "sometype",
"_id": "12312321312312",
"_score": 1,
"_source": {
"readModel": {
"id": "asdfqwerzcxv",
"status": "active",
"hidden": false
},
"model": {
"id": "asdfqwerzcxv",
"content": {
"objectId": "421421312312",
"message": "hello world",
..... //the rest of the object...
So now I want to get the object with id asdfqwerzcxv, and I tried this:
GET _search
{
"query": {
"match" : {
"id" :"asdfqwerzcxv"
}
}
}
But of course it's not working... I also tried specifying the whole path, like:
GET _search
{
"query": {
"match" : {
"_source" :{
"readModel" : {
"id": "asdfqwerzcxv"
}
}
}
}
}
But no luck...
Is there a way to do this? Could someone help me?
Thanks
You need to use the fully-qualified field name; try this:
GET _search
{
"query": {
"match" : {
"readModel.id" :"asdfqwerzcxv"
^
|
add this
}
}
}
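Since the title also mentions updating the document later: once you know its _id (visible in the hit above: index sometype_1, type sometype, id 12312321312312), you can address the document directly instead of searching again. A sketch - the field and value in the partial update are just illustrative:

# fetch the document by id
GET sometype_1/sometype/12312321312312

# partial update of the same document
POST sometype_1/sometype/12312321312312/_update
{
  "doc": {
    "readModel": {
      "status": "inactive"
    }
  }
}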

Elasticsearch OR query with nested objects returns inner_hits not matching the criteria

I'm getting weird results when querying nested objects. Imagine the following structure:
{ owner.name = "fred",
...,
pets [
{ name = "daisy", ... },
{ name = "flopsy", ... }
]
}
If I only have the document shown above, and I search for pets matching these criteria:
pets.name = "daisy" OR
(owner.name = "julie" and pet.name = "flopsy")
I would expect to only get one result ("daisy"), but I'm getting both pet names.
This is one way to reproduce this:
# Create nested mapping
PUT pet-owners
{
"mappings": {
"animals": {
"properties": {
"owner": {"type": "text"},
"pets": {
"type": "nested",
"properties": {
"name": {"type": "text", "fielddata": true}
}
}
}
}
}
}
# Insert nested object
PUT pet-owners/animals/1?op_type=create
{
"owner" : "fred",
"pets" : [
{ "name" : "daisy"},
{ "name" : "flopsy"}
]
}
# Query
GET pet-owners/_search
{
  "from": 0, "size": 50,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {"bool": {"should": [
              {"nested": {
                "query": {"term": {"pets.name": "daisy"}},
                "path": "pets",
                "inner_hits": {
                  "name": "pets_hits_1",
                  "size": 99,
                  "_source": false,
                  "docvalue_fields": ["pets.name"]
                }
              }},
              {"bool": {"must": [
                {"term": {"owner": "julie"}},
                {"nested": {
                  "query": {"term": {"pets.name": "flopsy"}},
                  "path": "pets",
                  "inner_hits": {
                    "name": "pets_hits_2",
                    "size": 99,
                    "_source": false,
                    "docvalue_fields": ["pets.name"]
                  }
                }}
              ]}}
            ]}}
          ]
        }
      }
    }
  },
  "_source": false
}
The query returns both pet names (as opposed to the expected one).
Is this behavior normal? Am I doing something wrong, or is my reasoning about the nested structure or the query behavior flawed?
Any help or guidance will be much appreciated.
I'm running this query under Elasticsearch 6.3.x
EDIT: I'm adding the response received, to better illustrate the case
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "pet-owners",
"_type": "animals",
"_id": "1",
"_score": 1,
"inner_hits": {
"pets_hits_1": {
"hits": {
"total": 1,
"max_score": 0.6931472,
"hits": [
{
"_index": "pet-owners",
"_type": "animals",
"_id": "1",
"_nested": {
"field": "pets",
"offset": 0
},
"_score": 0.6931472,
"fields": {
"pets.name": [
"daisy"
]
}
}
]
}
},
"pets_hits_2": {
"hits": {
"total": 1,
"max_score": 0.6931472,
"hits": [
{
"_index": "pet-owners",
"_type": "animals",
"_id": "1",
"_nested": {
"field": "pets",
"offset": 1
},
"_score": 0.6931472,
"fields": {
"pets.name": [
"flopsy"
]
}
}
]
}
}
}
}
]
}
}
So we can see that it's not that the query matches and returns the whole existing document, but that it returns each of the pets independently, one inside each of the inner_hits. It's this result that's surprising to me.
(edited) - in summary, this issue is about the context in which the inner_hits are evaluated:
It looks like the inner_hits 'pets_hits_2' returns a match because it belongs to the nested query that simply searches the pets field for 'flopsy'.
As an independent query on our single document, that is a valid hit.
However, because that query sits inside a list of bool/must queries whose other clauses do not match our document, you might well expect the inner_hits to pick up on this and therefore not return a hit.
I haven't been able to find any docs clarifying whether this is intentional behaviour or not - it might be worth raising with Elastic...
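One quick way to see this, using the same pet-owners index as above, is to run the second branch of the should on its own. At the top level it matches nothing (owner is fred, not julie), which shows that in the combined query the flopsy inner_hits come from the nested query being evaluated on its own rather than from its enclosing bool:

GET pet-owners/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"owner": "julie"}},
        {"nested": {
          "path": "pets",
          "query": {"term": {"pets.name": "flopsy"}}
        }}
      ]
    }
  }
}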

Get specific fields from index in elasticsearch

I have an index in elastic-search.
Sample structure :
{
"Article": "Article7645674712",
"Genre": "Genre92231455",
"relationDesc": [
"Article",
"Genre"
],
"org": "user",
"dateCreated": {
"date": "08/05/2015",
"time": "16:22 IST"
},
"dateModified": "08/05/2015"
}
From this index I want to retrieve only selected fields: org and dateModified.
I want the result to look like this:
{
"took": 265,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 28,
"max_score": 1,
"hits": [
{
"_index": "couchrecords",
"_type": "couchbaseDocument",
"_id": "3",
"_score": 1,
"_source": {
"doc": {
"org": "user",
"dateModified": "08/05/2015"
}
}
},
{
"_index": "couchrecords",
"_type": "couchbaseDocument",
"_id": "4",
"_score": 1,
"_source": {
"doc": {
"org": "user",
"dateModified": "10/05/2015"
}
}
}
]
}
}
How do I query Elasticsearch to get only these specific selected fields?
You can retrieve only a specific set of fields in the result hits using the _source parameter like this:
curl -XGET localhost:9200/couchrecords/couchbaseDocument/_search?_source=org,dateModified
Or in this format:
curl -XPOST localhost:9200/couchrecords/couchbaseDocument/_search -d '{
"_source": ["doc.org", "doc.dateModified"], <---- you just need to add this
"query": {
"match_all":{} <----- or whatever query you have
}
}'
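Note that in your sample result the fields sit under doc (the Couchbase transport wraps the document), so the URL-parameter form will likely need the full path as well, e.g.:

curl -XGET "localhost:9200/couchrecords/couchbaseDocument/_search?_source=doc.org,doc.dateModified"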
That's easy. Considering any query of this format:
{
"query": {
...
},
}
You just need to add the fields parameter to your query, which in your case results in the following:
{
"query": {
...
},
"fields" : ["org","dateModified"]
}
Or, using source filtering instead:
{
"_source" : ["org","dateModified"],
"query": {
...
}
}
Check the Elasticsearch source filtering documentation.
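Source filtering also accepts include/exclude patterns, which is handy once you need wildcards. A sketch (the keys are includes/excludes in recent versions, include/exclude in older ones; field names taken from the example document):

{
  "_source": {
    "includes": ["org", "dateModified"],
    "excludes": ["relationDesc"]
  },
  "query": {
    "match_all": {}
  }
}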

Elasticsearch: get multiple specified documents in one request?

I am new to Elasticsearch and would like to know whether this is possible.
Basically, I have a "code" property on multiple documents, and each document has a unique value in it. Now I have the codes of several documents and would like to retrieve all of them in one request by supplying multiple codes.
Is this doable in Elasticsearch?
Regards.
Edit
This is the mapping of the field:
"code" : { "type" : "string", "store": "yes", "index": "not_analyzed"},
Two example values of this property:
0Qr7EjzE943Q
GsPVbMMbVr4s
What is the ES syntax to retrieve the two documents in ONE request?
First, you probably don't want "store":"yes" in your mapping, unless you have _source disabled (see this post).
So, I created a simple index like this:
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"code": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
added the two docs with the bulk API:
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"code":"0Qr7EjzE943Q"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"code":"GsPVbMMbVr4s"}
There are a number of ways I could retrieve those two documents. The most straightforward, especially since the field isn't analyzed, is probably with a terms query:
POST /test_index/_search
{
"query": {
"terms": {
"code": [
"0Qr7EjzE943Q",
"GsPVbMMbVr4s"
]
}
}
}
Both documents are returned:
{
"took": 21,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.04500804,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.04500804,
"_source": {
"code": "0Qr7EjzE943Q"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 0.04500804,
"_source": {
"code": "GsPVbMMbVr4s"
}
}
]
}
}
Here is the code I used:
http://sense.qbox.io/gist/a3e3e4f05753268086a530b06148c4552bfce324
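Since scoring doesn't matter when you're just fetching known documents, another of those "number of ways" is to put the same terms lookup in a filter so scoring is skipped. A sketch in the 1.x filtered-query syntax used above (newer versions would use a bool query with a filter clause instead):

POST /test_index/_search
{
  "query": {
    "filtered": {
      "filter": {
        "terms": {
          "code": [
            "0Qr7EjzE943Q",
            "GsPVbMMbVr4s"
          ]
        }
      }
    }
  }
}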

Elastic Search- Fetch Distinct Tags

I have documents of the following format:
{
_id :"1",
tags:["guava","apple","mango", "banana", "gulmohar"]
}
{
_id:"2",
tags: ["orange","guava", "mango shakes", "apple pie", "grammar"]
}
{
_id:"3",
tags: ["apple","grapes", "water", "gulmohar","water-melon", "green"]
}
Now, I want to fetch the unique tag values, across the tags fields of all documents, that start with the prefix g*, so that these unique tags can be displayed by a tag suggester (the Stack Overflow tag box is an example).
For example: whenever the user types 'g':
"guava", "gulmohar", "grammar", "grapes" and "green" should be returned as a result.
i.e. the query should return the distinct tags with the prefix g*.
I tried everything: browsed the whole documentation, searched the ES forum, but I didn't find any clue, much to my dismay.
I tried aggregations, but aggregations return the distinct counts for all the words/tokens in the tags field. They do not return the unique list of tags starting with 'g'.
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"query_string": {
"allow_leading_wildcard": false,
"fields": [
"tags"
],
"query": "g*",
"fuzziness":0
}
}
]
}
},
"filter": {
//some condition on other field...
}
}
},
"aggs": {
"distinct_tags": {
"terms": {
"field": "tags",
"size": 10
}
}
},
result of above: guava(w), apple(q), mango(1),...
Can someone please suggest the correct way to fetch all the distinct tags with the prefix input_prefix*?
It's a bit of a hack, but this seems to accomplish what you want.
I created an index and added your docs:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"tags":["guava","apple","mango", "banana", "gulmohar"]}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"tags": ["orange","guava", "mango shakes", "apple pie", "grammar"]}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"tags": ["guava","apple","grapes", "water", "grammar","gulmohar","water-melon", "green"]}
Then I used a combination of prefix query and highlighting as follows:
POST /test_index/_search
{
"query": {
"prefix": {
"tags": {
"value": "g"
}
}
},
"fields": [ ],
"highlight": {
"pre_tags": [""],
"post_tags": [""],
"fields": {
"tags": {}
}
}
}
...
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"highlight": {
"tags": [
"guava",
"gulmohar"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 1,
"highlight": {
"tags": [
"guava",
"grammar"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"highlight": {
"tags": [
"guava",
"grapes",
"grammar",
"gulmohar",
"green"
]
}
}
]
}
}
Here is the code I used:
http://sense.qbox.io/gist/c14675ee8bd3934389a6cb0c85ff57621a17bf11
What you're trying to do amounts to autocomplete, of course, and there are perhaps better ways of going about that than what I posted above (though they are a bit more involved). Here are a couple of blog posts we did about ways to set up autocomplete:
http://blog.qbox.io/quick-and-dirty-autocomplete-with-elasticsearch-completion-suggest
http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
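One more option, closer to the aggregation approach in the question: if the tags field is indexed not_analyzed (or has a not_analyzed raw sub-field, so multi-word tags aren't split into tokens), the terms aggregation accepts an include regex that restricts the buckets to terms matching a pattern. A sketch under that mapping assumption:

POST /test_index/_search?search_type=count
{
  "aggs": {
    "distinct_tags": {
      "terms": {
        "field": "tags",
        "include": "g.*",
        "size": 10
      }
    }
  }
}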
As per @Sloan Ahrens' advice, I did the following:
Updated the mapping:
"tags": {
"type": "completion",
"context": {
"filter_color": {
"type": "category",
"default": "",
"path": "fruits.color"
},
"filter_type": {
"type": "category",
"default": "",
"path": "fruits.type"
}
}
}
Reference: ES API Guide
Inserted these documents:
{
_id: "1",
tags: {input: ["guava","apple","mango", "banana", "gulmohar"]},
fruits: {color: 'bar', type: 'alice'}
}
{
_id: "2",
tags: {input: ["orange","guava", "mango shakes", "apple pie", "grammar"]},
fruits: {color: 'foo', type: 'bob'}
}
{
_id: "3",
tags: {input: ["apple","grapes", "water", "gulmohar","water-melon", "green"]},
fruits: {color: 'foo', type: 'alice'}
}
I didn't need to modify my original index much; I just added input before the tags array.
POST rescu1/_suggest?pretty
{
"suggest": {
"text": "g",
"completion": {
"field": "tags",
"size": 10,
"context": {
"filter_color": "bar",
"filter_type": "alice"
}
}
}
}
gave me the desired output.
I accepted @Sloan Ahrens' answer, as his suggestions worked like a charm for me and he showed me the right direction.
