Updating a property of an object in a list in an ElasticSearch document?

I'm fairly new to ElasticSearch. I'm using it in a .NET project with the NEST client. Right now I'm examining ways of handling document updates.
I have a document that looks like this:
public class Class1
{
public string Prop1 { get; set; }
public string Prop2 { get; set; }
public List<Class2> PropList { get; set; }
}
and when I want to add something to the PropList, I'm doing it by script:
client.Update<Class1>(x => x
.Id(1)
.Index("index_name")
.Script("ctx._source.propList += prop")
.Params(p => p.Add("prop", newProp)));
That works perfectly right now. The problem is when I want to update a property of an object inside propList. The way I'm doing it right now is by retrieving the entire document, finding the object in the list, updating the property, and indexing the entire document again, which at some point could cause performance issues.
Is there a way of doing this more efficiently? Maybe using scripts or some other way?
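For example, I imagine a scripted update along these lines might do it; this is only a sketch on my part, assuming each Class2 item exposes an id field to match on (the id and prop2 names here are placeholders) and that dynamic scripting is enabled:
POST /index_name/class1/1/_update
{
  "script": "for (item in ctx._source.propList) { if (item.id == target_id) { item.prop2 = new_value } }",
  "params": {
    "target_id": 5,
    "new_value": "some new value"
  }
}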
Thanks.

I don't know off-hand how to set it up with NEST, but I would go with a parent/child relationship.
As a toy example, I set up an index with this mapping:
PUT /test_index
{
"mappings": {
"parent_type": {
"properties": {
"num_prop": {
"type": "integer"
},
"str_prop": {
"type": "string"
}
}
},
"child_type": {
"_parent": {
"type": "parent_type"
},
"properties": {
"child_num": {
"type": "integer"
},
"child_str": {
"type": "string"
}
}
}
}
}
then added some data:
POST /test_index/_bulk
{"index":{"_type":"parent_type","_id":1}}
{"num_prop":1,"str_prop":"hello"}
{"index":{"_type":"child_type","_id":1,"_parent":1}}
{"child_num":11,"child_str":"foo"}
{"index":{"_type":"child_type","_id":2,"_parent":1}}
{"child_num":12,"child_str":"bar"}
{"index":{"_type":"parent_type","_id":2}}
{"num_prop":2,"str_prop":"goodbye"}
{"index":{"_type":"child_type","_id":3,"_parent":2}}
{"child_num":21,"child_str":"baz"}
Now if I want to update a child document I can just post a new version:
POST /test_index/child_type/2?parent=1
{
"child_num": 13,
"child_str": "bars"
}
(note that I have to provide the parent id so ES can route the request appropriately)
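The same routing requirement applies when fetching the child directly, e.g.:
GET /test_index/child_type/2?parent=1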
I can also do a partial, scripted update if I want to:
POST /test_index/child_type/3/_update?parent=2
{
"script": "ctx._source.child_num+=1"
}
We can prove that this worked by searching the child types:
POST /test_index/child_type/_search
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "child_type",
"_id": "1",
"_score": 1,
"_source": {
"child_num": 11,
"child_str": "foo"
}
},
{
"_index": "test_index",
"_type": "child_type",
"_id": "2",
"_score": 1,
"_source": {
"child_num": 13,
"child_str": "bars"
}
},
{
"_index": "test_index",
"_type": "child_type",
"_id": "3",
"_score": 1,
"_source": {
"child_num": 22,
"child_str": "baz"
}
}
]
}
}
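One trade-off to keep in mind: where you would previously have filtered parents by properties of the embedded list items, you would now use a has_child query instead. A sketch against the toy index above:
POST /test_index/parent_type/_search
{
  "query": {
    "has_child": {
      "type": "child_type",
      "query": {
        "match": {
          "child_str": "foo"
        }
      }
    }
  }
}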
Hope this helps. Here is the code I used, plus a few more examples:
http://sense.qbox.io/gist/73f6d2f347a08bfe0c254a977a4a05a68d2f3a8d

Related

ElasticSearch Range query

I have created the index by using the following mapping:
PUT test1
{
"mappings": {
"type1": {
"properties": {
"age": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"ignore_above": 32766
}
}
}
}
}
}
}
Added the following documents into the index:
PUT test1/type1/1/_create
{
"age":50
}
PUT test1/type1/2/_create
{
"age":100
}
PUT test1/type1/3/_create
{
"age":150
}
PUT test1/type1/4/_create
{
"age":200
}
I have used the following range query to fetch results:
GET test1/_search
{
"query": {
"range" : {
"age" : {
"lte" : 150
}
}
}
}
It is giving me the following response:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test1",
"_type": "type1",
"_id": "2",
"_score": 1,
"_source": {
"age": 100
}
},
{
"_index": "test1",
"_type": "type1",
"_id": "3",
"_score": 1,
"_source": {
"age": 150
}
}
]
}
}
The above response does not include the document with age 50; it only shows the documents with ages 100 and 150, even though 50 is also less than 150. What is wrong here?
Can anyone help me get a valid result?
In my schema the age field is of type text, and I don't want to change it. How can I get a valid result?
Because the age field is of type text, the range query uses alphabetical (lexicographic) ordering, so the results are correct:
"100"<"150"
"150"="150"
"50">"150"
If you are only ever ingesting numbers into the age field, you should change the field type to a numeric one, or add another inner field as a number, just as you did with the raw inner field.
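For example, a numeric sub-field could look like this; a sketch only (the numeric name is mine, and the data has to be reindexed after recreating the index), with ignore_malformed guarding against any non-numeric values:
PUT test1
{
  "mappings": {
    "type1": {
      "properties": {
        "age": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword",
              "ignore_above": 32766
            },
            "numeric": {
              "type": "integer",
              "ignore_malformed": true
            }
          }
        }
      }
    }
  }
}
The range query then targets the numeric sub-field and returns all three documents with age <= 150:
GET test1/_search
{
  "query": {
    "range": {
      "age.numeric": {
        "lte": 150
      }
    }
  }
}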
UPDATE: tested on a local system, and it works.
NOTE: Ideally you would want the mapping to be correct, but if there is no other choice and you are not the person who decides on the mapping, then you can still achieve it as follows.
For ES version 6.3 onwards, try this.
GET test1/type1/_search
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"source": "Integer.parseInt(doc['age.raw'].value) <= 150",
"lang": "painless"
}
}
}
}
}
}
Sources to refer to:
https://www.elastic.co/guide/en/elasticsearch/reference/6.3/query-dsl-script-query.html
https://discuss.elastic.co/t/painscript-script-cast-string-as-int/97034
The type of your age field in the mapping is set to text. That is the reason it uses dictionary ordering, where "50" > "150". Please use the long data type instead. https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
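A corrected mapping would be as simple as the following sketch (a new index name is used here, since an existing field's type cannot be changed in place and the data must be reindexed):
PUT test2
{
  "mappings": {
    "type1": {
      "properties": {
        "age": {
          "type": "long"
        }
      }
    }
  }
}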

ElasticSearch: Grandchild/child/parent relations not working properly

I'm facing some odd behavior in Elasticsearch while searching grandchildren: the grandchildren don't seem to recognize their parent documents. When I ask Elasticsearch to return the children of a parent, it returns all the expected hits. But when I ask for those children which have grandchildren, I get incorrect results: sometimes no hits, sometimes fewer than expected. Yet when I check the routing and parent ids of my grandchildren, I find that they do exist under their parents, so I can't understand why I'm getting incorrect results. Has anybody encountered this kind of issue?
I checked my code three times and didn't find any typos :-(
Here is my mapping:
PUT /test_index
{
"mappings":{
"parentDoc":{
"properties":{
"id":{
"type":"integer"
},
"name":{
"type":"text"
}
}
},
"childDoc": {
"_parent": {
"type": "parentDoc"
},
"properties":{
"id":{
"type":"integer"
},
"name":{
"type":"text"
},
"contact": {
"type":"text"
}
}
},
"grandChildDoc": {
"_parent": {
"type": "childDoc"
},
"properties":{
"id":{
"type":"integer"
},
"description":{
"type":"text"
}
}
}
}
}
Indexing parentDoc:
PUT /test_index/parentDoc/1
{
"pdId":1,
"name": "First parentDoc"
}
PUT /test_index/parentDoc/2
{
"pdId":2,
"name": "Second parentDoc"
}
Indexing childDoc:
PUT /test_index/childDoc/10?parent=1
{
"cdId":10,
"name": "First childDoc",
"contact" : "+XX0000000000"
}
PUT /test_index/childDoc/101?parent=1
{
"cdId":101,
"name": "Second childDoc",
"contact" : "+XX0000000111"
}
PUT /test_index/childDoc/20?parent=2
{
"cdId":20,
"name": "Third childDoc",
"contact" : "+XX0011100000"
}
Indexing grandChildDoc:
PUT /test_index/grandChildDoc/100?parent=10
{
"gcdId":100,
"name": "First grandChildDoc"
}
PUT /test_index/grandChildDoc/200?parent=10
{
"gcdId":200,
"name": "Second grandChildDoc"
}
PUT /test_index/grandChildDoc/300?parent=20
{
"gcdId":300,
"name": "Third grandChildDoc"
}
Now when I ask Elasticsearch to show me the parentDoc documents which have a childDoc, it returns:
POST /test_index/parentDoc/_search
{
"query": {
"has_child": {
"type": "childDoc",
"query": {
"match_all": {}
}
}
}
}
Result (this seems fine):
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "parentDoc",
"_id": "2",
"_score": 1,
"_source": {
"pdId": 2,
"name": "Second parentDoc"
}
},
{
"_index": "test_index",
"_type": "parentDoc",
"_id": "1",
"_score": 1,
"_source": {
"pdId": 1,
"name": "First parentDoc"
}
}
]
}
}
Now when I ask Elasticsearch to show me the childDoc documents which have a grandChildDoc, it returns:
POST /test_index/childDoc/_search
{
"query": {
"has_child": {
"type": "grandChildDoc",
"query": {
"match_all": {}
}
}
}
}
Result (notice that some hits are missing; for example, the childDoc documents with ids 10 and 101 do not appear):
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "childDoc",
"_id": "20",
"_score": 1,
"_routing": "2",
"_parent": "2",
"_source": {
"cdId": 20,
"name": "Third childDoc",
"contact": "+XX0011100000"
}
}
]
}
}
Any idea what mistake I'm making? Or is it a bug? Any workaround or solution?
[Note: I'm using Elasticsearch v5.4]
I have got the same setup working. I am using Logstash to index the documents into Elasticsearch.
Root cause:
By default Elasticsearch assigns five shards, and the documents for one parent-child-grandchild chain must all be located in the same shard. Unfortunately the data gets spread across the shards, and Elasticsearch will only return those records that live in the same shard.
Solution:
For parent-child-grandchild to work, you need to use the grandparent document id as the routing value of the grandchild document.
For a single level (parent-child), the parent id is the default routing value, which works fine. But for three levels, you need to configure the routing for each grandchild document explicitly; as mentioned, the routing value should be the grandparent id.
Please find below an example using Logstash:
Parent
"index" => "search"
"document_type" => "parent"
"document_id" => "%{appId}"
Child: works by default, since the parent (and therefore the routing) is the same as the parent document id. Routing formula: shard_num = hash(_routing) % num_primary_shards
"index" => "search"
"document_type" => "child"
"document_id" => "%{lId}"
"parent" => "%{appId}"
Grandchild: note that the routing is appId, which is the grandparent document id.
"index" => "search"
"document_type" => "grandchild"
"document_id" => "%{lBId}"
"parent" => "%{lId}"
"routing" => "%{appId}"
This indexes all related documents to the same shard, and the search then works fine in this use case.
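Outside Logstash, the same fix applied to the question's data would mean indexing each grandChildDoc with an explicit routing parameter set to the grandparent id, for example:
PUT /test_index/grandChildDoc/100?parent=10&routing=1
{
  "gcdId": 100,
  "name": "First grandChildDoc"
}
Here parent is still the childDoc id (10), but routing forces the document onto the shard of parentDoc 1, so all three generations end up together.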

Elasticsearch aggregation turns results to lowercase

I've been playing with ElasticSearch a little and found an issue when doing aggregations.
I have two endpoints, /A and /B. The first one holds parents for the second one, so one or many objects in B must belong to one object in A. Therefore, objects in B have an attribute "parentId" containing the parent id generated by Elasticsearch.
I want to filter parents in A by children attributes in B. In order to do that, I first filter the children in B by attributes and get their unique parent ids, which I'll later use to fetch the parents.
I send this request:
POST http://localhost:9200/test/B/_search
{
"query": {
"query_string": {
"default_field": "name",
"query": "derp2*"
}
},
"aggregations": {
"ids": {
"terms": {
"field": "parentId"
}
}
}
}
And get this response:
{
"took": 91,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "child",
"_id": "AU_fjH5u40Hx1Kh6rfQG",
"_score": 1,
"_source": {
"parentId": "AU_ffvwM40Hx1Kh6rfQA",
"name": "derp2child2"
}
},
{
"_index": "test",
"_type": "child",
"_id": "AU_fjD_U40Hx1Kh6rfQF",
"_score": 1,
"_source": {
"parentId": "AU_ffvwM40Hx1Kh6rfQA",
"name": "derp2child1"
}
},
{
"_index": "test",
"_type": "child",
"_id": "AU_fjKqf40Hx1Kh6rfQH",
"_score": 1,
"_source": {
"parentId": "AU_ffvwM40Hx1Kh6rfQA",
"name": "derp2child3"
}
}
]
},
"aggregations": {
"ids": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "au_ffvwm40hx1kh6rfqa",
"doc_count": 3
}
]
}
}
}
For some reason, the aggregation key is returned in lowercase, so I cannot use it to fetch the parent from Elasticsearch:
GET http://localhost:9200/test/A/au_ffvwm40hx1kh6rfqa
Response:
{
"_index": "test",
"_type": "A",
"_id": "au_ffvwm40hx1kh6rfqa",
"found": false
}
Any ideas on why this is happening?
The difference between the hits and the results of the aggregations is that the aggregations work on the indexed terms, and they also return those terms. The hits return the original _source.
How are these terms created? That depends on the chosen analyzer, which in your case is the default one, the standard analyzer. One of the things this analyzer does is lowercase all characters of the terms. As mentioned by Andrei, you should configure the parentId field to be not_analyzed.
PUT test
{
"mappings": {
"B": {
"properties": {
"parentId": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
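After reindexing the data, the aggregation can target parentId directly and the bucket keys keep their original case; for example (size 0 just suppresses the hits):
POST /test/B/_search
{
  "size": 0,
  "aggregations": {
    "ids": {
      "terms": {
        "field": "parentId"
      }
    }
  }
}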
I am late to the party, but I had the same issue and worked out that it is caused by normalization.
You have to change the mapping of the index if you want to prevent normalization from changing the aggregated values to lowercase.
You can check the current mapping in the DevTools console by typing
GET /A/_mapping
GET /B/_mapping
Once you see the structure of the index, check the settings of the parentId field.
If you don't want to change the behaviour of the field but do want to avoid normalization during aggregation, you can add a sub-field to the parentId field.
Changing the type of an existing field requires deleting the index and recreating it with the new mapping; adding a sub-field, however, can be done in place:
creating the index
Adding multi-fields to an existing field
In your case it looks like this (it contains only the parentId field):
PUT /B/_mapping
{
"properties": {
"parentId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
then you have to use the subfield in the query:
POST http://localhost:9200/test/B/_search
{
"query": {
"query_string": {
"default_field": "name",
"query": "derp2*"
}
},
"aggregations": {
"ids": {
"terms": {
"field": "parentId.keyword",
"order": {"_key": "desc"}
}
}
}
}

Get specific fields from index in elasticsearch

I have an index in Elasticsearch.
Sample structure:
{
"Article": "Article7645674712",
"Genre": "Genre92231455",
"relationDesc": [
"Article",
"Genre"
],
"org": "user",
"dateCreated": {
"date": "08/05/2015",
"time": "16:22 IST"
},
"dateModified": "08/05/2015"
}
From this index I want to retrieve only selected fields: org and dateModified.
I want the result to look like this:
{
"took": 265,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 28,
"max_score": 1,
"hits": [
{
"_index": "couchrecords",
"_type": "couchbaseDocument",
"_id": "3",
"_score": 1,
"_source": {
"doc": {
"org": "user",
"dateModified": "08/05/2015"
}
}
},
{
"_index": "couchrecords",
"_type": "couchbaseDocument",
"_id": "4",
"_score": 1,
"_source": {
"doc": {
"org": "user",
"dateModified": "10/05/2015"
}
}
}
]
}
}
How do I query Elasticsearch to get only these specific fields?
You can retrieve only a specific set of fields in the result hits using the _source parameter like this:
curl -XGET localhost:9200/couchrecords/couchbaseDocument/_search?_source=org,dateModified
Or in this format:
curl -XPOST localhost:9200/couchrecords/couchbaseDocument/_search -d '{
"_source": ["doc.org", "doc.dateModified"], <---- you just need to add this
"query": {
"match_all":{} <----- or whatever query you have
}
}'
That's easy. Consider any query of this format:
{
"query": {
...
},
}
You'll just need to add the fields parameter to your query, which in your case results in the following:
{
"query": {
...
},
"fields" : ["org","dateModified"]
}
Alternatively, use _source filtering:
{
"_source" : ["org","dateModified"],
"query": {
...
}
}
Check ElasticSearch source filtering.
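Source filtering also accepts include/exclude patterns, which is convenient when the fields sit under a wrapper object such as doc. A sketch (note that the exact key names vary by version: include/exclude in older releases, includes/excludes in newer ones):
POST /couchrecords/couchbaseDocument/_search
{
  "_source": {
    "include": ["doc.org", "doc.date*"]
  },
  "query": {
    "match_all": {}
  }
}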

How to use _timestamp in a scripted update

I was trying to come up with an elegant answer to this question and ran into an unexpected problem. The basic idea is to update a document based on its current timestamp. Seems straightforward enough, but I seem to be missing something. At the bottom of the Update API page, the ES docs say:
It also allows to update the ttl of a document using ctx._ttl and timestamp using ctx._timestamp. Note that if the timestamp is not updated and not extracted from the _source it will be set to the update date.
The ES documentation is often enigmatic at best, especially when it comes to scripting, but I took this to mean that I could use the _timestamp field in an update script.
So I set up a simple index with a timestamp:
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"doc": {
"_timestamp": {
"enabled": true,
"store": true,
"path": "doc_date",
"format" : "YYYY-MM-dd"
},
"properties": {
"doc_date": {
"type": "date",
"format" : "YYYY-MM-dd"
},
"doc_text": {
"type": "string"
}
}
}
}
}
and added some docs:
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"doc_text":"doc1", "doc_date":"2015-2-5"}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"doc_text":"doc2", "doc_date":"2015-2-10"}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"doc_text":"doc3", "doc_date":"2015-2-15"}
If I query for the first doc, I get back what I expect:
POST /test_index/_search
{
"query": {
"match": {
"doc_text": "doc1"
}
},
"fields": [
"_timestamp",
"_source"
]
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.4054651,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1.4054651,
"_source": {
"doc_text": "doc1",
"doc_date": "2015-2-5"
},
"fields": {
"_timestamp": 1423094400000
}
}
]
}
}
So far so good. Now I want to conditionally update the first doc, based on its timestamp. First I tried this, and got an error:
POST /test_index/doc/1/_update
{
"script": "if(ctx._timestamp < new_ts){ctx._source.doc_date=new_date;ctx._source.doc_text=new_text}",
"params": {
"new_ts": 1423526400000,
"new_date": "2015-2-10",
"new_text": "doc1-updated"
}
}
...
{
"error": "ElasticsearchIllegalArgumentException[failed to execute script]; nested: PropertyAccessException[[Error: could not access: _timestamp; in class: java.util.HashMap]\n[Near : {... if(ctx._timestamp < new_ts){ctx._ ....}]\n ^\n[Line: 1, Column: 4]]; ",
"status": 400
}
Then I tried this:
POST /test_index/doc/1/_update
{
"script": "if(ctx[\"_timestamp\"] < new_ts){ctx._source.doc_date=new_date;ctx._source.doc_text=new_text}",
"params": {
"new_ts": 1423526400000,
"new_date": "2015-2-10",
"new_text": "doc1-updated"
}
}
...
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_version": 2
}
I didn't get an error, but the update didn't happen:
POST /test_index/_search
{
"query": {
"match": {
"doc_text": "doc1"
}
},
"fields": [
"_timestamp",
"_source"
]
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.287682,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1.287682,
"_source": {
"doc_text": "doc1",
"doc_date": "2015-2-5"
},
"fields": {
"_timestamp": 1423094400000
}
}
]
}
}
Just out of curiosity, I inverted the conditional:
POST /test_index/doc/1/_update
{
"script": "if(ctx[\"_timestamp\"] > new_ts){ctx._source.doc_date=new_date;ctx._source.doc_text=new_text}",
"params": {
"new_ts": 1423526400000,
"new_date": "2015-2-10",
"new_text": "doc1-updated"
}
}
with the same result: no update.
Okay, so as a sanity check I tried to set the timestamp, and got an error:
POST /test_index/doc/1/_update
{
"script": "ctx._source.doc_date=new_date;ctx._source.doc_text=new_text;ctx._timestamp=new_ts",
"params": {
"new_ts": 1423526400000,
"new_date": "2015-2-10",
"new_text": "doc1-updated"
}
}
...
{
"error": "ClassCastException[java.lang.Long cannot be cast to java.lang.String]",
"status": 500
}
I also tried it with "ctx[\"_timestamp\"]=new_ts;", and got the same error.
So it seems that the _timestamp field is not available to the script, even though the documentation says it is. What am I doing wrong?
I also tried a plain update, without the conditional and without touching the timestamp, and it worked as expected.
I used Elasticsearch version 1.3.4 (with dynamic scripting enabled, obviously), running on an Ubuntu 12 VM.
Here is the code I used to set this up:
http://sense.qbox.io/gist/ca2b3c6b84572e5f87d57d22f8c38252fa4ee216
