Elasticsearch script to check if a field exists and create it

I've created a script which records the history of the tags that are applied to my documents in elastic. The names of the tags are dynamic, so when I try to move the current tag to the history field, it fails for tags that do not already have a history field.
This is my script to copy the current tag to the tag history field:
script:"ctx._source.tags[params.tagName.toString()].history.add(ctx._source.tags[params.tagName.toString()].current)"
This is what the documents look like:
"tags": {
"relevant": {
"current": {
"tagDate": 1501848372292,
"taggedByUser": "dev",
"tagActive": true
},
"history": [
{
"tagDate": 1501841137822,
"taggedByUser": "admin",
"tagActive": true
},
{
"tagDate": 1501841334127,
"taggedByUser": "admin",
"tagActive": true
},
}}}}
The users can add new tags dynamically, so what I want to do is create the history object if it does not exist, and then populate it.
There is very little documentation available for Elasticsearch scripting, so I'm hoping someone wise will know the answer, as I'm sure that checking for a field and creating it are fundamental operations in the Elasticsearch scripting languages.
Update
So, having rethought the structure of this index, what I want to achieve is the following:
"tags": [
  {
    "hot": {
      "current": {"tagDate": 1231231233, "taggedByUser": "user1", "tagStatus": true},
      "history": [
        {"tagDate": 123444433, "taggedByUser": "user1", "tagStatus": true},
        {"tagDate": 1234412433, "taggedByUser": "user1", "tagStatus": true}
      ]
    }
  },
  {
    "interesting": {
      "current": {"tagDate": 1231231233, "taggedByUser": "user1", "tagStatus": true},
      "history": [
        {"tagDate": 123444433, "taggedByUser": "user1", "tagStatus": true},
        {"tagDate": 1234412433, "taggedByUser": "user1", "tagStatus": true}
      ]
    }
  }
]
The tag names in this example are "hot" and "interesting", however the user will be able to enter any tag name they want, so these are in no way predefined. When a user tags a document in Elastic and the applied tag already exists, it should move the "current" tag to the "history" array and then overwrite the "current" tag with the new values.
Thank you for the responses to date, however the example code does not work for me.
The problem, I think, is that the code first needs to loop through all of the tags and get each name, and then compare it to the name that I am supplying in the params. I believe this is where the first issue arises.
I then need to move the "current" object to the "history" array. There also appears to be an issue here: I'm trying to use ctx._source.tags[i].history.add(params.param1), however nothing is added.
Any thoughts?
Thanks!

It's a bit more complicated because you need to do three things in the script:
if history does not already exist, initialize the array
move current tag to history
replace old current tag with the new one
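Those three steps, restated as a Python sketch on a plain dict (illustrative only; the real update runs as an Elasticsearch script, and the field names are taken from the question):

```python
# Sketch of the three update steps from the answer, on a plain Python dict.
def update_tag(doc, tag_name, new_current):
    tag = doc["tags"][tag_name]
    if "history" not in tag:               # 1. initialize history if missing
        tag["history"] = []
    tag["history"].append(tag["current"])  # 2. move current into history
    tag["current"] = new_current           # 3. replace current with the new tag
    return doc

doc = {"tags": {"relevant": {"current": {"taggedByUser": "dev", "tagActive": True}}}}
update_tag(doc, "relevant", {"taggedByUser": "my_user", "tagActive": True})
```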
Assuming that your initial document looks like this (note no history yet):
{
"_id": "AV2uvqCUfGXyNt1PjTbb",
"tags": {
"relevant": {
"current": {
"tagDate": 1501848372292,
"taggedByUser": "dev",
"tagActive": true
}
}
}
}
To be able to execute these three steps, you need to run the following script:
curl -X POST \
http://127.0.0.1:9200/script/test/AV2uvqCUfGXyNt1PjTbb/_update \
-d '{
"script": {
"inline": "if (ctx._source.tags.get(param2).history == null) ctx._source.tags.get(param2).history = new ArrayList(); ctx._source.tags.get(param2).history.add(ctx._source.tags.get(param2).current); ctx._source.tags.get(param2).current = param1;",
"params" : {
"param1" : {
"tagDate": 1501848372292,
"taggedByUser": "my_user",
"tagActive": true
},
"param2": "relevant"
}
}
}'
And I get as a result:
{
"_id": "AV2uvqCUfGXyNt1PjTbb",
"_source": {
"tags": {
"relevant": {
"current": {
"tagActive": true,
"tagDate": 1501848372292,
"taggedByUser": "my_user"
},
"history": [
{
"tagDate": 1501848372292,
"taggedByUser": "dev",
"tagActive": true
}
]
}
}
}
}
Running the same script with new content for param1 (a new tag) gives:
{
"_id": "AV2uvqCUfGXyNt1PjTbb",
"_source": {
"tags": {
"relevant": {
"current": {
"tagActive": true,
"tagDate": 1501841334127,
"taggedByUser": "admin"
},
"history": [
{
"tagDate": 1501848372292,
"taggedByUser": "dev",
"tagActive": true
},
{
"tagActive": true,
"tagDate": 1501848372292,
"taggedByUser": "my_user"
}
]
}
}
}
}
Update - if `tags` is a list
If tags is a list of "inner json objects", for example:
{
"tags": [
{
"relevant": {
"current": {
"tagDate": 1501841334127,
"taggedByUser": "dev",
"tagActive": true
}
}
},
{
"new_tag": {
"current": {
"tagDate": 1501848372292,
"taggedByUser": "admin",
"tagActive": true
}
}
}
]
}
you have to iterate over the list to find the index of the right element. Let's say you want to update the element new_tag. First, you need to check whether this tag exists: if so, get its index; if not, return from the script. Having the index, you just get the right element and can proceed almost exactly as before. The script looks like this:
int num = -1;
for (int i = 0; i < ctx._source.tags.size(); i++) {
    if (ctx._source.tags.get(i).get(param2) != null) {
        num = i;
        break;
    }
}
if (num == -1) {
    return;
}
if (ctx._source.tags.get(num).get(param2).history == null) {
    ctx._source.tags.get(num).get(param2).history = new ArrayList();
}
ctx._source.tags.get(num).get(param2).history.add(ctx._source.tags.get(num).get(param2).current);
ctx._source.tags.get(num).get(param2).current = param1;
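The same find-then-update flow, sketched in Python (tag_name stands in for param2; this is an illustration of the logic, not Painless):

```python
# Find the list element that holds the tag name, then update it in place.
def update_tag_in_list(tags, tag_name, new_current):
    num = -1
    for i, element in enumerate(tags):
        if tag_name in element:
            num = i
            break
    if num == -1:
        return tags  # tag not found: leave the document untouched
    tag = tags[num][tag_name]
    tag.setdefault("history", []).append(tag["current"])
    tag["current"] = new_current
    return tags

tags = [{"relevant": {"current": {"taggedByUser": "dev"}}},
        {"new_tag": {"current": {"taggedByUser": "admin"}}}]
update_tag_in_list(tags, "new_tag", {"taggedByUser": "my_user"})
```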
And the whole query:
curl -X POST \
http://127.0.0.1:9200/script/test/AV29gAnpqbJMKVv3ij7U/_update \
-d '{
"script": {
"inline": "int num = -1; for (int i = 0; i < ctx._source.tags.size(); i++) {if (ctx._source.tags.get(i).get(param2) != null) {num = i; break;};}; if (num == -1) {return;}; if (ctx._source.tags.get(num).get(param2).history == null) ctx._source.tags.get(num).get(param2).history = new ArrayList(); ctx._source.tags.get(num).get(param2).history.add(ctx._source.tags.get(num).get(param2).current); ctx._source.tags.get(num).get(param2).current = param1;",
"params" : {
"param1" : {
"tagDate": 1501848372292,
"taggedByUser": "my_user",
"tagActive": true
},
"param2": "new_tag"
}
}
}
'
Result:
{
"tags": [
{
"relevant": {
"current": {
"tagDate": 1501841334127,
"taggedByUser": "dev",
"tagActive": true
}
}
},
{
"new_tag": {
"current": {
"tagActive": true,
"tagDate": 1501848372292,
"taggedByUser": "my_user"
},
"history": [
{
"tagDate": 1501848372292,
"taggedByUser": "admin",
"tagActive": true
}
]
}
}
]
}

I think you can do something like this in Groovy scripting:
{
"script": "if( ctx._source.containsKey(\"field_name\") ){ ctx.op = \"none\"} else{ctx._source.field_name= field_value;}"
}
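The check-then-create idea in that snippet, as a Python sketch (field_name and field_value are placeholders, just as in the Groovy above):

```python
# Set the field only when it is missing; an existing value is left untouched.
def ensure_field(source, field_name, field_value):
    if field_name not in source:
        source[field_name] = field_value
    return source
```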

Related

How can I change a json structure into an object and rename keys with jq

Using jq, I am trying to convert the raw JSON below into the desired JSON outcome.
Objectives:
name renamed to pathParameterName
type renamed to datasetParameter
Raw JSON I'm trying to convert:
{
"pathOptions": {
"parameters": {
"raw_date": {
"name": "raw_date",
"type": "Datetime",
"datetimeOptions": {
"localeCode": "en-GB"
},
"createColumn": true,
"filter": {
"expression": "(after :date1)",
"valuesMap": {
":date1": "2022-03-08T00:00:00.000Z"
}
}
}
}
}
}
JSON desired outcome:
{
"pathOptions": {
"parameters": [
{
"pathParameterName": "raw_date",
"datasetParameter": {
"name": "raw_date",
"type": "Datetime",
"datetimeOptions": {
"localeCode": "en-GB"
},
"createColumn": true,
"filter": {
"expression": "(after :date1)",
"valuesMap": [
{
"valueReference": ":date1",
"value": "2022-03-08T00:00:00.000Z"
}
]
}
}
}
]
}
}
This is what I have so far:
map_values(if type == "object" then to_entries else . end)
This is what my code above currently produces. I'm struggling with the key renaming.
{
"pathOptions": [
{
"key": "parameters",
"value": [
{
"pathParameterName": "raw_date",
"datasetParameter": {
"name": "raw_date",
"type": "Datetime",
"datetimeOptions": {
"localeCode": "en-GB"
},
"createColumn": true,
"filter": {
"expression": "(after :date1)",
"valuesMap": [
{
"valueReference": ":date1",
"value": "2022-03-08T00:00:00.000Z"
}
]
}
}
}
]
}
]
}
The function to_entries "converts between an object and an array of key-value pairs" (see the manual). To rename the preset key and value fields, just reassign them to new names in a new object, as in {valueReference: .key, value}.
jq '
.pathOptions.parameters |= (
to_entries | map({
pathParameterName: .key,
datasetParameter: (
.value | .filter.valuesMap |= (
to_entries | map({valueReference: .key, value})
)
)
})
)
'
{
"pathOptions": {
"parameters": [
{
"pathParameterName": "raw_date",
"datasetParameter": {
"name": "raw_date",
"type": "Datetime",
"datetimeOptions": {
"localeCode": "en-GB"
},
"createColumn": true,
"filter": {
"expression": "(after :date1)",
"valuesMap": [
{
"valueReference": ":date1",
"value": "2022-03-08T00:00:00.000Z"
}
]
}
}
}
]
}
}
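For clarity, here is to_entries mimicked in Python (not jq; just to show the key/value pairing and the subsequent rename, using the valuesMap example above):

```python
# Mimic jq's to_entries plus the rename step on the valuesMap object.
def to_entries(obj):
    return [{"key": k, "value": v} for k, v in obj.items()]

values_map = {":date1": "2022-03-08T00:00:00.000Z"}
renamed = [{"valueReference": e["key"], "value": e["value"]}
           for e in to_entries(values_map)]
```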

Querying complex nested object in cosmosdb using sql Api

How do I query only those users whose ItemCount > 10 from a complex nested object (with dynamic keys) in Cosmos DB using the SQL API? A UDF is not preferred.
Something like:
Select c.username from c where c.Data[*].Order.ItemCount > 10;
[
  {
    "Username": "User1",
    "Data": {
      "RandomGUID123": {
        "Order": {
          "Item": "ItemName123",
          "ItemCount": "40"
        },
        "ShipmentNumber": "7657575"
      },
      "RandomGUID976": {
        "Order": {
          "Item": "ItemName7686",
          "ItemCount": "7"
        },
        "ShipmentNumber": "876876"
      }
    }
  },
  {
    "Username": "User2",
    "Data": {
      "RandomGUID654": {
        "Order": {
          "Item": "ItemName654",
          "ItemCount": "9"
        },
        "ShipmentNumber": "7612575"
      },
      "RandomGUID908": {
        "Order": {
          "Item": "ItemName545",
          "ItemCount": "6"
        },
        "ShipmentNumber": "6454"
      }
    }
  }
]
I'm not sure how to handle unknown keys, but if you're willing to model the key as a value instead (simpler and cleaner, I'd argue), you could have:
{
"Username": "User1",
"Data": [
{
"Id": "RandomGUID123",
"Order": {
"Item": "ItemName123",
"ItemCount": 40
},
"ShipmentNumber": "7657575"
},
{
"Id": "RandomGUID976",
"Order": {
"Item": "ItemName7686",
"ItemCount": 7
},
"ShipmentNumber": "876876"
}
]
}
With a query like:
SELECT DISTINCT VALUE(c.Username)
FROM c
JOIN (SELECT VALUE d FROM d IN c.Data WHERE d["Order"].ItemCount > 10)
Result:
[
"User1"
]
"Order" is a reserved keyword and requires the bracket syntax to reference.
As Noah answers, modeling the key as a value is one way to achieve this.
Additionally, there is a way to achieve it without changing the schema of your document. Create a UDF like this:
function getResult(data) {
    for (var key in data) {
        const itemCount = data[key].Order.ItemCount;
        if (parseFloat(itemCount).toString() != "NaN" && parseFloat(itemCount) > 10) {
            return true;
        }
    }
    return false;
}
Then run this SQL:
SELECT c.Username FROM c where udf.getResult(c.Data)
Result:
[
{
"Username": "User1"
}
]
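The UDF's logic, restated as a Python sketch for clarity (the real function runs as server-side JavaScript in Cosmos DB):

```python
# Walk the dynamic GUID keys and compare ItemCount numerically;
# the stored values are strings, so convert before comparing.
def has_large_order(data, threshold=10):
    for entry in data.values():
        try:
            if float(entry["Order"]["ItemCount"]) > threshold:
                return True
        except (KeyError, TypeError, ValueError):
            continue
    return False

user1 = {"RandomGUID123": {"Order": {"Item": "ItemName123", "ItemCount": "40"}},
         "RandomGUID976": {"Order": {"Item": "ItemName7686", "ItemCount": "7"}}}
```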

Elasticsearch: Update/upsert an array field inside a document but ignore certain existing fields

GET _doc/1
"_source": {
"documents": [
{
"docid": "ID001",
"added_vals": [
{
"code": "123",
"label": "Abc"
},
{
"code": "113",
"label": "Xyz"
}
]
},
{
"docid": "ID002",
"added_vals": [
{
"code": "123",
"label": "Abc"
}
]
}
],
"id": "1"
}
POST /_bulk
{ "update": { "_id": "1"}}
{ "doc": { "documents": [ { "docid": "ID001", "status" : "cancelled" } ], "id": "1" }, "doc_as_upsert": true }
The problem is that when I run my bulk update script above, it replaces the whole documents field, removing the added_vals list. Would I be able to achieve this using a Painless script? Thank you.
Using elasticsearch painless scripting
POST /_bulk
{ "update": { "_id": "1"} }
{ "scripted_upsert":true, "script" :{ "source": "if(ctx._version == null) { ctx._source = params; } else { def param = params; def src = ctx._source; for(s in src.documents) { boolean found = false; for(p in param.documents) { if (p.docid == s.docid) { found = true; if(s.added_vals != null) { p.added_vals = s.added_vals; } } } if(!found) param.documents.add(s); } ctx._source = param; }", "lang": "painless", "params" : { "documents": [ { "docid": "ID001", "status" : "cancelled" } ], "id": "1" } }, "upsert" : { } }
Well, this one worked for me. I need to tweak a few more things that I require, but I will leave it here for anyone who may need it. I didn't know it was this simple. If there is another answer that might be easier, please do submit it. Thanks.
"script" :
if(ctx._version == null)
{
ctx._source = params;
}
else
{
def param = params;
def src = ctx._source;
for(s in src.documents)
{
boolean found = false;
for(p in param.documents)
{
if (p.docid == s.docid)
{
found = true;
if(s.added_vals != null)
{
p.added_vals = s.added_vals;
}
}
}
if(!found) param.documents.add(s);
}
ctx._source = param;
}
I am not sure if I should modify params directly, so I passed the params into the param variable. I also used scripted_upsert: true with a ctx._version null check.
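The merge the script performs, restated as a Python sketch (plain dicts and lists standing in for ctx._source and params):

```python
# Carry added_vals from the stored docs onto matching update docs,
# and keep stored docs that the update does not mention.
def merge_documents(source_docs, param_docs):
    for s in source_docs:
        found = False
        for p in param_docs:
            if p["docid"] == s["docid"]:
                found = True
                if s.get("added_vals") is not None:
                    p["added_vals"] = s["added_vals"]
        if not found:
            param_docs.append(s)
    return param_docs

stored = [{"docid": "ID001", "added_vals": [{"code": "123"}]},
          {"docid": "ID002", "added_vals": [{"code": "113"}]}]
update = [{"docid": "ID001", "status": "cancelled"}]
merged = merge_documents(stored, update)
```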

how to sort Data Sources in terraform based on arguments

I use the following Terraform code to get a list of available DB resources:
data "alicloud_db_instance_classes" "resources" {
instance_charge_type = "PostPaid"
engine = "PostgreSQL"
engine_version = "10.0"
category = "HighAvailability"
zone_id = "${data.alicloud_zones.rds_zones.ids.0}"
multi_zone = true
output_file = "./classes.txt"
}
And the output file looks like this:
[
{
"instance_class": "pg.x4.large.2",
"storage_range": {
"max": "500",
"min": "250",
"step": "250"
},
"zone_ids": [
{
"id": "cn-shanghai-MAZ1(b,c)",
"sub_zone_ids": [
"cn-shanghai-b",
"cn-shanghai-c"
]
}
]
},
{
"instance_class": "pg.x8.medium.2",
"storage_range": {
"max": "250",
"min": "250",
"step": "0"
},
"zone_ids": [
{
"id": "cn-shanghai-MAZ1(b,c)",
"sub_zone_ids": [
"cn-shanghai-b",
"cn-shanghai-c"
]
}
]
},
{
"instance_class": "rds.pg.c1.xlarge",
"storage_range": {
"max": "2000",
"min": "5",
"step": "5"
},
"zone_ids": [
{
"id": "cn-shanghai-MAZ1(b,c)",
"sub_zone_ids": [
"cn-shanghai-b",
"cn-shanghai-c"
]
}
]
},
{
"instance_class": "rds.pg.s1.small",
"storage_range": {
"max": "2000",
"min": "5",
"step": "5"
},
"zone_ids": [
{
"id": "cn-shanghai-MAZ1(b,c)",
"sub_zone_ids": [
"cn-shanghai-b",
"cn-shanghai-c"
]
}
]
}
]
And I want to get the one that's cheapest.
One way to do so is by sorting on storage_range.min, but how do I sort this list based on 'storage_range.min'?
Or I could filter by 'instance_class', but "alicloud_db_instance_classes" doesn't seem to support filter, as it says: Error: data.alicloud_db_instance_classes.resources: : invalid or unknown key: filter
Any ideas?
The sort() function orders lexicographically, and you have no simple key here.
You can use filtering with some code like this (v0.12):
locals {
  best_db_instance_class_key = "rds.pg.s1.small"
  best_db_instance_class = element(
    data.alicloud_db_instance_classes.resources.instance_classes,
    index(data.alicloud_db_instance_classes.resources.instance_classes.*.instance_class, local.best_db_instance_class_key)
  )
}
(Untested code)
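To illustrate why a plain lexicographic sort fails here: the storage_range.min values in the output file are strings, so they have to be parsed as numbers before sorting. A Python sketch of the numeric sort:

```python
# "250" < "5" lexicographically, so convert to int to get a numeric order.
classes = [
    {"instance_class": "pg.x4.large.2", "storage_range": {"min": "250"}},
    {"instance_class": "rds.pg.s1.small", "storage_range": {"min": "5"}},
]
cheapest_first = sorted(classes, key=lambda c: int(c["storage_range"]["min"]))
```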

Elasticsearch partial update of Object(multi=True)

How do you update a document with the field mapping Object(multi=True), when a document can hold either a single value (a dictionary) or multiple values (a list of dictionaries)?
Example of documents in the same index:
A single value in items:
{
"title": "Some title",
"items": {
"id": 123,
"key": "foo"
}
}
Multiple values in items:
{
"title": "Some title",
"items": [{
"id": 456,
"key": "foo"
}, {
"id": 789,
"key": "bar"
}]
}
You can try the following script.
I intentionally formatted the inline attribute to show what's inside.
POST index_name/_update_by_query
{
  "query": {
    "term": {
      "items.key": "foo"
    }
  },
  "script": {
    "inline": "
      if (ctx._source.items instanceof List) {
        for (item in ctx._source.items) {
          if (item.key == params.old_value) {
            item.key = params.new_value;
            break;
          }
        }
      } else {
        ctx._source.items.key = params.new_value;
      }
    ",
    "params": {"old_value": "foo", "new_value": "bar"},
    "lang": "painless"
  }
}
And to make it actually work, replace the inline attribute with a single-line value:
"inline": "if (ctx._source.items instanceof List) {for (item in ctx._source.items) {if (item.key == params.old_value) {item.key = params.new_value; break;}}} else {ctx._source.items.key = params.new_value;}"
