Iterate over array update_by_query - elasticsearch

I have an array called 'tags' in the source of all my Elasticsearch docs in a particular index. I am trying to lowercase all values in the tags array using an update_by_query Painless script.
This seems like a simple operation. Here is what I have tried:
POST my_index/_update_by_query
{
"script": {
"source": """
for (int i = 0; i < ctx._source['tags'].length; ++i) {
ctx._source['tags'][i].value = ctx._source['tags'][i].value.toLowerCase()
}
""",
"lang": "painless"
},
"query": {
"match_all": {}
}
}
I am getting a null pointer exception when executing the above code. I think I may have the syntax slightly off. I am having a lot of trouble getting this to work and would appreciate any help.

I fixed the issue. There were multiple small syntax errors, and I also needed to add an exists check:
POST my_index/_update_by_query
{
"script": {
"source": """
if (ctx._source.containsKey('tags')) {
for (int i = 0; i < ctx._source['tags'].length; ++i) {
ctx._source['tags'][i] = ctx._source['tags'][i].toLowerCase()
}
}
""",
"lang": "painless"
},
"query": {
"match_all": {}
}
}
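As an alternative sketch (not the poster's method), the missing-field case can also be handled on the query side: an exists query selects only documents that have an indexed value for tags, so the script never runs on a document without the field. The Python below only builds the request body as a dict and prints it as JSON for pasting into Dev Tools. The function name is hypothetical, and no Elasticsearch client is assumed.

```python
import json


def lowercase_tags_body():
    """Build an _update_by_query body that lowercases every entry of tags.

    The exists query limits the update to documents that have an indexed
    value for tags, so the loop never runs on a missing field.
    """
    source = (
        "for (int i = 0; i < ctx._source['tags'].length; ++i) {"
        " ctx._source['tags'][i] = ctx._source['tags'][i].toLowerCase();"
        " }"
    )
    return {
        "script": {"source": source, "lang": "painless"},
        "query": {"exists": {"field": "tags"}},
    }


if __name__ == "__main__":
    # Body for: POST my_index/_update_by_query
    print(json.dumps(lowercase_tags_body(), indent=2))
```

A tags array that contains null entries would still fail inside the loop, so this narrows the update rather than replacing a null check.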

Related

Calculate output in Elasticsearch Dev tools

Using Elasticsearch 7.*, we have a field 'ElapsedTime' under the mapping and I am trying to write a query to generate output of that field as 'ElapsedTime' / 1000.
Tried below but no luck:
1)
GET /_search
{
"script_fields": {
"test1": {
"script": {
"lang": "painless",
"source": "params._source.ElapsedTime / 1000"
}
}
}
}
2)
GET /_search
{
"script_fields": {
"test2": {
"script": {
"lang": "expression",
"source": "doc['ElapsedTime'] / 1000"
}
}
}
}
Errors:
1) null pointer exception
2) parse_exception: Field [ElapsedTime] does not exist in mappings
You need to run GET concrete-index/_search against a concrete index rather than GET /_search, which runs against every index in your cluster. On / the chance of hitting an index that doesn't have ElapsedTime in its mapping is quite high.

Elasticsearch: count query in _search script

I'm trying to write a single query that updates one field value in an ES index.
I have an index, pages, which contains information about the pages (id, name, time, parent_page_id, child_count, etc.).
I want to set each page's child_count to the number of documents that have this page's id as their parent_page_id.
I can update the field with a default single value like this:
PUT HOST_ADDRESS/pages/_update_by_query
{
"script": {
"source": "def child_count = 0; ctx._source.child_count = child_count;",
"lang": "painless"
},
"query": {
"match_all": {}
}
}
I tried this code to get the child count, but it's not working:
"source": "def child_count = 0; client.prepareSearch('pages').setQuery(QueryBuilders.termQuery("parent_page_id", "ctx._source.id")).get().getTotal().getDocs().getCount(); ctx._source.child_count = child_count;",
"lang": "painless"
My question is: how can I run a count sub-query inside the script so that the variable child_count holds the real child count?
Scripting doesn't work like this: you cannot use the Java DSL in a script. There's no concept of client, QueryBuilders, etc. in the Painless contexts.
As such, you'll need to obtain the counts before you proceed to update the doc(s) with a script.
Tip: scripts are reusable when you store them:
POST HOST_ADDRESS/_scripts/update_child_count
{
"script": {
"lang": "painless",
"source": "ctx._source.child_count = params.child_count"
}
}
and then apply it via its id, so there is no need to write the Painless code again:
POST HOST_ADDRESS/pages/_update_by_query
{
"script": {
"id": "update_child_count",
"params": {
"child_count": 987
}
},
"query": {
"term": {
"parent_page_id": 123
}
}
}
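The "obtain the counts first" step could be sketched as follows, assuming a terms aggregation on parent_page_id is used to count children per parent. The Python only builds request bodies and reads a response dict; how the requests are sent is left out. The aggregation name children_per_parent and both function names are hypothetical, and using the id field to select the parent page is an assumption based on the field list in the question.

```python
def count_children_body(max_parents=10000):
    """Search body that counts documents per parent_page_id.

    A terms aggregation returns at most `max_parents` buckets, so a very
    large index may need paging (an assumption to verify for your data).
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "children_per_parent": {
                "terms": {"field": "parent_page_id", "size": max_parents}
            }
        },
    }


def update_requests(search_response):
    """Turn each bucket into an _update_by_query body for one parent page."""
    buckets = search_response["aggregations"]["children_per_parent"]["buckets"]
    return [
        {
            # reuse the stored script from above, passing the real count
            "script": {
                "id": "update_child_count",
                "params": {"child_count": bucket["doc_count"]},
            },
            # select the page whose own id is the parent id of the bucket
            "query": {"term": {"id": bucket["key"]}},
        }
        for bucket in buckets
    ]
```

Each returned body would be sent to pages/_update_by_query. Pages with no children get no bucket, so they would keep whatever child_count they already have.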

ElasticSearch Filter by sum of nested documents

I am trying to filter products where a sum of properties in the nested filtered objects is in some range.
I have the following mapping:
{
"product": {
"properties": {
"warehouses": {
"type": "nested",
"properties": {
"stock_level": {
"type": "integer"
}
}
}
}
}
}
Example data:
{
"id": 1,
"warehouses": [
{
"id": 2001,
"stock_level": 5
},
{
"id": 2002,
"stock_level": 0
},
{
"id": 2003,
"stock_level": 2
}
]
}
In ElasticSearch 5.6 I used to do this:
GET products/_search
{
"query": {
"bool": {
"filter": [
[
{
"script": {
"script": {
"source": """
int total = 0;
for (def warehouse: params['_source']['warehouses']) {
if (params.warehouse_ids == null || params.warehouse_ids.contains(warehouse.id)) {
total += warehouse.stock_level;
}
}
boolean gte = true;
boolean lte = true;
if (params.gte != null) {
gte = (total >= params.gte);
}
if (params.lte != null) {
lte = (total <= params.lte);
}
return (gte && lte);
""",
"lang": "painless",
"params": {
"gte": 4
}
}
}
}
]
]
}
}
}
The problem is that params['_source']['warehouses'] no longer works in ES 6.8, and I am unable to find a way to access nested documents in the script.
I have tried:
doc['warehouses'] - returns an error ("No field found for [warehouses] in mapping with types []")
ctx._source.warehouses - "Variable [ctx] is not defined."
I have also tried to use a scripted field, but scripted fields seem to be calculated at the very last stage and are not available during the query.
I also have a sort based on the same logic (sort products by the sum of stock levels in the given warehouses), and it works like a charm:
"sort": {
"warehouses.stock_level": {
"order": "desc",
"mode": "sum",
"nested": {
"path": "warehouses",
"filter": {
"terms": {
"warehouses.id": [2001, 2003]
}
}
}
}
}
But I can't find a way to access this sort value either :(
Any ideas how I can achieve this? Thanks.
I recently had the same issue. It turns out the change occurred somewhere around 6.4 during refactoring. Accessing _source is strongly discouraged, but it looks like people still use it or want to use it.
Here's a workaround that takes advantage of the include_in_root parameter.
Adjust your mapping (the added line is include_in_root):
PUT product
{
"mappings": {
"properties": {
"warehouses": {
"type": "nested",
"include_in_root": true,
"properties": {
"stock_level": {
"type": "integer"
}
}
}
}
}
}
Drop the index and reindex.
Then reconstruct the individual warehouse items in a for loop while accessing the flattened values:
GET product/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": """
int total = 0;
def ids = doc['warehouses.id'];
def levels = doc['warehouses.stock_level'];
for (def i = 0; i < ids.length; i++) {
def warehouse = ['id':ids[i], 'stock_level':levels[i]];
if (params.warehouse_ids == null || params.warehouse_ids.contains(warehouse.id)) {
total += warehouse.stock_level;
}
}
boolean gte = true;
boolean lte = true;
if (params.gte != null) {
gte = (total >= params.gte);
}
if (params.lte != null) {
lte = (total <= params.lte);
}
return (gte && lte);
""",
"lang": "painless",
"params": {
"gte": 4
}
}
}
}
]
}
}
}
Be aware that this approach assumes that all warehouses include a non-null id and stock level.
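One further caveat is worth checking before relying on the warehouse_ids filter. As far as I know, doc values for a multi-valued numeric field come back sorted per document rather than in source order, so ids[i] and levels[i] may not belong to the same warehouse. The Python below is only an emulation of that pairing under this assumption, using the sample data from the question. The function names are hypothetical.

```python
SAMPLE = [
    {"id": 2001, "stock_level": 5},
    {"id": 2002, "stock_level": 0},
    {"id": 2003, "stock_level": 2},
]


def source_total(warehouses, warehouse_ids=None):
    """Sum stock levels per warehouse object, as the 5.6 _source script did."""
    return sum(
        w["stock_level"]
        for w in warehouses
        if warehouse_ids is None or w["id"] in warehouse_ids
    )


def paired_total(warehouses, warehouse_ids=None):
    """Emulate index-based pairing when each field's values arrive sorted."""
    ids = sorted(w["id"] for w in warehouses)
    levels = sorted(w["stock_level"] for w in warehouses)
    return sum(
        level
        for wid, level in zip(ids, levels)
        if warehouse_ids is None or wid in warehouse_ids
    )


if __name__ == "__main__":
    print(source_total(SAMPLE), paired_total(SAMPLE))
    print(source_total(SAMPLE, [2001, 2003]), paired_total(SAMPLE, [2001, 2003]))
```

Without a warehouse_ids filter both totals agree, because the sum does not depend on order. With a filter they can differ, so it is worth testing the query against documents whose stock levels are not already in id order.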

Aggregations on nested documents with painless scripting

Using Elasticsearch 6.2
So I have a deeply nested document structure which has all the proper mapping (nested, text, keyword, etc). A sample document is as follows:
{
"type": "Certain Type",
"lineItems": [
{
"lineValue": 10,
"events": [
{
"name": "CREATED",
"timeStamp": "TIME VALUE"
},
{
"name": "ENDED",
"timeStamp": "TIME VALUE"
}
]
}
]
}
What I want to do is find out the average time required for all lines to go from CREATED to ENDED.
I created the following query:
GET /_search
{
"size": 0,
"query": {
"match": {
"type": "Certain Type"
}
},
"aggs": {
"avg time": {
"nested": {
"path": "lineItems.events"
},
"aggs": {
"avg time": {
"avg": {
"script": {
"lang": "painless",
"source": """
long timeDiff = 0;
long fromTime = 0;
long toTime = 0;
if(doc['lineItems.events.name.keyword'] == "CREATED"){
fromTime = doc['lineItems.events.timeValue'].value.getMillis();
}
else if(doc['lineItems.events.name.keyword'] == "ENDED"){
toTime = doc['lineItems.events.timeValue'].value.getMillis();
}
timeDiff = toTime-fromTime;
return (timeDiff)
"""
}
}
}
}
}
}
}
The aggregation result was 0, which is wrong.
Is there any way to achieve this?
Using doc[...] in a script on nested objects does not work, because Elasticsearch indexes each nested object as a separate document.
Use params._source instead (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html). Note that access to _source is really slow; if you have a lot of documents or need to run this query often, consider adding this field to the main document.
I assume all values exist; add null checks for robustness if needed. This should work:
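// Assumes the document exposes top-level ENDED and CREATED values as epoch milliseconds;
// with the sample document above you would instead walk lineItems[].events[] to find them.
long toTime = params['_source']['ENDED'];
long fromTime = params['_source']['CREATED'];
return (toTime - fromTime);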
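The suggestion to add the value to the main document could look like the sketch below, which computes a per-line duration on the client before indexing so that a plain avg aggregation can run on it later. This is only an illustration: the field name durationMillis and the function name are hypothetical, and the timestamps are assumed to be ISO-8601 strings, which the sample document does not confirm.

```python
from datetime import datetime, timedelta


def add_durations(document):
    """Return a copy of the document with durationMillis on each line item.

    A line item only gets the field when it has both a CREATED and an
    ENDED event; the input document is left unchanged.
    """
    enriched_items = []
    for item in document.get("lineItems", []):
        # Map event name -> parsed timestamp (assumed ISO-8601 strings)
        times = {
            event["name"]: datetime.fromisoformat(event["timeStamp"])
            for event in item.get("events", [])
        }
        new_item = dict(item)
        if "CREATED" in times and "ENDED" in times:
            elapsed = times["ENDED"] - times["CREATED"]
            new_item["durationMillis"] = elapsed // timedelta(milliseconds=1)
        enriched_items.append(new_item)
    enriched = dict(document)
    enriched["lineItems"] = enriched_items
    return enriched
```

With such a field indexed on each line item, the average could be computed with a nested aggregation on lineItems and an avg on the new field, without any script.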

how to add a new field into a document using painless script

Is there a way to create a field in a document within a Painless script
if it does not exist?
I'm using something like:
if(!ctx._source.tags.contains(....)
but the tags field may not exist in the document.
Can it be done?
Thanks.
If you plan to use the _update_by_query API, I'd recommend doing something like:
POST your_index/_update_by_query
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "tags"
}
}
}
},
"script": {
"source": "ctx._source.tags = ''"
}
}
Otherwise, using only Painless, you can do something like:
{
"script": {
"source": """
if (ctx._source.tags == null) {
// initialize the missing field; use [] instead if tags should be a list
ctx._source.tags = '';
}
"""
}
}
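Putting the null check together with the contains check from the question might look like the sketch below. It assumes tags is meant to be a list, which the question does not state, and it passes the tag through params rather than hard-coding it. The Python only builds the script body as a dict; the function name is hypothetical.

```python
def add_tag_body(tag):
    """Build an update script body that creates tags if missing, then adds tag.

    The first statement guards the contains() call, so a document without
    a tags field no longer causes a null pointer exception.
    """
    source = (
        "if (ctx._source.tags == null) { ctx._source.tags = []; } "
        "if (!ctx._source.tags.contains(params.tag)) "
        "{ ctx._source.tags.add(params.tag); }"
    )
    return {
        "script": {
            "lang": "painless",
            "source": source,
            "params": {"tag": tag},
        }
    }
```

The same body can be combined with a query and sent to _update_by_query, or used as is with the single-document update API.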
