How to add 2 values in an Elasticsearch script?

I am trying to create a rows_processed field by adding two fields, src_s_rows and tgt_s_rows, but it is not working: the result is always 0. Even when I use "script": "(doc['src_s_rows'].value)" instead of "script": "(doc['src_s_rows'].value+doc['tgt_s_rows'].value)", it still gives me 0.
What am I missing? Please help.
GET run_hist/task_hist/_search
{
"fields": [
"THROUGHPUT_ROWS_PER_SEC",
"start_time",
"end_time",
"src_s_rows",
"tgt_s_rows"
],
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"_id": "249885850"
}
}
]
}
}
}
},
"filter": {
"script": {
"script": "(doc['end_time'].value-doc['start_time'].value)>minutes*1",
"params": {
"minutes": 60000
}
}
},
"script_fields": {
"total_time_taken": {
"script": "(doc['end_time'].value-doc['start_time'].value)/1000"
},
"rows_processed": {
"script": "(doc['src_s_rows'].value+doc['tgt_s_rows'].value)"
}
},
"size": 10000
}

Use _source.src_s_rows.value in place of doc['src_s_rows'].value.
Try this:
"script": "(_source.src_s_rows.value+_source.tgt_s_rows.value)"
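If the script field still comes back as 0, one way to cross-check is to sum the two values client-side from the fields section of each hit. This is only a sketch, not part of the answer above: the helper names and the sample hit are hypothetical, and it assumes the requested fields come back either as single values or as one-element lists.

```python
def first_value(value):
    """Return the first element of a list-valued field, or the value itself."""
    if isinstance(value, list):
        return value[0] if value else 0
    return value if value is not None else 0


def rows_processed(hit, src="src_s_rows", tgt="tgt_s_rows"):
    """Sum two numeric fields taken from a hit's 'fields' section."""
    fields = hit.get("fields", {})
    return first_value(fields.get(src)) + first_value(fields.get(tgt))


# Hypothetical hit shaped like one entry of a search response; the numbers are made up.
sample_hit = {
    "_id": "249885850",
    "fields": {"src_s_rows": [120], "tgt_s_rows": [80]},
}

print(rows_processed(sample_hit))  # 200
```

If the client-side sum is non-zero while the script field is 0, the stored values are fine and the problem lies in how the script reads them.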

Related

How do I get the size of a 'nested' type array through a Painless script in Elasticsearch version 6.7?

I am using Elasticsearch version 6.7. I have the following mapping:
{
"customers": {
"mappings": {
"customer": {
"properties": {
"name": {
"type": "keyword"
},
"permissions": {
"type": "nested",
"properties": {
"entityId": {
"type": "keyword"
},
"entityType": {
"type": "keyword"
},
"permission": {
"type": "keyword"
},
"permissionLevel": {
"type": "keyword"
},
"userId": {
"type": "keyword"
}
}
}
}
}
}
}
}
I want to run a query that shows all customers who have > 0 permissions. I have tried the following:
{
"query": {
"bool": {
"filter": {
"script": {
"script": {
"lang": "painless",
"source": "params._source != null && params._source.permissions != null && params._source.permissions.size() > 0"
}
}
}
}
}
}
But this returns no hits because params._source is null; according to this Stack Overflow post, Painless does not have access to the _source document. How can I write a Painless script that gives me all customers who have > 0 permissions?
Solution 1: Using Script with must query
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"script": {
"script": {
"lang": "painless",
"inline": """
ArrayList st = params._source.permissions;
return st != null && st.size() > 0;
"""
}
}
}
]
}
}
}
Solution 2: Using Exists Query on nested fields
You could simply use an exists query like the one below to get customers who have > 0 permissions.
Query:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "permissions",
"query": {
"bool": {
"should": [
{
"exists":{
"field": "permissions.permission"
}
},
{
"exists":{
"field": "permissions.entityId"
}
},
{
"exists":{
"field": "permissions.entityType"
}
},
{
"exists":{
"field": "permissions.permissionLevel"
}
}
]
}
}
}
}]
}
}
}
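The query above repeats one exists clause per sub-field, so it can be generated instead of written by hand. A minimal sketch, assuming the request body is built as a plain Python dict before being sent; the function name nested_exists_query is hypothetical.

```python
import json


def nested_exists_query(path, subfields):
    """Build a nested query that matches when any listed sub-field exists."""
    should = [
        {"exists": {"field": "{0}.{1}".format(path, name)}}
        for name in subfields
    ]
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": path,
                            "query": {"bool": {"should": should}},
                        }
                    }
                ]
            }
        }
    }


# Same sub-fields as in the query above.
body = nested_exists_query(
    "permissions",
    ["permission", "entityId", "entityType", "permissionLevel"],
)
print(json.dumps(body, indent=2))
```

The resulting dict can be serialized and posted to the _search endpoint with whatever HTTP client is already in use.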
Solution 3: Create definitive structure but add empty values to the fields
Another alternative would be to ensure that all documents have the fields. Basically:
1. Ensure that all documents have the permissions nested document.
2. For those who do not have permissions, set the field permissions.permission to 0.
3. Construct a query that selects such documents accordingly.
Below would be a sample document for a user who doesn't have permissions:
POST mycustomers/customer/1
{
"name": "john doe",
"permissions": [
{
"entityId" : "null",
"entityType": "null",
"permissionLevel": 0,
"permission": 0
}
]
}
The query in that case would be as simple as this:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "permissions",
"query": {
"range": {
"permissions.permission": {
"gte": 1
}
}
}
}
}
]
}
}
}
Hope this helps!

Filter by length of nested array

Here is my mapping:
{"field_name": {
"dynamic": "strict",
"properties": {...},
"type": "nested"
}}
And I am trying to filter only documents which have at least one field_name.
I tried:
{"query": {
"bool": {
"filter": [ { "script" : {
"script" : {
"inline": "doc['field_name'].length >= 1",
"lang": "painless"
} } } ]
}
} }
But Elasticsearch complains: No field found for [field_name] in mapping with types [type_name].
I also tried wrapping the previous query in a nested query, but that didn't work either:
{ "nested": {
"path": "field_name",
"query": {
"bool": {
"filter": [ {
"script": {
"script": {
"inline": "doc['field_name'].length >= 1",
"lang": "painless"
}
}
} ]
}
}
} }
This gave the same error as above.
Any ideas?
If all objects have the same field, you can use an exists query to check whether an object exists, use score_mode sum to count the matching objects, and then use a script score to apply the condition you want, as in the code below:
{
"query": {
"function_score": {
"query": {
"nested": {
"path": "field_name",
"query": {
"exists": {
"field": "field_name.same_field"
}
},
"score_mode": "sum"
}
},
"functions": [
{
"script_score": {
"script": {
"source": "_score >= 1 ? 1 : 0"
}
}
}
],
"boost_mode": "replace"
}
},
"min_score": 1
}
What I ended up doing is adding a field my_array_length at construction time. That way I can simply filter by the value of this field.
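That approach could look roughly like the following sketch. It assumes documents are assembled as Python dicts before indexing and that the mapping allows a top-level my_array_length field; the helper names and the sample document are hypothetical.

```python
import copy


def with_array_length(document, array_field="field_name",
                      length_field="my_array_length"):
    """Return a copy of the document with the array length stored alongside."""
    doc = copy.deepcopy(document)
    value = doc.get(array_field)
    if value is None:
        length = 0
    elif isinstance(value, list):
        length = len(value)
    else:
        length = 1  # a single object counts as one entry
    doc[length_field] = length
    return doc


def length_filter(length_field="my_array_length", minimum=1):
    """Build a query that keeps documents with at least `minimum` entries."""
    return {
        "query": {
            "bool": {
                "filter": [{"range": {length_field: {"gte": minimum}}}]
            }
        }
    }


# Hypothetical document; the nested entries are made up for illustration.
prepared = with_array_length({"field_name": [{"a": 1}, {"a": 2}]})
print(prepared["my_array_length"])  # 2
```

The trade-off is that the length has to be recomputed whenever the array changes, but the search side becomes a plain range filter with no script.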
A simple approach would be to use an exists clause for each of the fields:
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"exists": {
"field": "field_name.dynamic"
}
},
{
"exists": {
"field": "field_name.properties"
}
},
{
"exists": {
"field": "field_name.type"
}
}
],
"minimum_should_match": 1
}
}
}
}
}
You define a should clause with minimum_should_match and get only the relevant documents.
See the documentation for the exists query and the bool query.

"update by query" not working as expected with straight calls

I have a script that calls Elasticsearch with several _update_by_query requests.
Here I update the item with id=299966 and change the trash flag, trash=0:
_update_by_query
{
"query": {
"query": {
"bool": {
"must": [
{
"terms": {
"_id": [
299966
]
}
}
],
"should": [
]
}
}
},
"script": {
"inline": "ctx._source.trash=0"
}
}
Then I set the item with id=299966 (same item as above) to trash=1:
_update_by_query
{
"query": {
"query": {
"bool": {
"must": [
{
"terms": {
"_id": [
299966
]
}
}
],
"should": [
]
}
}
},
"script": {
"inline": "ctx._source.trash=1"
}
}
The thing is that after these two operations, if I search for the item with id=299966, I get trash=0, when it is supposed to be trash=1, as that was the last one executed. I always maintain the order, and my own log shows that the one with trash=0 is executed first, and then the one with trash=1.
Is there anything in the update_by_query logic that prevents making two calls in a row? Do I have to wait a few seconds before making the second update_by_query?
PS: Never mind the doubled query key in the code. It works fine.
Thanks in advance.
The solution I found is to use _flush after every _update or every _update_by_query.
myindex/_update_by_query
{
"query": {
"query": {
"bool": {
"must": [
{
"terms": {
"_id": [
299966
]
}
}
],
"should": [
]
}
}
},
"script": {
"inline": "ctx._source.trash=0"
}
}
myindex/_flush
myindex/_update_by_query
{
"query": {
"query": {
"bool": {
"must": [
{
"terms": {
"_id": [
299966
]
}
}
],
"should": [
]
}
}
},
"script": {
"inline": "ctx._source.trash=1"
}
}
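The ordering described above (update, flush, update, flush) can be sketched as a small request planner. This is an illustration under assumptions: it only builds the (method, path, body) sequence rather than sending anything, the function names are hypothetical, and it uses a single query key instead of the doubled one from the question.

```python
def trash_update(index, doc_id, trash):
    """Build one _update_by_query request that sets the trash flag."""
    body = {
        "query": {"bool": {"must": [{"terms": {"_id": [doc_id]}}]}},
        "script": {"inline": "ctx._source.trash={0}".format(trash)},
    }
    return ("POST", "{0}/_update_by_query".format(index), body)


def flush(index):
    """Build a _flush request for the index."""
    return ("POST", "{0}/_flush".format(index), None)


def updates_with_flush(index, updates):
    """Follow every update with a flush, keeping the original order."""
    requests = []
    for doc_id, trash in updates:
        requests.append(trash_update(index, doc_id, trash))
        requests.append(flush(index))
    return requests


plan = updates_with_flush("myindex", [(299966, 0), (299966, 1)])
for method, path, _body in plan:
    print(method, path)
```

Each tuple can then be handed, in order, to whatever HTTP client the calling script already uses; the next request should only be sent once the previous one has returned.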

Elasticsearch filtered query with script for term frequency

I'm using the attachment plugin: https://github.com/elastic/elasticsearch-mapper-attachments
I'm able to find documents with a specific word in one or more fields, but I'm unable to filter out documents with a lower term frequency than the one searched for.
This works:
POST /crm/employee/_search
{
"query": {"filtered": {
"query": {"match": {
"employee.cv.content": "transitie"
}},
"filter": {
"bool": {
"should": [
{"terms": {
"employee.listEmployeeType.id": [
2
]
}}
]
}
}
}},
"highlight": {"fields": {"employee.cv.content" : {}}}
}
After a long search, I've found the following:
"script": {
"script": "crm['employee.cv.content'][lookup].tf() > occurrence",
"params": {
"lookup": "transitie",
"occurrence": 1
}
},
Unfortunately, I'm unable to implement it. I hope I've explained the issue well enough for someone to give me a push in the right direction!
Reference the field through _index instead of crm in the script, and put the script filter in the must clause of the bool filter:
{
"query": {
"filtered": {
"query": {
"match": {
"employee.cv.content": "transitie"
}
},
"filter": {
"bool": {
"should": [
{
"terms": {
"employee.listEmployeeType.id": [
2
]
}
}
],
"must": [
{
"script": {
"script": "_index['employee.cv.content'][lookup].tf() > occurrence",
"params": {
"lookup": "transitie",
"occurrence": 1
}
}
}
]
}
}
}
},
"highlight": {
"fields": {
"employee.cv.content": {}
}
}
}

Debugging ElasticSearch boolean queries

I'm facing an issue with an automatically generated Elasticsearch query. When I run it, six of the seven shards I'm using on the index return a success, and the seventh returns this error:
index: "shard"
reason: "ClassCastException[
org.elasticsearch.common.mvel2.compiler.BlankLiteral cannot be cast to java.lang.Boolean]"
shard: 1
status: 500
successful: 6
total: 7
How can I figure out what this is coming from, considering that the explain endpoint yields absolutely nothing due to the root query being a bool?
The query is as follows:
{
"query": {
"bool": {
"must": [
{
"filtered": {
"query": {
"range": {
"date": {
"lte": "2014-05-21T21:59:59+00:00",
"gte": "2013-01-23T23:00:00+00:00"
}
}
},
"filter": {
"not": {
"terms": {
"idCountry": [
"9999"
]
}
}
}
}
},
{
"filtered": {
"query": {
"nested": {
"path": "reports",
"query": {
"terms": {
"reports.36317.flag": [
"o"
],
"minimum_should_match": 1
}
}
}
},
"filter": {
"nested": {
"path": "reports",
"filter": {
"exists": {
"field": "reports.36317"
}
}
}
}
}
}
]
}
},
"script_fields": {
"idTone": {
"script": "_source.reports[reportId].empty || _source.reports[reportId].idTone.empty ? _source.idLanguage : _source.reports[reportId].idTone",
"params": {
"reportId": "36317"
}
},
"tags": {
"script": "_source.reports[reportId].empty || _source.reports[reportId].tags.empty ? 'none' : _source.reports[reportId].tags",
"params": {
"reportId": "36317"
}
},
"flag": {
"script": "_source.reports[reportId].empty || _source.reports[reportId].flag.empty ? 'O' : _source.reports[reportId].flag",
"params": {
"reportId": "36317"
}
},
"synthesioRank": {
"script": "_source.reports[reportId].empty || _source.reports[reportId].synthesioRank.empty || _source.reports[reportId].synthesioRank == null ? '0' : _source.reports[reportId].synthesioRank",
"params": {
"reportId": "36317"
}
},
"idUserEngagement": {
"script": "_source.reports[reportId].empty || _source.reports[reportId].idUserEngagement == null ? '0' : _source.reports[reportId].idUserEngagement",
"params": {
"reportId": "36317"
}
},
"idStatus": {
"script": "_source.reports[reportId].empty || _source.reports[reportId].idStatus == null ? '0' : _source.reports[reportId].idStatus",
"params": {
"reportId": "36317"
}
}
},
"fields": [
"access",
"content",
"title",
"date",
"geo",
"idItem",
"idSiteType",
"idSite",
"idSource",
"idSourceType",
"idTopic",
"media",
"url",
"idLanguage",
"idDocument",
"idCountry"
]
}
The thrown exception comes from MVEL, the scripting language used for the script_fields in your case.
The fact that only one shard fails may mean that the execution of one of your scripted fields fails against one specific document in that shard.
You could try removing those fields one by one, along with any filter requesting that field, to spot the one that fails.
Note: the Explain API is designed to help understand scoring computation inside a sorted query. It won't help you in any way with a failing query.
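Removing the scripted fields one by one can be automated. A minimal sketch, assuming the request body is available as a Python dict; the function name and the shortened sample body are hypothetical, and actually sending each variant is left to the caller.

```python
import copy


def without_each_script_field(body):
    """Yield (field_name, body_copy) pairs, each copy lacking one script field."""
    for name in body.get("script_fields", {}):
        variant = copy.deepcopy(body)  # never mutate the original request
        del variant["script_fields"][name]
        yield name, variant


# Shortened stand-in for the real request; the script strings are placeholders.
sample_body = {
    "query": {"match_all": {}},
    "script_fields": {
        "idTone": {"script": "placeholder_a"},
        "tags": {"script": "placeholder_b"},
        "flag": {"script": "placeholder_c"},
    },
}

variants = list(without_each_script_field(sample_body))
for removed, variant in variants:
    print(removed, sorted(variant["script_fields"]))
```

If the shard failure disappears for exactly one variant, the field removed in that variant is the likely culprit.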
