Get result from aggs in script ElasticSearch/Painless - elasticsearch

I'm new to the Elasticsearch world. I've been trying to write a simple request, and I need to access an aggregation result in my script to evaluate a simple condition. Is it possible to do it this way?
The condition below is only an example.
GET _search
{
"aggs" : {
"sum_field" : { "sum" : { "field" : "someField" } }
},
"script_fields": {
"script_name": {
"script": {
"lang": "painless",
"source": """
// get there aggs result (sum_field)
if(sum_field > 5){
return sum_field
}
"""
}
}
}
}

The requirement is to execute a sum aggregation over multiple indexes that have the same field name.
With multiple indexes, you have to check whether that field exists in each index AND whether it has the same datatype everywhere.
Indexes
I've created three indexes, having a single field called num.
index_1
- num: long
index_2
- num: long
index_3
- num: text
: fielddata: true
Notice that for the field of type text I've set the property fielddata: true. If you do not set it, the query below returns the aggregation result along with an error saying that the value of a text field cannot be retrieved, because it is an analyzed string and doc can only be used on fields that are not analyzed.
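For reference, here is a minimal sketch of how index_3 could be created with fielddata enabled on the text field. It is the body of a PUT index_3 request and assumes Elasticsearch 7 or later, where mappings have no type name; on older versions the properties block would sit under a mapping type. Only the index and field names come from the description above.

```json
{
  "mappings": {
    "properties": {
      "num": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}
```

Keep in mind that fielddata on text fields is loaded into heap memory, which is why it is disabled by default.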
Sample Query:
POST /_search
{
"size":0,
"query":{
"bool":{
"filter":[
{
"exists":{
"field":"num"
}
}
]
}
},
"aggs":{
"myaggs":{
"sum":{
"script":{
"source":"if(doc['num'].value instanceof long) return doc['num'].value;"
}
}
}
}
}
Query if you cannot set fielddata:true
In that case, you need to explicitly list the indexes you want to aggregate over:
POST /_search
{
"size":0,
"query":{
"bool":{
"filter":[
{
"exists":{
"field":"num"
}
},
{
"terms":{
"_index":[
"index_1",
"index_2"
]
}
}
]
}
},
"aggs":{
"myaggs":{
"sum":{
"script":{
"source":"if(doc['num'].value instanceof long) return doc['num'].value;"
}
}
}
}
}
Hope this helps!

Related

Elasticsearch: Query to filter out specific documents based on field value and return count

I'm trying to compose a query in Elasticsearch that filters out documents with a specific field value and also returns, as an aggregation, the number of documents that have been filtered out.
What I have so far is below. However, with my solution the documents seem to be filtered out first and the count is performed afterwards, so it is always 0.
{
"query":{
"bool":{
"must_not":[
{
"terms":{
"gender":[
"male"
]
}
}
]
}
},
"aggs":{
"removed_docs_count":{
"filter":{
"term":{
"gender":"male"
}
}
}
}
}
You don't need a query block; the aggs block alone will give you the expected count.
{
"aggs":{
"removed_docs_count":{
"filter":{
"term":{
"gender":"male"
}
}
}
}
}
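If the hits should still exclude the matching documents while the count is reported, one possible variant is to move the exclusion into post_filter, which Elasticsearch applies to the hits after the aggregations have been computed. The following request body is a sketch under that assumption, reusing the gender field and the aggregation name from the question:

```json
{
  "post_filter": {
    "bool": {
      "must_not": [
        { "term": { "gender": "male" } }
      ]
    }
  },
  "aggs": {
    "removed_docs_count": {
      "filter": {
        "term": { "gender": "male" }
      }
    }
  }
}
```

Here the aggregation still sees every document, so removed_docs_count reports the number of documents that post_filter removes from the hits.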

Is it possible to sort by a range in Elasticsearch?

When I execute the following query:
{
"query": {
"bool": {
"filter": [
{
"match": {
"my_value": "hi"
}
},
{
"range": {
"my_range": {
"gt": 0,
"lte": 200
}
}
}
]
}
},
"sort": {
"my_range": {
"order": "asc",
"mode": "min"
}
}
}
I get the error:
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Fielddata is not supported on field [my_range] of type [long_range]"
}
How can I enable a range datatype to be sortable? Is this possible?
Elasticsearch version: 5.4, but I am wondering if this is possible with ANY version.
More context
Not all documents in the alias/index have the range field. However, the query filters to only include documents with that field.
Sorting on a field of a range data type is not straightforward. Still, you can use script-based sorting to get the expected result to some extent.
For example, to keep the script simple, I'm assuming that for all your docs the data indexed in the my_range field has values for gt and lte only, and that you want to sort by the minimum of the two. Then you can add the following sort:
{
"query": {
"bool": {
"filter": [
{
"match": {
"my_value": "hi"
}
},
{
"range": {
"my_range": {
"gt": 0,
"lte": 200
}
}
}
]
}
},
"sort": {
"_script": {
"type": "number",
"script": {
"lang": "painless",
"inline": "Math.min(params['_source']['my_range']['gt'], params['_source']['my_range']['lte'])"
},
"order": "asc"
}
}
}
You can adapt the script to your needs for more complex data involving any combination of lt, gt, lte and gte.
Updates (Scripts for other different use cases):
1. Sort by difference
"Math.abs(params['_source']['my_range']['gt'] - params['_source']['my_range']['lte'])"
2. Sort by gt
"params['_source']['my_range']['gt']"
3. Sort by lte
"params['_source']['my_range']['lte']"
4. Sorting if the query returns a few docs which don't have the range field
"if(params['_source']['my_range'] != null) { <sorting logic> } else { return 0; }"
Replace <sorting logic> with the required sorting logic (one of the three above or the one in the query).
return 0 can be replaced by return -1 or any other number, depending on the sorting needs.
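As an illustration, the null-check template from case 4 could be combined with the "sort by lte" logic from case 3 as in the request body below. This is only a sketch: the query is simplified to the match on my_value from the question, and the "inline" key follows the ES 5.4 syntax used in this thread (later versions call it "source").

```json
{
  "query": {
    "match": { "my_value": "hi" }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "inline": "if (params['_source']['my_range'] != null) { return params['_source']['my_range']['lte']; } else { return 0; }"
      },
      "order": "asc"
    }
  }
}
```

With order asc and a fallback of 0, documents without my_range sort before documents whose lte is positive; pick a large fallback value instead if they should come last.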
I think what you are looking for is a sort based on the width of the range, because I'm not sure that simply sorting on one of the range values would make sense.
For example, if the range for one document is 100 to 300 and for another 200 to 600, you would want to sort on the difference, so that the narrower range (300 - 100 = 200) appears at the top.
If so, I've made use of the below painless script and implemented script based sorting.
Sorting based on difference in Range
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"params._source.my_range.lte-params._source.my_range.gte"
},
"order":"asc"
}
}
}
Note that in this case the sort is not based on any of the field values of my_range but only on their difference. If you want to sort further on fields like lte, lt, gte or gt, you can implement the sort with multiple scripts as below:
Sorting based on difference in Range + Range Field (my_range.lte)
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":[
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"params._source.my_range.lte - params._source.my_range.gte"
},
"order":"asc"
}
},
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"params._source.my_range.lte"
},
"order":"asc"
}
}
]
}
So in this case, even if two documents have the same range width, the one with the smaller my_range.lte shows up first.
Sort based on range field
However if you simply want to sort based on one of the range values, you can make use of below query.
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"params._source.my_range.lte"
},
"order":"asc"
}
}
}
Updated Answer to manage documents without range
This is for the scenario: sort based on the difference in range, then on lte or lt, whichever is present.
The code below does the following:
- Checks whether the document has the my_range field.
- If it doesn't, it returns Long.MAX_VALUE by default. This means that when you sort by asc, this document is returned last.
- It then checks whether the document has lte or lt and uses that value as high. The default value of high is Long.MAX_VALUE.
- Similarly, it checks whether the document has gte or gt and uses that value as low. The default value of low is 0.
- Finally, it calculates high - low, which is the value the sort is applied on.
Updated Query
POST <your_index_name>/_search
{
"size":100,
"query":{
"match_all":{
}
},
"sort":[
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"""
if(params._source.my_range==null){
return Long.MAX_VALUE;
} else {
long high = Long.MAX_VALUE;
long low = 0L;
if(params._source.my_range.lte!=null){
high = params._source.my_range.lte;
} else if(params._source.my_range.lt!=null){
high = params._source.my_range.lt;
}
if(params._source.my_range.gte!=null){
low = params._source.my_range.gte;
} else if (params._source.my_range.gt!=null){
low = params._source.my_range.gt;
}
return high - low;
}
"""
},
"order":"asc"
}
},
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"""
if(params._source.my_range==null){
return Long.MAX_VALUE;
}
long high = Long.MAX_VALUE;
if(params._source.my_range.lte!=null){
high = params._source.my_range.lte;
} else if(params._source.my_range.lt!=null){
high = params._source.my_range.lt;
}
return high;"""
},
"order":"asc"
}
}
]
}
This should work with ES 5.4. Hope it helps!
This can be resolved easily by using the regexp interval operator:
Interval: The interval option enables the use of numeric ranges, enclosed by angle brackets "<>". For string: "foo80":
foo<1-100> # match
foo<01-100> # match
foo<001-100> # no match
Enabled with the INTERVAL or ALL flags.
(Elastic docs)
{
"query": {
"bool": {
"filter": [
{
"match": {
"my_value": "hi"
}
},
{
"regexp": {
"my_range": {
"value": "<0-200>"
}
}
}
]
}
},
"sort": {
"my_range": {
"order": "asc",
"mode": "min"
}
}
}

Matching multiple values in same field

I have a "routes" field of type long in the mapping, in which I'm storing arrays of values (for example, 1. [5463, 3452], 2. [5467, 3452]). With the following query I want to retrieve the records that match both 5463 and 3452 in the same record:
GET /flight_routes/_search
{
"query": {
"bool": {
"filter": {
"terms": {
"routes": [5463, 3452]
}
}
}
}
}
But it returns documents that match either one of the values. Do I have to migrate the mapping type to nested to handle this, or is there any other way to do it through the query itself?
You can use the terms_set query with a minimum_should_match_script that returns the length of the array:
POST /flight_routes/_search
{
"query": {
"terms_set": {
"routes" : {
"terms" : [5463, 3452],
"minimum_should_match_script": {
"source": "params.nb_terms",
"params": {
"nb_terms": 2
}
}
}
}
}
}
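For a small, fixed list of values, an alternative that does not need a script is a bool filter with one term clause per required value, since every filter clause must match. The request body below is a sketch of that idea for POST /flight_routes/_search, using the two values from the question:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "routes": 5463 } },
        { "term": { "routes": 3452 } }
      ]
    }
  }
}
```

The terms_set form scales better when the list of required values is long or built dynamically, because only the terms array and the count have to change.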

Elasticsearch Prefix Exact Match

I have text fields like the ones below:
elastic|b|c
elastic,search|b|c
elastic,search,prefix|b|c
I want to query on this string with a prefix. The query is:
"aggs":{
"field":{
"filter":{
"match":{
"field":{
"type":"prefix",
"query":"elastic|"
}
}
},
"aggs":{
"field":{
"terms":{
"field":"textField",
"size":255
}
}
}
}
}
and this query returns all of the texts in the example above.
Do I need an extra analyzer or token filter on the texts?
How can I do an exact-match prefix search in Elasticsearch?
You can achieve that by using a wildcard query in Elasticsearch:
{
"query": {
"wildcard": {
"textField": {
"value": "elastic*"
}
}
}
}
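Note that a pattern of elastic* also matches values such as elastic,search|b|c. If only values starting with elastic| should match, one option is a prefix query against a non-analyzed version of the field. The sketch below assumes a keyword sub-field named textField.keyword exists (as created by default dynamic mapping); that sub-field name is an assumption and not part of the question.

```json
{
  "query": {
    "prefix": {
      "textField.keyword": {
        "value": "elastic|"
      }
    }
  }
}
```

Because keyword fields are not tokenized, the pipe character is kept as part of the indexed value, so the prefix is compared against the whole original string.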

How to check field data is numeric when using inline Script in ElasticSearch

Per our requirement, we need to find the max ID of the documents before adding a new document. The problem is that a document may also contain string data, so I had to use an inline script in the Elasticsearch query to find the max ID only for documents that have integer data, returning 0 otherwise. I am using the following inline script query to find the max key, but it is not working. Can you help me with this?
{
"size":0,
"query":
{"bool":
{"filter":[
{"term":
{"Name":
{
"value":"Test2"
}
}}
]
}},
"aggs":{
"MaxId":{
"max":{
"field":"Key","script":{
"inline":"((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"}}
}
}
}
The error occurs because the max aggregation only supports numeric fields, i.e. you cannot specify a string field (such as Key) in a max aggregation.
Simply remove the "field":"Key" part and keep only the script part:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"Name": "Test2"
}
}
]
}
},
"aggs": {
"MaxId": {
"max": {
"script": {
"source": "((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"
}
}
}
}
}
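One caveat: isNumber() is a Groovy string method, so the script above may fail to compile if it runs as Painless. A possible Painless variant is sketched below; it relies on Integer.parseInt throwing NumberFormatException for non-numeric strings and assumes that Key can be read through doc values (for example, a keyword field) and has a value in every matching document.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "Name": "Test2" } }
      ]
    }
  },
  "aggs": {
    "MaxId": {
      "max": {
        "script": {
          "lang": "painless",
          "source": "try { return Integer.parseInt(doc['Key'].value); } catch (NumberFormatException e) { return 0; }"
        }
      }
    }
  }
}
```

Documents whose Key is not a valid integer contribute 0 to the max, which matches the fallback in the original script.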
