Is there a way to compare string alphabetically in painless - elasticsearch

I would like to execute this kind of operation in painless :
if (_value >= 'c)' {
return _value
} else {
return '__BAD__'
}
value is a string and I would like this following behaviour :
if value is foo I want to replace it with __BAD__ if the value is bar, I want to keep bar. only values alphabetically after 'c' should be set to __BAD__.
I got this exception :
"lang": "painless",
"caused_by": {
"type": "class_cast_exception",
"reason": "Cannot apply [>] operation to types [java.lang.String] and [java.lang.String]."
}
Is there a way to perform string alphabetical comparaison between string in painless ?
My documents are looking :
{
"id": "doca",
"categoryId": "aaa",
"parentNames": "a$aa$aaa"
},
{
"id": "docb",
"categoryId": "bbb",
"parentNames": "a$aa$bbb"
},
{
"id": "docz",
"categoryId": "zzz",
"parentNames": "a$aa$zzz"
}
and my query is like :
{
"query": {
"bool": {
"filter": []
}
},
"size": 0,
"aggs": {
"catNames": {
"terms": {
"size": 10000,
"order": {
"_key": "asc"
},
"script": {
"source": "if(doc['parentNames'].value < 'a$aa$ccc') {return doc['parentNames'].value} return '__BAD__'",
"lang": "painless"
}
},
"aggs": {
"sort": {
"bucket_sort": {
"size": 2
}
},
"catId": {
"terms": {
"field": "categoryId",
"size": 1
}
}
}
}
}
}
I am expecting the result :
{
"took": 29,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"catNames": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "__BAD__",
"doc_count": 1,
"catId": {
"buckets": [
{
"key": "aaa",
"doc_count": 1
}
]
}
},
{
"key": "a$aa$bbb",
"doc_count": 1,
"catId": {
"buckets": [
{
"key": "bbb",
"doc_count": 1
}
]
}
},
{
"key": "a$aa$zzz",
"doc_count": 1,
"catId": {
"buckets": [
{
"key": "zzz",
"doc_count": 1
}
]
}
}
]
}
}
}

In fact, I can use the compareTo function of java.lang.String.
if (_value.compareTo('c') > 0) {
return _value
} else {
return '__BAD__'
}
My query is becoming :
{
"query": {
"bool": {
"filter": []
}
},
"size": 0,
"aggs": {
"catNames": {
"terms": {
"size": 10000,
"order": {
"_key": "asc"
},
"script": {
"source": "if(doc['parentNames'].value.compareTo('a$aa$ccc')) {return doc['parentNames'].value} return '__BAD__'",
"lang": "painless"
}
},
"aggs": {
"sort": {
"bucket_sort": {
"size": 2
}
},
"catId": {
"terms": {
"field": "categoryId",
"size": 1
}
}
}
}
}
}

Related

Elasticsearch sorting based on multiple aggeration

I try to get my data with different aggeration criterias afterwards I want to order it based on one of aggeration criteria. In this specific case I want to get my data to be ordered descendly based on "Monthly_Income/ SUM" criteria.
I searched and tried lots of thing but none of them worked for me. Could you give me the answer because I am new on elasticsearch.
what I searched so far and couldn't solve the problem ;
"ordering_by_a_sub_aggregation,
Sorting Based on "Deep" Metrics,
search-aggregations-bucket-terms-aggregation-script,
search-aggregations-bucket-multi-terms-aggregation
To visualize the problem. I always get the belowing result however I tried lots of methods but I couldn't achieve to get desired result.
undesired result
desired result
Request
`
{
"query": {
"bool": {
"must": [],
"must_not": []
}
},
"size": 0,
"aggs": {
"GENDER": {
"terms": {
"field": "GENDER.keyword",
"size": 10000000,
"missing": "N/A"
// ,"order": {"MARTIAL_STATUS>Monthly_Income_0.max" : "desc" }
},
"aggs": {
"MARTIAL_STATUS": {
"terms": {
"field": "MARTIAL_STATUS.keyword",
"size": 10000000,
"missing": "N/A"
// ,"order": {"Monthly_Income_0.value" : "desc" }
},
"aggs": {
"Monthly_Income_0": {
"sum": {
"field": "Monthly_Income"
}
},
"Monthly_Income_1": {
"value_count": {
"field": "Monthly_Income"
}
},
"SALE_PRICE_2": {
"sum": {
"field": "SALE_PRICE"
}
}
// ,"sort_by_percentage": {
// "bucket_sort": {
// "sort": [ { "Monthly_Income_0.value": { "order": "desc" } } ]
// }
// }
}
}
}
}
}
}
`
Response
`
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"GENDER": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Male",
"doc_count": 40959,
"MARTIAL_STATUS": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Married",
"doc_count": 35559,
"SALE_PRICE_2": {
"value": 2.530239767013672E9
},
"Monthly_Income_0": {
"value": 3.59618565E8
},
"Monthly_Income_1": {
"value": 35559
}
},
{
"key": "Single",
"doc_count": 5399,
"SALE_PRICE_2": {
"value": 3.7742297754296875E8
},
"Monthly_Income_0": {
"value": 5.3465554E7
},
"Monthly_Income_1": {
"value": 5399
}
},
{
"key": "N/A",
"doc_count": 1,
"SALE_PRICE_2": {
"value": 87344.203125
},
"Monthly_Income_0": {
"value": 40000.0
},
"Monthly_Income_1": {
"value": 1
}
}
]
}
},
{
"key": "Female",
"doc_count": 7777,
"MARTIAL_STATUS": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Married",
"doc_count": 5299,
"SALE_PRICE_2": {
"value": 3.9976638293359375E8
},
"Monthly_Income_0": {
"value": 4.4994796E7
},
"Monthly_Income_1": {
"value": 5299
}
},
{
"key": "Single",
"doc_count": 2477,
"SALE_PRICE_2": {
"value": 1.8698677312695312E8
},
"Monthly_Income_0": {
"value": 1.8793502E7
},
"Monthly_Income_1": {
"value": 2477
}
},
{
"key": "N/A",
"doc_count": 1,
"SALE_PRICE_2": {
"value": 101006.8203125
},
"Monthly_Income_0": {
"value": 10000.0
},
"Monthly_Income_1": {
"value": 1
}
}
]
}
}
]
}
}
}
`
I try to order based on an aggerate column but I couldn't able to achieve
My understanding of your issue is that you want to group by on combination of gender and marital status
I have used runtime mapping to concatenate fields "gender" and marital status and used term aggregation to group by on run time field and sorted groups based on sum.
{
"size": 0,
"runtime_mappings": {
"gender-maritalstatus": {
"type": "keyword",
"script": {
"source": """
def gender='NA';
def maritalstatus='NA';
if(doc['Gender.keyword'].size()!=0)
gender= doc['Gender.keyword'].value;
if(doc['Marital_Status.keyword'].size()!=0)
maritalstatus= doc['Marital_Status.keyword'].value;
emit(gender+'-'+maritalstatus);
"""
}
}
},
"aggs": {
"gender-marital-grouping": {
"terms": {
"field": "gender-maritalstatus",
"order": {
"monthly_income": "desc"
},
"size": 10
},
"aggs": {
"monthly_income": {
"sum": {
"field": "Monthly_Income"
}
}
}
}
}
}
Result
"buckets" : [
{
"key" : "Female-Single",
"doc_count" : 2,
"monthly_income" : {
"value" : 300.0
}
},
{
"key" : "Male-Married",
"doc_count" : 2,
"monthly_income" : {
"value" : 200.0
}
},
{
"key" : "Female-NA",
"doc_count" : 1,
"monthly_income" : {
"value" : 100.0
}
},
{
"key" : "Male-NA",
"doc_count" : 1,
"monthly_income" : {
"value" : 100.0
}
},
{
"key" : "Male-Single",
"doc_count" : 1,
"monthly_income" : {
"value" : 100.0
}
}
]

Elasticsearch exclude key from composite aggregation

i need to perform an exclusion of some key in a composite aggregation.
here is one document of my index as an example :
{
"end_date": 1230314400000,
"parameter_codes": [28, 35, 30],
"platform_code": "41012",
"start_date": 1230314400000,
"station_id": 7833246
}
I perform a search request allowing me to : get a result for each platform_code/parameter_codes couple, plus getting the station_id correspounding plus a paging on the bucket.
here is the request :
{
"size": 0,
"query": {
"match_all": {
"boost": 1.0
}
},
"_source": false,
"aggregations": {
"compositeAgg": {
"composite": {
"size": 10,
"sources": [{
"platform_code": {
"terms": {
"field": "platform_code",
"missing_bucket": false,
"order": "asc"
}
}
}, {
"parameter_codes": {
"terms": {
"field": "parameter_codes",
"missing_bucket": false,
"order": "asc"
}
}
}]
},
"aggregations": {
"aggstation_id": {
"terms": {
"field": "station_id",
"size": 2147483647,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": {
"_key": "asc"
}
}
},
"pipe": {
"bucket_sort": {
"sort": [{
"_key": {
"order": "asc"
}
}],
"from": 0,
"size": 10,
"gap_policy": "SKIP"
}
}
}
}
}
}
this request give me the following results :
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 8,
"successful": 8,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"composite#compositeAgg": {
"after_key": {
"platform_code": "41012",
"parameter_codes": 60
},
"buckets": [{
"key": {
"platform_code": "41012",
"parameter_codes": 28
},
"doc_count": 1,
"lterms#aggstation_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": 7833246,
"doc_count": 1
}]
}
}, {
"key": {
"platform_code": "41012",
"parameter_codes": 30
},
"doc_count": 2,
"lterms#aggstation_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": 7833246,
"doc_count": 1
}, {
"key": 12787501,
"doc_count": 1
}]
}
}, {
"key": {
"platform_code": "41012",
"parameter_codes": 35
},
"doc_count": 2,
"lterms#aggstation_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [{
"key": 7833246,
"doc_count": 1
}, {
"key": 12787501,
"doc_count": 1
}]
}
}]
}
}
}
this works very well but i need to exclude one or many parameter_code.
For example by excluding '35', i want only the keys :
{
"platform_code": "41012",
"parameter_codes": 28
}
and
{
"platform_code": "41012",
"parameter_codes": 30
}
i tried, many options but can not succeed to perform this.
Can anybody know how can i do that?
A script query can be used in composite source to return only specific values of array.
{
"size": 0,
"query": {
"match_all": {
"boost": 1
}
},
"_source": false,
"aggregations": {
"compositeAgg": {
"composite": {
"size": 10,
"sources": [
{
"platform_code": {
"terms": {
"field": "platform_code.keyword",
"missing_bucket": false,
"order": "asc"
}
}
},
{
"parameter_codes": {
"terms": {
"script": {
"source": """
def arr=[];
for (item in doc['parameter_codes']) {
if(item !=35)
{
arr.add(item);
}
}
return arr"""
}
}
}
}
]
},
"aggregations": {
"aggstation_id": {
"terms": {
"field": "station_id",
"size": 2147483647,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": {
"_key": "asc"
}
}
},
"pipe": {
"bucket_sort": {
"sort": [
{
"_key": {
"order": "asc"
}
}
],
"from": 0,
"size": 10,
"gap_policy": "SKIP"
}
}
}
}
}
}
You can try to exclude "parameter_codes=35" this option from the query.
{
"query": {
"bool": {
"must_not": [
{
"term": {
"parameter_codes": {
"value": "35"
}
}
}
]
}
}
}

how to sort bucket key to integer in elasticsearch?

i have a problem with sorting by bucket key.
how do i sort bucket key by integer?
this is my query.
{
"aggregations": {
"by_time": {
"terms": {
"script": {
"source": "Instant.ofEpochMilli(doc['statdate'].date.millis).atZone(ZoneId.of(params.tz)).hour",
"lang": "painless",
"params": {
"tz": "Asia/Seoul"
}
},
"size": 10,
"min_doc_count": 0,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": {
"_key": "asc"
}
}
}
}
and result.
{
"aggregations": {
"sterms#by_time": {
"buckets": [
{
"key": "11",
"doc_count": 1
},
{
"key": "19",
"doc_count": 1
},
{
"key": "22",
"doc_count": 1
}
},
{
"key": "7",
"doc_count": 1
},
{
"key": "9",
"doc_count": 7
}
]
}
}
but i don't want this result.
i think what this key type is string.
how can i sort by integer key?
You need to use bucket sort aggregation that is a parent pipeline
aggregation which sorts the buckets of its parent multi-bucket
aggregation. Zero or more sort fields may be specified together with
the corresponding sort order. Each bucket may be sorted based on its
_key, _count or its sub-aggregations.
Try out this search query:
{
"aggregations": {
"by_time": {
"terms": {
"script": {
"source": "Instant.ofEpochMilli(doc['statdate'].date.millis).atZone(ZoneId.of(params.tz)).hour",
"lang": "painless",
"params": {
"tz": "Asia/Seoul"
}
},
"size": 20, <-- note this
"min_doc_count": 0,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": {
"_key": "asc"
}
},
"aggs": {
"bucket_truncate": {
"bucket_sort": { <-- note this
"sort": [
{
"_key": {
"order": "asc"
}
}
],
"size": 20 <-- note this
}
}
}
}
}
}
Adding a working example with index data, search query, and search result
Index Data:
{
"id":"1",
"title":"a"
}
{
"id":"3",
"title":"c"
}
{
"id":"2",
"title":"b"
}
{
"id":"2",
"title":"c"
}
Search Query:
{
"size": 0,
"aggs": {
"unique_id": {
"terms": {
"field": "id.keyword"
},
"aggs": {
"bucket_truncate": {
"bucket_sort": {
"sort": [
{
"_key": {
"order": "asc"
}
}
]
}
}
}
}
}
}
Search Result:
"aggregations": {
"unique_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1
},
{
"key": "2",
"doc_count": 2
},
{
"key": "3",
"doc_count": 1
}
]
}
}

Buckets size filter in Elasticsearch

Here is my query result
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 502,
"max_score": 0,
"hits": []
},
"aggregations": {
"HIGH_RISK_USERS": {
"doc_count": 1004,
"USERS_COUNT": {
"doc_count_error_upper_bound": 5,
"sum_other_doc_count": 437,
"buckets": [
{
"key": "49",
"doc_count": 502,
"NAME": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": []
}
},
{
"key": "02122219455#53.205.223.157",
"doc_count": 44,
"NAME": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "caller",
"doc_count": 42
},
{
"key": "CallFrom",
"doc_count": 2
}
]
}
},
{
"key": "+02129916178#53.205.223.157",
"doc_count": 2,
"NAME": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "caller",
"doc_count": 2
}
]
}
}
]
}
}
}
}
Here is my query
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "x_nova_extensions.entities",
"query": {
"bool": {
"filter": [
{
"match": {
"x_nova_extensions.entities.text": "49"
}
},
{
"terms": {
"x_nova_extensions.entities.type": [
"sourceCountryCode",
"CallerIPCountryCode",
"CallerIPCountryName",
"CallerIPCountryCode",
"CallerPhoneCountryName"
]
}
}
]
}
}
}
}
]
}
},
"aggs": {
"HIGH_RISK_USERS": {
"nested": {
"path": "x_nova_extensions.entities"
},
"aggs": {
"USERS_COUNT": {
"terms": {
"field": "x_nova_extensions.entities.text",
"size": 10,
"order": {
"_count": "desc"
}
},
"aggs": {
"NAME": {
"terms": {
"field": "x_nova_extensions.entities.type",
"include": [
"caller",
"callee",
"CallFrom",
"CallTo"
]
}
}
}
}
}
}
}
}
I want my query to return only bucket[].size > 0
I searched on the internet and I couldn't find any specific keyword or something else. Even I am not sure if Elasticsearch supports this or not. I want to sure that Elasticsearch supports this
Are there any keyword or how can I handle it ?
Thanks
I think the thing that you are looking for is Aggregation Pipeline
By that way, you can reach the bucket size and filter the result accordingly.
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"nameCount": "NAME._bucket_count"
},
"script": {
"source": "params.nameCount != 0"
}
}
}
}
}
But please pay attention to the elasticsearch version. The way how it is applied can be different according to the version.

How to calculate difference between metrics in different aggregations in elasticsearch

I want to calculate the difference of nested aggregations between two dates.
To be more concrete is it possible to calculate the difference between date_1.buckets.field_1.buckets.field_2.buckets.field_3.value - date_2.buckets.field_1.buckets.field_2.buckets.field_3.value given the below request/response. Is that possible with elasticsearch v.1.0.1?
The aggregation query request looks like this:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"terms": {
"date": [
"2014-08-18 00:00:00.0",
"2014-08-15 00:00:00.0"
]
}
}
]
}
}
}
},
"aggs": {
"date_1": {
"filter": {
"terms": {
"date": [
"2014-08-18 00:00:00.0"
]
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_1",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_2": {
"terms": {
"field": "field_2",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_3": {
"sum": {
"field": "field_3"
}
}
}
}
}
}
}
},
"date_2": {
"filter": {
"terms": {
"date": [
"2014-08-15 00:00:00.0"
]
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_1",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_1": {
"terms": {
"field": "field_2",
"size": 2147483647,
"order": {
"_term": "desc"
}
},
"aggs": {
"my_agg_3": {
"sum": {
"field": "field_3"
}
}
}
}
}
}
}
}
}
}
And the response looks like this:
{
"took": 236,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1646,
"max_score": 0,
"hits": []
},
"aggregations": {
"date_1": {
"doc_count": 823,
"field_1": {
"buckets": [
{
"key": "field_1_key_1",
"doc_count": 719,
"field_2": {
"buckets": [
{
"key": "key_1",
"doc_count": 275,
"field_3": {
"value": 100
}
}
]
}
}
]
}
},
"date_2": {
"doc_count": 823,
"field_1": {
"buckets": [
{
"key": "field_1_key_1",
"doc_count": 719,
"field_2": {
"buckets": [
{
"key": "key_1",
"doc_count": 275,
"field_3": {
"value": 80
}
}
]
}
}
]
}
}
}
}
Thank you.
With elasticsearch new version (eg: 5.6.9) is possible:
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"bool": {
"filter": [
{
"range": {
"date_created": {
"gte": "2018-06-16T00:00:00+02:00",
"lte": "2018-06-16T23:59:59+02:00"
}
}
}
]
}
}
}
},
"aggs": {
"by_millisec": {
"range" : {
"script" : {
"lang": "painless",
"source": "doc['date_delivered'][0] - doc['date_created'][0]"
},
"ranges" : [
{ "key": "<1sec", "to": 1000.0 },
{ "key": "1-5sec", "from": 1000.0, "to": 5000.0 },
{ "key": "5-30sec", "from": 5000.0, "to": 30000.0 },
{ "key": "30-60sec", "from": 30000.0, "to": 60000.0 },
{ "key": "1-2min", "from": 60000.0, "to": 120000.0 },
{ "key": "2-5min", "from": 120000.0, "to": 300000.0 },
{ "key": "5-10min", "from": 300000.0, "to": 600000.0 },
{ "key": ">10min", "from": 600000.0 }
]
}
}
}
}
No arithmetic operations are allowed between two aggregations' result from elasticsearch DSL, not even using scripts. (Upto version 1.1.1, at least I know)
Such operations need to be handeled in client side after processing the aggs result.
Reference
elasticsearch aggregation to sort by ratio of aggregations
In 1.0.1 I couldn't find anything but in 1.4.2 you could try scripted_metric aggregation (still experimental).
Here are the scripted_metric documentation page
I am not good with the elasticsearch syntax but I think your metric inputs would be:
init_script- just initialize a accumulator for each date:
"init_script": "_agg.d1Val = 0; _agg.d2Val = 0;"
map_script- test the date of the document and add to the right accumulator:
"map_script": "if (doc.date == firstDate) { _agg.d1Val += doc.field_3; } else { _agg.d2Val = doc.field_3;};",
reduce_script - accumulate intermediate data from various shards and return the final results:
"reduce_script": "totalD1 = 0; totalD2 = 0; for (agg in _aggs) { totalD1 += agg.d1Val ; totalD2 += agg.d2Val ;}; return totalD1 - totalD2"
I don't think that in this case you need a combine_script.
If course, if you can't use 1.4.2 than this is no help :-)

Resources