Elasticsearch: sort data across all buckets

I am trying to write an Elasticsearch sort but I am struggling.
The gist of my data is that I have a product definition (an "abstract" product) which can consist of various products (its "concretes").
Let's say product A is abstract; it can consist of products B, C, and D (its concretes).
I might also have a product E that has F as its only concrete, and so on.
I want to aggregate the products by their abstract (to only show one of each concrete) and then sort all concretes based on some criteria.
I have written the following, which doesn't work as expected:
"aggs": {
"category:58": {
"aggs": {
"products": {
"aggs": {
"abstract": {
"top_hits": {
"size": 1,
"sort": [
{
"criteria1": {
"order": "desc"
}
},
{
"_score": {
"order": "desc"
}
},
{
"criteria3": {
"missing": "_last",
"order": "asc",
"unmapped_type": "integer"
}
}
]
}
}
},
"terms": {
"field": "abstract_id",
"size": 10
}
}
},
"filter": {
"term": {
"categories.id": {
"value": "58"
}
}
}
}
},
If I got it correctly, this will create 10 buckets and each bucket will hold one product, so my sort only sorts within a single product, when I should be sorting the entire result set. The question is where to place the sort that currently sits in aggs->abstract.
If I remove the grouping by abstract_id and change it to something unique, the sorting does work, but then all concretes of one abstract product can be displayed, which I don't want.
I saw that I can't sort on terms, so I'm kind of clueless now.

I ended up using multiple aggregations and then doing a bucket sort.
The query I ended up with looks like this:
"aggs": {
"abstract": {
"top_hits": {
"size": 1
}
},
"criteria3": {
"sum": {
"field": "custom_filed_foo_bar"
}
},
"criteria1": {
"sum": {
"field": "boosted_value"
}
},
"criteria2": {
"max": {
"script":{
"source": "_score"
}
}
},
"sorting": {
"bucket_sort": {
"sort": [
{
"criteria1": {
"order": "desc"
}
},
{
"criteria2": {
"order": "desc"
}
},
{
"criteria3": {
"order": "desc"
}
}
]
}
}
I don't know if it's the correct approach, but it seems to be working.
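For reference, this is roughly how the pieces fit together with the terms aggregation and category filter from the original query (a sketch; I kept the same field names):
"aggs": {
  "category:58": {
    "filter": {
      "term": {
        "categories.id": "58"
      }
    },
    "aggs": {
      "products": {
        "terms": {
          "field": "abstract_id",
          "size": 10
        },
        "aggs": {
          "abstract": {
            "top_hits": { "size": 1 }
          },
          "criteria1": {
            "sum": { "field": "boosted_value" }
          },
          "criteria2": {
            "max": { "script": { "source": "_score" } }
          },
          "criteria3": {
            "sum": { "field": "custom_filed_foo_bar" }
          },
          "sorting": {
            "bucket_sort": {
              "sort": [
                { "criteria1": { "order": "desc" } },
                { "criteria2": { "order": "desc" } },
                { "criteria3": { "order": "desc" } }
              ]
            }
          }
        }
      }
    }
  }
}
The bucket_sort pipeline aggregation can only sort on sibling metric aggregations, which is why each sort criterion needs its own metric aggregation next to the top_hits.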

Related

How to get the last Elasticsearch document for each unique value of a field?

I have a data structure in Elasticsearch that looks like:
{
  "name": "abc",
  "date": "2022-10-08T21:30:40.000Z",
  "rank": 3
}
I want to get, for each unique name, the rank of the document (or the whole document) with the most recent date.
I currently have this:
"aggs": {
"group-by-name": {
"terms": {
"field": "name"
},
"aggs": {
"max-date": {
"max": {
"field": "date"
}
}
}
}
}
How can I get the rank (or the whole document) for each result, and if possible, in one request?
You can use one of the options below.
Collapse
"collapse": {
"field": "name"
},
"sort": [
{
"date": {
"order": "desc"
}
}
]
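For context, a sketch of the full request body with collapse (the match_all query is just a placeholder; use whatever query you need):
{
  "query": {
    "match_all": {}
  },
  "collapse": {
    "field": "name"
  },
  "sort": [
    {
      "date": {
        "order": "desc"
      }
    }
  ]
}
With collapse, each hit is the most recent document per name, so rank comes back in _source directly.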
Top hits aggregation
{
  "aggs": {
    "group-by-name": {
      "terms": {
        "field": "name",
        "size": 100
      },
      "aggs": {
        "top_doc": {
          "top_hits": {
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}
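Both work; the main difference is that collapse operates on the search hits themselves, so it paginates like a normal query and is generally cheaper, while terms plus top_hits returns the documents inside aggregation buckets and is capped by the terms size.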

ElasticSearch - order with min in aggregation

I have objects in the index that are related by an id, which groups them.
The group's creation time spans from the object with the minimum createdAt in the group to the one with the maximum createdAt.
I'd like to order these groups by that min or max time; how can I do this?
{
  "size": 0,
  "aggs": {
    "intervals": {
      "composite": {
        "size": 10000,
        "sources": [
          {
            "totalId": {
              "terms": {
                "field": "totalId"
              }
            },
            "name": {
              "terms": {
                "field": "name"
              }
            }
          }
        ]
      },
      "aggs": {
        "createdAtStart": {
          "min": { "field": "createdAt", "format": "YYYY-MM-DD'T'HH:mm:ssZ" },
          "order": { "createdAtStart": "desc" }
        },
        "createdAtEnd": {
          "max": { "field": "createdAt", "format": "YYYY-MM-DD'T'HH:mm:ssZ" }
        }
      }
    }
  }
}
I'm using order wrong; the request fails with:
Found two aggregation type definitions
You cannot achieve that with a composite aggregation, because a terms source is not orderable by the values of a sub-aggregation, as it is with a "normal" terms aggregation. (Also, the date formats are wrong: Elasticsearch date formats use Java time patterns, so it's yyyy-MM-dd, not YYYY-MM-DD.)
So the correct query that will give you what you want is this one:
{
  "size": 0,
  "aggs": {
    "totalId": {
      "terms": {
        "field": "totalId",
        "order": {
          "createdAtStart": "asc"
        }
      },
      "aggs": {
        "createdAtStart": {
          "min": {
            "field": "createdAt",
            "format": "yyyy-MM-dd'T'HH:mm:ssZ"
          }
        },
        "createdAtEnd": {
          "max": {
            "field": "createdAt",
            "format": "yyyy-MM-dd'T'HH:mm:ssZ"
          }
        }
      }
    }
  }
}
Because of the way the composite aggregation works, it's not possible to achieve what you want. The reason is that the composite aggregation has been created in order to "paginate" over a big amount of buckets. That pagination is defined by the way the buckets are ordered. If it was possible to sort buckets according to sub-aggregations, it would mean that all buckets would need to be pre-computed and pre-sorted before returning the first page of results, which would completely defeat the purpose of this aggregation.
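If you really do need composite (e.g. to paginate over a very large number of groups), the usual workaround is to page through all buckets with after_key and sort them client-side.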
You are also adding an extra {; the sources array should be structured like this:
{
  "size": 0,
  "aggs": {
    "intervals": {
      "composite": {
        "size": 10000,
        "sources": [
          {
            "totalId": {
              "terms": {
                "field": "totalId"
              }
            }
          }
        ] <-- note this
      },
      "aggs": {
        "createdAtStart": {
          "min": {
            "field": "createdAt",
            "format": "YYYY-MM-DD'T'HH:mm:ssZ"
          },
          "order": {
            "createdAtStart": "desc"
          }
        },
        "createdAtEnd": {
          "max": {
            "field": "createdAt",
            "format": "YYYY-MM-DD'T'HH:mm:ssZ"
          }
        }
      }
    }
  }
}

ElasticSearch return aggregations random order

I've got the following Elasticsearch query to get 10 documents from each "category", grouped on "cat.id":
"aggs": {
"test": {
"terms": {
"size": 10,
"field": "cat.id"
},
"aggs": {
"top_test_hits": {
"top_hits": {
"_source": {
"includes": [
"id"
]
},
"size": 10
}
}
}
}
}
This is working fine. However, I cannot seem to find a way to randomly take 10 results from each bucket; the results are always the same, and I would like 10 random items from each bucket. I tried all kinds of things that are intended for documents, but none of them seem to work.
As was already suggested in this answer, you can try using random sort in the top_hits aggregation, using a _script like this:
{
  "aggs": {
    "test": {
      "terms": {
        "size": 10,
        "field": "cat.id"
      },
      "aggs": {
        "top_test_hits": {
          "top_hits": {
            "_source": {
              "includes": [
                "id"
              ]
            },
            "size": 10,
            "sort": {
              "_script": {
                "type": "number",
                "script": {
                  "lang": "painless",
                  "source": "(System.currentTimeMillis() + doc['_id'].value).hashCode()"
                },
                "order": "asc"
              }
            }
          }
        }
      }
    }
  }
}
Random sorting was broadly covered in this question.
Hope that helps!
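Another option worth considering is to randomize _score itself with a function_score query, since top_hits sorts by _score by default. A sketch (the seed value is arbitrary; with a fixed seed the shuffle is reproducible across requests as long as documents are not updated, since _seq_no changes on update):
{
  "query": {
    "function_score": {
      "random_score": {
        "seed": 42,
        "field": "_seq_no"
      }
    }
  },
  "aggs": {
    "test": {
      "terms": {
        "size": 10,
        "field": "cat.id"
      },
      "aggs": {
        "top_test_hits": {
          "top_hits": {
            "_source": {
              "includes": ["id"]
            },
            "size": 10
          }
        }
      }
    }
  }
}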

Aggregate over multiple fields without subaggregation

I have documents in my Elasticsearch index which have two fields. I want to build an aggregation over the combination of these, kind of like SQL's GROUP BY field_A, field_B, and get one row per existing combination. I read everywhere that I should use sub-aggregations for this.
{
  "aggs": {
    "sales_by_article": {
      "terms": {
        "field": "catalogs.article_grouping",
        "size": 1000000,
        "order": {
          "total_amount": "desc"
        }
      },
      "aggs": {
        "total_amount": {
          "sum": {
            "script": "Math.round(doc['amount.value'].value*100)/100.0"
          }
        },
        "sales_by_submodel": {
          "terms": {
            "field": "catalogs.submodel_grouping",
            "size": 1000,
            "order": {
              "total_amount": "desc"
            }
          },
          "aggs": {
            "total_amount": {
              "sum": {
                "script": "Math.round(doc['amount.value'].value*100)/100.0"
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
With the following simplified result:
{
  "aggregations": {
    "sales_by_article": {
      "buckets": [
        {
          "key": "19114",
          "total_amount": {
            "value": 426794.25
          },
          "sales_by_submodel": {
            "buckets": [
              {
                "key": "12",
                "total_amount": {
                  "value": 51512.200000000004
                }
              },
              ...
            ]
          }
        },
        ...
      ]
    }
  }
}
However, the problem with this is that the ordering is not what I want. In this particular case, it first orders the articles by total_amount per article, and then within each article it orders the submodels by total_amount per submodel. What I want instead is only the deepest level: an aggregation over the combination of article and submodel, ordered by the total_amount of that combination. This is the result I would like:
{
  "aggregations": {
    "sales_by_article_and_submodel": {
      "buckets": [
        {
          "key": "1911412",
          "total_amount": {
            "value": 51512.200000000004
          }
        },
        ...
      ]
    }
  }
}
It's discussed in the docs a bit here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_multi_field_terms_aggregation
Basically you can use a script to create a term which is derived from each document (using as many fields as you want) at query run time, but it will be slow. If you are doing it for ad hoc analysis, it'll work fine. If you need to serve these requests at some high rate, then you probably want to make a field in your model that is a combination of the two fields you're interested in, so the index is populated for you already.
Example query using the script approach:
GET agreements/agreement/_search?size=0
{
  "aggs": {
    "myAggregationName": {
      "terms": {
        "script": {
          "source": "doc['owningVendorCode'].value + '|' + doc['region'].value",
          "lang": "painless"
        }
      }
    }
  }
}
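If you go the index-time route instead, one way (a sketch; the pipeline name and the vendorRegion target field are made up for illustration) is an ingest pipeline with a set processor that concatenates the two fields, after which a plain terms aggregation on the new field works:
PUT _ingest/pipeline/combine-vendor-region
{
  "processors": [
    {
      "set": {
        "field": "vendorRegion",
        "value": "{{owningVendorCode}}|{{region}}"
      }
    }
  ]
}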
I have since learned that I should use a composite aggregation for this; see the sketch below.
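For reference, a composite aggregation with two terms sources yields one bucket per existing combination, which matches the GROUP BY semantics (a sketch using the field names from the question; note that composite buckets come back ordered by their composite key, not by total_amount):
{
  "size": 0,
  "aggs": {
    "sales_by_article_and_submodel": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "article": {
              "terms": {
                "field": "catalogs.article_grouping"
              }
            }
          },
          {
            "submodel": {
              "terms": {
                "field": "catalogs.submodel_grouping"
              }
            }
          }
        ]
      },
      "aggs": {
        "total_amount": {
          "sum": {
            "script": "Math.round(doc['amount.value'].value*100)/100.0"
          }
        }
      }
    }
  }
}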

Elasticsearch minBy

Is there a way in Elasticsearch to get a field from the document containing the maximum value? (Basically something that works like maxBy from Scala.)
For example (mocked):
{
  "aggregations": {
    "grouped": {
      "terms": {
        "field": "grouping",
        "order": {
          "docWithMin": "asc"
        }
      },
      "aggregations": {
        "withMax": {
          "max": {
            "maxByField": "a",
            "field": "b"
          }
        }
      }
    }
  }
}
For which {"grouping":1,"a":2,"b":5}, {"grouping":1,"a":1,"b":10}
would return (something like) {"grouped":1,"withMax":5}, where the max comes from the first object because "a" is higher there.
Assuming you just want the document back for which a is maximum, you can do this:
{
  "size": 0,
  "aggs": {
    "grouped": {
      "terms": {
        "field": "grouping"
      },
      "aggs": {
        "maxByA": {
          "top_hits": {
            "sort": [
              {"a": {"order": "desc"}}
            ],
            "size": 1
          }
        }
      }
    }
  }
}
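If you only need the value of b rather than the whole document, you can trim the response by restricting _source inside the top_hits (a sketch):
{
  "size": 0,
  "aggs": {
    "grouped": {
      "terms": {
        "field": "grouping"
      },
      "aggs": {
        "maxByA": {
          "top_hits": {
            "sort": [
              {"a": {"order": "desc"}}
            ],
            "_source": {
              "includes": ["b"]
            },
            "size": 1
          }
        }
      }
    }
  }
}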