Sort by sub-aggregation count in Elasticsearch - elasticsearch

Hi, I have the following query:
"aggregations": {
"catalogid-and-terms": {
"terms": {
"field": "catalog_id",
"size":3,
"order": {
"keywords>_count": "desc" <-- HERE IS THE PROBLEM
}
},
"aggregations": {
"keywords": {
"terms": {
"field": "keywords",
"size":1,
"order": {
"_count": "desc"
},
"include": {
"pattern": "mi.*"
}
}
}
}
}
So I'm trying to get the top keyword in each catalog, but I want to sort the catalogs by the count of that keyword. How do I do that?

Related

How to get the last Elasticsearch document for each unique value of a field?

I have a data structure in Elasticsearch that looks like:
{
"name": "abc",
"date": "2022-10-08T21:30:40.000Z",
"rank": 3
}
I want to get, for each unique name, the rank of the document (or the whole document) with the most recent date.
I currently have this:
"aggs": {
"group-by-name": {
"terms": {
"field": "name"
},
"aggs": {
"max-date": {
"max": {
"field": "date"
}
}
}
}
}
How can I get the rank (or the whole document) for each result, if possible in one request?
You can use either of the options below.
Collapse
"collapse": {
"field": "name"
},
"sort": [
{
"date": {
"order": "desc"
}
}
]
Top hits aggregation
{
"aggs": {
"group-by-name": {
"terms": {
"field": "name",
"size": 100
},
"aggs": {
"top_doc": {
"top_hits": {
"sort": [
{
"date": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
}
}
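To make the expected result concrete, here is a small Python sketch that reproduces client-side what both options are meant to return: one document per name, the one with the most recent date. It does not call Elasticsearch; the sample documents and the function name `latest_per_name` are made up for illustration.

```python
from typing import Dict, List


def latest_per_name(docs: List[dict]) -> Dict[str, dict]:
    """Return, for each name, the document with the most recent date."""
    latest: Dict[str, dict] = {}
    for doc in docs:
        current = latest.get(doc["name"])
        # ISO-8601 UTC timestamps in the same format compare correctly as strings.
        if current is None or doc["date"] > current["date"]:
            latest[doc["name"]] = doc
    return latest


# Hypothetical sample data in the shape shown in the question.
docs = [
    {"name": "abc", "date": "2022-10-08T21:30:40.000Z", "rank": 3},
    {"name": "abc", "date": "2022-10-09T08:00:00.000Z", "rank": 5},
    {"name": "xyz", "date": "2022-10-01T00:00:00.000Z", "rank": 1},
]
result = latest_per_name(docs)
print({name: doc["rank"] for name, doc in result.items()})
```

With collapse the selected documents come back as regular hits, while with top_hits they sit inside each bucket of the terms aggregation.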

Elasticsearch sort data upon all buckets

I am trying to sort results in Elasticsearch, but I am struggling.
The background is that I have product definitions that can consist of various products (we call them abstract and concrete).
Let's say product A is abstract; it can consist of products B, C, and D (called concretes).
Product E, for example, can have F as a concrete, and so on.
I want to aggregate the products by their abstract (to show only one concrete for each abstract) and then sort all concretes based on some criteria.
I have written the following, which doesn't work as expected:
"aggs": {
"category:58": {
"aggs": {
"products": {
"aggs": {
"abstract": {
"top_hits": {
"size": 1,
"sort": [
{
"criteria1": {
"order": "desc"
}
},
{
"_score": {
"order": "desc"
}
},
{
"criteria3": {
"missing": "_last",
"order": "asc",
"unmapped_type": "integer"
}
}
]
}
}
},
"terms": {
"field": "abstract_id",
"size": 10
}
}
},
"filter": {
"term": {
"categories.id": {
"value": "58"
}
}
}
}
},
If I understand correctly, this creates 10 buckets with one product each, so my sort only orders the single product inside each bucket, whereas I need to sort the entire result. The question is where to place the sort that is currently in aggs->abstract.
If I remove the grouping by abstract_id and change it to something that is unique, then the sorting does work, but then all concretes of one abstract product can be displayed, which I don't want.
I saw that I can't sort on terms, so I'm kinda clueless now.
I ended up using multiple aggregations and then doing a bucket sort.
The query I ended up with looks like this:
"aggs": {
"abstract": {
"top_hits": {
"size": 1
}
},
"criteria3": {
"sum": {
"field": "custom_filed_foo_bar"
}
},
"criteria1": {
"sum": {
"field": "boosted_value"
}
},
"criteria2": {
"max": {
"script":{
"source": "_score"
}
}
},
"sorting": {
"bucket_sort": {
"sort": [
{
"criteria1": {
"order": "desc"
}
},
{
"criteria2": {
"order": "desc"
}
},
{
"criteria3": {
"order": "desc"
}
}
]
}
}
I don't know if it's the correct approach, but it seems to be working.
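As a rough illustration of what the bucket sort does with the three criteria, the following Python sketch orders a list of buckets by several metric values in priority order. It is a simplification under stated assumptions: the buckets are hand-written, the metric values are flattened (a real response nests them as `{"value": ...}`), and `bucket_sort` here is a local helper, not an Elasticsearch client call.

```python
def bucket_sort(buckets, sort_spec):
    """Order buckets by several metric values, like a multi-key bucket_sort.

    sort_spec is a list of (metric_name, order) pairs, most significant first,
    where order is "asc" or "desc".
    """
    ordered = list(buckets)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a correct multi-key ordering.
    for metric, order in reversed(sort_spec):
        ordered.sort(key=lambda b: b[metric], reverse=(order == "desc"))
    return ordered


# Hypothetical buckets, one per abstract product.
buckets = [
    {"key": "A", "criteria1": 10, "criteria2": 1.2, "criteria3": 4},
    {"key": "E", "criteria1": 30, "criteria2": 0.8, "criteria3": 7},
    {"key": "G", "criteria1": 10, "criteria2": 2.5, "criteria3": 1},
]
spec = [("criteria1", "desc"), ("criteria2", "desc"), ("criteria3", "desc")]
print([b["key"] for b in bucket_sort(buckets, spec)])  # ['E', 'G', 'A']
```

E comes first because it has the highest criteria1; A and G tie on criteria1, so criteria2 decides between them.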

Elasticsearch sort terms agg by arbitrary order

I have a terms aggregation and want some specific values to always be at the top.
Like:
POST _search
{ "size": 0,
"aggs": {
"pets": {
"terms": {
"field": "species",
"order": "Dogs, Cats"
}
}
}
}
Where the results would be like "Dog", "Cat", "Iguana".
Dog and Cat at the top and everything else below.
Is this possible without scripting?
Thanks!
One way to do it is by filtering values in the terms aggregation. You'd create two terms aggregations, one with the desired terms and another with all other terms.
{
"size": 0,
"aggs": {
"top_terms": {
"terms": {
"field": "species",
"include": ["Dogs", "Cats"],
"order": { "_key" : "desc" }
}
},
"other_terms": {
"terms": {
"field": "species",
"exclude": ["Dogs", "Cats"]
}
}
}
}
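With two separate aggregations, the final list still has to be stitched together on the client. A minimal Python sketch of that step, assuming the usual `aggregations.<name>.buckets` response layout; the sample response is hand-written and `merge_pinned` is a hypothetical helper name:

```python
def merge_pinned(response, pinned_agg="top_terms", rest_agg="other_terms"):
    """Concatenate buckets so the pinned terms come before all other terms."""
    aggs = response["aggregations"]
    return aggs[pinned_agg]["buckets"] + aggs[rest_agg]["buckets"]


# Hand-written sample in the shape of a terms aggregation response.
response = {
    "aggregations": {
        "top_terms": {"buckets": [{"key": "Dogs", "doc_count": 2},
                                  {"key": "Cats", "doc_count": 5}]},
        "other_terms": {"buckets": [{"key": "Iguanas", "doc_count": 9}]},
    }
}
print([b["key"] for b in merge_pinned(response)])  # ['Dogs', 'Cats', 'Iguanas']
```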
A script wouldn't be too complicated though -- first boost the two species, then sort by the scores first and then by _count:
GET pets/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"terms": {
"species": [
"dog",
"cat"
],
"boost": 10
}
},
{
"match_all": {}
}
]
}
},
"aggs": {
"pets": {
"terms": {
"field": "species.keyword",
"order": [
{
"max_score": "desc"
},
{
"_count": "desc"
}
]
},
"aggs": {
"max_score": {
"max": {
"script": "_score"
}
}
}
}
}
}

How to take more fields when grouping

I'm trying to group data and also return all of the fields for each group.
GET /testnews/default/_search
{
"size": 10,
"from":50,
"query":{
"multi_match": {
"query": "serenay",
"fields": ["Data.Title", "Data.Description", "Data.Tags.Title", "Data.MentionTitle", "Data.Program.title", "Data.Program.description", "Data.Program.original_title"]
}
},
"sort":[{
"Data.CreatedAt": {
"order": "desc"
},
"Data.ViewCount": {
"order": "desc"
}
}],
"aggs": {
"group_by_state": {
"terms": {
"field": "Data.Program.title.keyword"
}
}
}
}
But when I run it, the grouped result contains only the program title.
Just like:
{
"key": "Kocamın Ailesi",
"doc_count": 3
}
But I just want it like:
{
"key": "Kocamın Ailesi",
"description": "blabla",
"image": "blabla.jpg",
"date": "YYYY-mm-dd",
"doc_count": 3
}
Just like in SQL:
select * from x group by field
Regarding the SQL example, to get the behaviour of
select a, b, count(*) from x group by a, b
you can aggregate on a, then b like this:
"aggs": {
"group_by_a": {
"terms": {
"field": "a"
},
"aggs": {
"group_by_b": {
"terms": {
"field":"b"
}
}
}
}
}
But I don't think that is what you're looking for?
If you want the full documents in aggregations you can use the "top_hits" aggregation to select the top n hits within each aggregation:
{
"aggs": {
"group_by_state": {
"terms": {
"field": "Data.Program.title.keyword"
},
"aggs": {
"state_top_hits": {
"top_hits": {
"sort": [
{ "Data.CreatedAt": { "order": "desc" } },
{ "Data.ViewCount": { "order": "desc" } }
],
"_source": {
"includes": [ "key", "description", "image", "date" ]
},
"size": 10 //Will show top 10 hits within keyword agg ordered according to the sort
}
}
}
}
}
}
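To get from that response to the flat shape asked for in the question (key, document fields, doc_count), the top hit of each bucket can be merged with the bucket itself on the client. The following Python sketch assumes the standard response layout for a terms aggregation with a top_hits sub-aggregation; the sample response and the helper name `flatten_groups` are illustrative only.

```python
def flatten_groups(response, agg="group_by_state", hits_agg="state_top_hits"):
    """Build one flat row per bucket from the bucket and its first top hit."""
    rows = []
    for bucket in response["aggregations"][agg]["buckets"]:
        hits = bucket[hits_agg]["hits"]["hits"]
        # Start from the fields of the first (best-sorted) hit, if any.
        row = dict(hits[0]["_source"]) if hits else {}
        # Bucket-level values are set last so they win over same-named fields.
        row["key"] = bucket["key"]
        row["doc_count"] = bucket["doc_count"]
        rows.append(row)
    return rows


# Hand-written sample response with a single bucket.
response = {
    "aggregations": {
        "group_by_state": {
            "buckets": [
                {
                    "key": "Kocamın Ailesi",
                    "doc_count": 3,
                    "state_top_hits": {
                        "hits": {"hits": [
                            {"_source": {"description": "blabla",
                                         "image": "blabla.jpg"}}
                        ]}
                    },
                }
            ]
        }
    }
}
print(flatten_groups(response))
```

If only one document per group is needed, setting the top_hits "size" to 1 keeps the response small.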

how to bucket empty and non empty fields in nested aggregation in elasticsearch?

I have the following set of nested subaggregations in elasticsearch (field2 is a subaggregation of field1 and field3 is a subaggregation of field2).
It turns out, however, that the terms aggregation for field3 will not bucket documents that don't have field3.
My understanding is that I have to use a missing sub-aggregation to bucket those, in addition to the terms aggregation for field3.
But I am not sure how to add it to the query below so that both are bucketed.
{
"size": 0,
"aggregations": {
"f1": {
"terms": {
"field": "field1",
"size": 0,
"order": {
"_count": "asc"
},
"include": [
"123"
]
},
"aggregations": {
"field2": {
"terms": {
"field": "f2",
"size": 0,
"order": {
"_count": "asc"
},
"include": [
"tr"
]
},
"aggregations": {
"field3": {
"terms": {
"field": "f3",
"order": {
"_count": "asc"
},
"size": 0
},
"aggregations": {
"aggTopHits": {
"top_hits": {
"size": 1
}
}
}
}
}
}
}
}
}
}
In version 2.1.2 and later, you can use the missing parameter of the terms aggregation, which allows you to specify a default value for documents that are missing that field. (FYI, the missing parameter has been available since 2.0, but there was a bug which prevented it from working on sub-aggregations, which is how you would use it here.)
...
"aggregations": {
"field3": {
"terms": {
"field": "f3",
"order": {
"_count": "asc"
},
"size": 0,
"missing": "n/a" <----- provide a default here
},
"aggregations": {
"aggTopHits": {
"top_hits": {
"size": 1
}
}
}
}
}
However, if you are working with a pre-2.x ES cluster, you can use the missing aggregation at the same depth as your field3 aggregation to bucket the documents that are missing "f3" like this:
...
"aggregations": {
"field3": {
"terms": {
"field": "f3",
"order": {
"_count": "asc"
},
"size": 0
},
"aggregations": {
"aggTopHits": {
"top_hits": {
"size": 1
}
}
}
},
"missing_field3": {
"missing" : {
"field": "f3"
},
"aggregations": {
"aggTopMissingHit": {
"top_hits": {
"size": 1
}
}
}
}
}
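The difference between a plain terms aggregation and one with a "missing" default can be sketched in a few lines of Python. This only simulates the bucketing on a hand-written list of documents; `terms_counts` is a hypothetical helper, not part of any Elasticsearch client.

```python
from collections import Counter


def terms_counts(docs, field, missing=None):
    """Count documents per value of `field`, like a terms aggregation.

    Without `missing`, documents lacking the field fall into no bucket;
    with `missing`, they are counted under that default value.
    """
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            if missing is None:
                continue  # plain terms aggregation: document is skipped
            value = missing
        counts[value] += 1
    return dict(counts)


# Hypothetical documents; the last one has no f3 field.
docs = [{"f3": "x"}, {"f3": "x"}, {"f3": "y"}, {}]
print(terms_counts(docs, "f3"))                  # {'x': 2, 'y': 1}
print(terms_counts(docs, "f3", missing="n/a"))   # {'x': 2, 'y': 1, 'n/a': 1}
```

The separate missing aggregation from the pre-2.x variant corresponds to counting only the skipped documents in their own bucket.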
