How to order by doc count for a terms aggregation within a composite aggregation? - elasticsearch

I was trying the composite aggregation in Elasticsearch and found it odd that something I can normally do in a terms aggregation isn't supported for a terms source within a composite aggregation!
See the query below:
GET _search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "compo": {
      "composite": {
        "sources": [
          {
            "terms_inside": {
              "terms": {
                "field": "result_type",
                "order": {
                  "_count": "asc" // not supported here!
                }
              }
            }
          }
        ]
      }
    },
    "just_terms": {
      "terms": {
        "field": "result_type",
        "order": {
          "_count": "asc" // supported here
        }
      }
    }
  }
}
Is this just the way it is, or is there a way to get buckets sorted by doc count from a terms source nested in a composite aggregation? I want to use both paging and sorting on the terms aggregation.

It cannot be done. A composite aggregation paginates its buckets, so by design it does not fetch the counts for all buckets, only for those in the current page; ordering by doc count would require knowing the counts of every bucket up front.
https://discuss.elastic.co/t/composite-aggregation-order-by/139563/5
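Because the composite aggregation cannot order by _count, one workaround is to read every page via the after key and sort the collected buckets by doc_count on the client. Below is a minimal Python sketch of that idea; it is a client-side technique, not an Elasticsearch feature. It assumes the aggregation name compo and the source name terms_inside from the question; fetch_page is a hypothetical callback you would implement with your own client, and the page size of 100 is an arbitrary choice. Note that it holds every bucket in memory, which is exactly what the composite aggregation is designed to avoid, so it only suits fields of moderate cardinality.

```python
from typing import Any, Callable, Dict, List, Optional

Bucket = Dict[str, Any]
# Hypothetical callback: sends composite_body(after_key) to Elasticsearch and
# returns response["aggregations"]["compo"].
FetchPage = Callable[[Optional[Dict[str, Any]]], Dict[str, Any]]


def composite_body(after_key: Optional[Dict[str, Any]] = None,
                   page_size: int = 100) -> Dict[str, Any]:
    """Search body for one page of the composite aggregation (no order on _count)."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{"terms_inside": {"terms": {"field": "result_type"}}}],
    }
    if after_key is not None:
        composite["after"] = after_key  # resume after the last bucket of the previous page
    return {"size": 0, "aggs": {"compo": {"composite": composite}}}


def collect_sorted_by_count(fetch_page: FetchPage) -> List[Bucket]:
    """Read every composite page, then sort all buckets by doc_count ascending."""
    buckets: List[Bucket] = []
    after_key: Optional[Dict[str, Any]] = None
    while True:
        agg = fetch_page(after_key)
        page = agg.get("buckets", [])
        if not page:  # an empty page means all buckets have been read
            break
        buckets.extend(page)
        after_key = agg.get("after_key")
        if after_key is None:
            break
    # The sort happens client-side, once all counts are known.
    return sorted(buckets, key=lambda b: b["doc_count"])
```

Paging over the sorted list (for example slicing it) is then a purely client-side operation.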

Before Elasticsearch 7.12 you cannot aggregate on multiple terms and order by doc_count. From Elasticsearch 7.12 on, you can use a multi_terms aggregation.
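On 7.12 or later, the ordering part of the question could be expressed with multi_terms. The Python sketch below only builds the request body as a dict; the aggregation name by_count and the second field other_field are hypothetical placeholders. Unlike composite, multi_terms has no after key, so it returns the top size buckets in the requested order rather than pages.

```python
from typing import Any, Dict, List


def multi_terms_by_count(name: str, fields: List[str], size: int = 10,
                         direction: str = "asc") -> Dict[str, Any]:
    """Search body with a multi_terms aggregation (Elasticsearch 7.12+) ordered by doc count."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return {
        "size": 0,
        "aggs": {
            name: {
                "multi_terms": {
                    # One entry per field; each bucket key combines the values of all fields.
                    "terms": [{"field": field} for field in fields],
                    "size": size,
                    "order": {"_count": direction},
                }
            }
        },
    }


body = multi_terms_by_count("by_count", ["result_type", "other_field"])
```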

Related

How to paginate sorted data (with terms aggregation) using composite aggregation?

How do I write a pipeline aggregation to paginate sorted data (where the sorting is done by a terms aggregation based on its sub-aggregation)?
GET index_name/_search
{
  "query": {<some querying>},
  "aggs": {
    "pagination": {
      "composite": {
        "sources": [
          {
            "grouping": {
              "terms": {
                "field": "field_name.keyword",
                "order": "desc"
              }
            }
          }
        ]
      },
      "aggs": {
        "results": {
          "terms": {
            "field": "field_name.keyword",
            "order": {
              "sub_aggregation": "desc"
            }
          },
          "aggs": {
            "sub_aggregation": {
              "filter": {
                "term": {
                  "field_name": "value"
                }
              }
            }
          }
        }
      }
    }
  }
}
The main problem is combining the following two sub-problems:
P1. Sorting the data on a selected key. For this I used a terms aggregation whose order refers to its sub-aggregation.
P2. Paginating the sorted data. For this I used a composite aggregation with a terms source.
When the composite aggregation is a sub-aggregation of the terms aggregation, I get the following error:
[composite] aggregation cannot be used with a parent aggregation of type: [TermsAggregatorFactory]
When I nest them the other way round, I get the paginated terms data (P2) and the sorted data (P1) in separate buckets.
How can I merge these two problems?

Is it possible to paginate a terms aggregation result with a search term?

Is it possible to use pagination in a terms aggregation query with a search term?
I need to paginate the result of the following query, but I have not been able to find a solution:
{
  "sort": [
    {
      "create_date": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "bool": {
      "must": []
    }
  },
  "aggs": {
    "genres": {
      "terms": {
        "field": "mentions.keyword",
        "include": "insta.*"
      }
    }
  }
}
You could use from and size to tell the engine to return only the documents in that range each time you come back for the next page. Note that this pages the matching documents (hits), not the terms aggregation buckets, which are still computed over all matching documents. Define two variables in your service and have every caller pass their values (the offset to start from and the page size, i.e. the limit):
{
  "from": from,
  "size": limit,
  "sort": [
    {
      "create_date": {
        "order": "desc"
      }
    }
  ],
  "query": {
    "bool": {
      "must": []
    }
  },
  "aggs": {
    "genres": {
      "terms": {
        "field": "mentions.keyword",
        "include": "insta.*"
      }
    }
  }
}
If you expose this query through a service, for example mysearch, call the service like this:
mysearch?searchTerm=theWord&from=0&limit=15
In the next call, do the same with a different from value (from is a zero-based offset, so the first page covers documents 0 to 14):
mysearch?searchTerm=theWord&from=15&limit=15
If this information is not enough, post some sample documents to play with.
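The arithmetic behind from and limit is easy to get off by one, because from is a zero-based offset. A minimal Python sketch, assuming a zero-based page number and reusing the query from the question; page_body is a hypothetical helper name. Keep in mind that from + size cannot exceed the index.max_result_window setting (10,000 by default).

```python
from typing import Any, Dict


def page_body(page: int, limit: int) -> Dict[str, Any]:
    """Search body for a zero-based page of hits; the aggregation itself is not paged."""
    if page < 0 or limit <= 0:
        raise ValueError("page must be >= 0 and limit must be > 0")
    return {
        "from": page * limit,  # zero-based offset of the first hit on this page
        "size": limit,
        "sort": [{"create_date": {"order": "desc"}}],
        "query": {"bool": {"must": []}},
        "aggs": {"genres": {"terms": {"field": "mentions.keyword", "include": "insta.*"}}},
    }


print([page_body(p, 15)["from"] for p in range(3)])  # [0, 15, 30]
```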
If you are trying to page through the buckets of the terms aggregation itself, you can use either of two options:
1. In a terms aggregation you can use partitions to paginate the data. Refer to the terms aggregation documentation on partitions for details.
2. You can use a composite aggregation. In a composite aggregation you can only access the data sequentially, using the after key, so you won't be able to jump to an arbitrary page.
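For the partition option, each request asks for one partition of the terms and the client loops over all partitions. The Python sketch below only builds the request bodies; the helper names, the 20 partitions and the per-partition size of 100 are assumptions. Because include accepts either a regex, an array of exact values or a partition object, the partition form replaces the insta.* regex from the question. Terms are assigned to partitions by hash, so partitions are not sorted pages: each one holds an arbitrary subset of the terms.

```python
from typing import Any, Dict, Iterator


def partition_body(partition: int, num_partitions: int, size: int = 100) -> Dict[str, Any]:
    """Search body that returns only the terms falling into one hash partition."""
    if not 0 <= partition < num_partitions:
        raise ValueError("partition must be in [0, num_partitions)")
    return {
        "size": 0,
        "aggs": {
            "genres": {
                "terms": {
                    "field": "mentions.keyword",
                    # The partition object takes the place of the "insta.*" regex.
                    "include": {"partition": partition, "num_partitions": num_partitions},
                    "size": size,
                }
            }
        },
    }


def all_partition_bodies(num_partitions: int, size: int = 100) -> Iterator[Dict[str, Any]]:
    """One request per partition; together they cover every term if size is large enough."""
    for partition in range(num_partitions):
        yield partition_body(partition, num_partitions, size)
```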

Find documents per category

I'm a newbie to the Elasticsearch world.
I have done an aggregation and got the results. Now I need to see which documents are inside each category/bucket. How can I do that?
You can simply add a top_hits aggregation as a sub-aggregation of your terms aggregation; the inner "aggs" block is the part to add:
{
  "aggs": {
    "categories": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "top_category_hits": {
          "top_hits": {}
        }
      }
    }
  }
}
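Once the response comes back, the documents for each category sit in each bucket's top_category_hits.hits.hits array. A minimal Python sketch of reading them, assuming the aggregation names from the query above and a response already parsed into a dict; docs_per_category is a hypothetical helper. By default top_hits returns at most three hits per bucket, so set size inside top_hits if you need more, and read hit["_source"] instead of hit["_id"] if you want the document bodies.

```python
from typing import Any, Dict, List


def docs_per_category(response: Dict[str, Any]) -> Dict[Any, List[str]]:
    """Map each "categories" bucket key to the _id values of its top hits."""
    result: Dict[Any, List[str]] = {}
    for bucket in response["aggregations"]["categories"]["buckets"]:
        hits = bucket["top_category_hits"]["hits"]["hits"]
        result[bucket["key"]] = [hit["_id"] for hit in hits]
    return result
```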

Elastic count by facets that exists only for some documents

I have a facet that exists only in some of the documents. I wish to know how many documents have each possible value of the facet, and how many don't have this facet at all.
The facet is color. My current query returns the count for the different colors, but doesn't return the count for documents without a color:
"facets": {
  "_Properties": {
    "terms": {
      "field": "Color",
      "size": 100
    }
  }
}
Thanks!
Facets have been deprecated in Elasticsearch (and were removed entirely in 2.0). You can use a combination of a terms aggregation and a missing aggregation for this. The query below covers your requirement:
{
  "aggs": {
    "_Properties": {
      "terms": {
        "field": "Color",
        "size": 100
      }
    },
    "_MissingColor": {
      "missing": {
        "field": "Color"
      }
    }
  }
}
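The two aggregations come back separately, so a client usually merges them into one table of counts. A minimal Python sketch, assuming the aggregation names _Properties and _MissingColor from the query above; color_counts and the "(no color)" label are hypothetical, and the label should be something that can never be a real color value.

```python
from typing import Any, Dict


def color_counts(response: Dict[str, Any], missing_label: str = "(no color)") -> Dict[str, int]:
    """Combine the terms buckets and the missing count into one {color: count} map."""
    aggs = response["aggregations"]
    counts = {bucket["key"]: bucket["doc_count"] for bucket in aggs["_Properties"]["buckets"]}
    counts[missing_label] = aggs["_MissingColor"]["doc_count"]  # documents with no Color value
    return counts
```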

Filter elasticsearch results to contain only unique documents based on one field value

All my documents have a uid field with an ID that links the document to a user. There are multiple documents with the same uid.
I want to perform a search over all the documents returning only the highest scoring document per unique uid.
The query selecting the relevant documents is a simple multi_match query.
You need a top_hits aggregation.
And for your specific case:
{
  "query": {
    "multi_match": {
      ...
    }
  },
  "aggs": {
    "top-uids": {
      "terms": {
        "field": "uid"
      },
      "aggs": {
        "top_uids_hits": {
          "top_hits": {
            "sort": [
              {
                "_score": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}
The query above runs your multi_match query and aggregates the results by uid. For each uid bucket it returns only one result: the top hit after the documents in the bucket have been sorted by _score in descending order.
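Reading the answer out of the response means taking the first (and only) hit of each bucket. A minimal Python sketch, assuming the aggregation names from the query above; best_hit_per_uid is a hypothetical helper. Two limits of the aggregation approach are worth keeping in mind: the terms aggregation returns only 10 uid buckets unless you set its size, and those buckets are ordered by doc count, not by the best score.

```python
from typing import Any, Dict


def best_hit_per_uid(response: Dict[str, Any]) -> Dict[Any, Dict[str, Any]]:
    """Return the single highest-scoring hit of each "top-uids" bucket, keyed by uid."""
    best: Dict[Any, Dict[str, Any]] = {}
    for bucket in response["aggregations"]["top-uids"]["buckets"]:
        hits = bucket["top_uids_hits"]["hits"]["hits"]
        if hits:  # top_hits sorted by _score desc with size 1, so hits[0] is the best
            best[bucket["key"]] = hits[0]
    return best
```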
Elasticsearch 5.3 added support for field collapsing. You should be able to do something like:
GET /_search
{
  "query": {
    "multi_match": {
      "query": "this is a test",
      "fields": ["subject", "message", "uid"]
    }
  },
  "collapse": {
    "field": "uid"
  },
  "size": 20,
  "from": 100
}
The benefit of field collapsing over a top_hits aggregation is that you can paginate the collapsed results with from and size.
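Since collapsed results page with from and size, a page number maps directly onto the request. A minimal Python sketch that builds the body from the answer for a zero-based page; collapsed_page and its per_page default of 20 are assumptions chosen to match the example, where page 5 gives from = 100. The collapse field has to be a single-valued keyword or numeric field with doc_values enabled.

```python
from typing import Any, Dict


def collapsed_page(query_text: str, page: int, per_page: int = 20) -> Dict[str, Any]:
    """Search body for a zero-based page of results collapsed on uid."""
    if page < 0 or per_page <= 0:
        raise ValueError("page must be >= 0 and per_page must be > 0")
    return {
        "query": {
            "multi_match": {"query": query_text, "fields": ["subject", "message", "uid"]}
        },
        "collapse": {"field": "uid"},  # one top hit per distinct uid
        "from": page * per_page,
        "size": per_page,
    }


body = collapsed_page("this is a test", page=5)
```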
