Is it possible to paginate term aggregation result with search term? - elasticsearch

Is it possible to use pagination in a terms aggregation query that also has a search term?
I need to paginate the result of the following query, but I have not been able to find a solution:
{
"sort": [{
"create_date": {
"order": "desc"
}
}],
"query": {
"bool": {
"must": []
}
},
"aggs": {
"genres": {
"terms": {
"field": "mentions.keyword",
"include": "insta.*"
}
}
}
}

You could use from and size to tell the engine which range of documents to return each time you come back for the next page. Keep two variables in your service design, and have whoever calls the service pass their values (the offset of the first document and the limit). Note that from and size page through the search hits only; they do not change which buckets the terms aggregation returns.
{
"from": from,
"size": limit,
"sort": [{
"create_date": {
"order": "desc"
}
}],
"query": {
"bool": {
"must": []
}
},
"aggs": {
"genres": {
"terms": {
"field": "mentions.keyword",
"include": "insta.*"
}
}
}
}
If you expose this query through a service, for example mysearch, call the service like this:
mysearch?searchTerm=theWord&from=0&limit=15
In the next call you do the same, but advance from by the limit (from is a zero-based offset, so the second page of 15 starts at 15):
mysearch?searchTerm=theWord&from=15&limit=15
If this information is not enough, please post some sample documents to play with.
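As a rough illustration of the from/limit bookkeeping described in this answer, here is a small Python sketch. The helper name page_to_from_size and the idea of numbering pages from zero are my own assumptions, not part of the original service.

```python
from typing import Dict


def page_to_from_size(page: int, limit: int) -> Dict[str, int]:
    """Translate a zero-based page number into the from/size pair of a search body.

    Hypothetical helper: 'from' is a zero-based offset into the hits,
    so page N starts at N * limit.
    """
    if page < 0:
        raise ValueError("page must be zero or greater")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return {"from": page * limit, "size": limit}


# First and second page of 15 hits each.
first_page = page_to_from_size(0, 15)
second_page = page_to_from_size(1, 15)
```

The two values can then be merged into the search body next to sort, query and aggs.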

If you are trying to paginate the results of a terms aggregation, you can use either of two options:
1. In the terms aggregation, use partition to split the terms into groups and fetch one group per request. See the Elasticsearch documentation on filtering values with partitions.
2. Use a composite aggregation. With a composite aggregation you can only read the data sequentially, using the after key returned with each page. You won't be able to jump to an arbitrary page.
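To make the sequential nature of the second option concrete, here is a minimal Python sketch of the after-key loop. It assumes a search callable that takes a request body and returns the parsed response (for example a thin wrapper around whatever client you use); that callable, the source name mention and the function names are hypothetical. The include filter from the question is left out because, to my knowledge, composite sources do not accept one.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical signature: request body in, parsed JSON response out.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def composite_request(field: str, page_size: int,
                      after: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a search body with one composite aggregation over a terms source."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{"mention": {"terms": {"field": field}}}],
    }
    if after is not None:
        # Resume right after the last bucket of the previous page.
        composite["after"] = after
    return {"size": 0, "aggs": {"genres": {"composite": composite}}}


def iter_pages(search: SearchFn, field: str,
               page_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield one list of buckets per page until the aggregation is exhausted."""
    after: Optional[Dict[str, Any]] = None
    while True:
        response = search(composite_request(field, page_size, after))
        agg = response["aggregations"]["genres"]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield buckets
        after = agg.get("after_key")
        if after is None:
            return
```

Because each request depends on the after key of the previous response, page 5 can only be reached by walking through pages 1 to 4 first.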

Related

How to paginate sorted data (with terms aggregation) using composite aggregation?

How do I write a pipeline aggregation to paginate sorted data, where the sorting is done by a terms aggregation ordered on its sub-aggregation?
GET index_name/_search
{
"query":{<some querying>},
"aggs": {
"pagination": {
"composite": {
"sources": [
{
"grouping": {
"terms": {
"field": "field_name.keyword",
"order": "desc"
}
}
}
]
},
"aggs": {
"results": {
"terms": {
"field": "field_name.keyword",
"order": {
"sub_aggregation": "desc"
}
},
"aggs": {
"sub_aggregation": {
"filter": {
"term": {
"field_name": "value"
}
}
}
}
}
}
}
}
}
The main problem is combining the following two sub-problems:
P1. Sorting the data on a selected key, for which I used a terms aggregation whose order refers to its sub-aggregation.
P2. Paginating the sorted data, for which I used a composite aggregation with a terms source.
When the composite aggregation is a sub-aggregation of the terms aggregation, I get the following error:
[composite] aggregation cannot be used with a parent aggregation of type: [TermsAggregatorFactory]
When I try it the other way round, I get the paginated terms data (P2) and the sorted data (P1) in separate buckets.
How can I combine the two?

How do I filter after an aggregation?

I am trying to filter after a top hits aggregation to check whether the first occurrence of an error falls in a given range, but I can't find a way.
I have seen something about the bucket selector aggregation, but I can't get it to work.
POST log-*/_search/
{
"size": 100,
"aggs": {
"group":{
"terms": {
"field": "errorID.keyword",
"size": 100
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1,
"sort": [
{
"@timestamp": {
"order": "asc"
}
}
]
}
}
}
}
}
}
With this top hits aggregation I get the first occurrence of each errorID, as I have many documents with the same errorID, but what I want to find out is whether that first occurrence falls within a given range of dates.
I think a valid solution would be to filter the results of the aggregation to check whether they are in the range, but I don't know how to do that.
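One way the bucket selector idea might be made to work is to replace top_hits with a min aggregation on the timestamp, since a bucket selector can only read numeric values through buckets_path. The Python sketch below builds such a request body; it is an untested assumption about how the pieces fit together, and the function name, the aggregation names first_seen and in_range, and the choice to inline the range bounds into the script are all mine.

```python
from typing import Any, Dict


def first_seen_in_range(timestamp_field: str, start_ms: int, end_ms: int,
                        size: int = 100) -> Dict[str, Any]:
    """Keep only errorID buckets whose earliest timestamp lies in [start_ms, end_ms].

    Bounds are epoch milliseconds, which is what a min aggregation
    returns for a date field.
    """
    # The bucket value is exposed to the script as params.first_seen.
    script = (
        f"params.first_seen >= {int(start_ms)}L && "
        f"params.first_seen <= {int(end_ms)}L"
    )
    return {
        "size": 0,
        "aggs": {
            "group": {
                "terms": {"field": "errorID.keyword", "size": size},
                "aggs": {
                    # Earliest occurrence of this errorID.
                    "first_seen": {"min": {"field": timestamp_field}},
                    # Drop buckets whose first occurrence is outside the range.
                    "in_range": {
                        "bucket_selector": {
                            "buckets_path": {"first_seen": "first_seen"},
                            "script": script,
                        }
                    },
                },
            }
        },
    }
```

Pass the same timestamp field that the top_hits sort uses. If the full first document is still needed, the top_hits sub-aggregation from the question could presumably be kept next to first_seen.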

How to sort elasticsearch results based on number of collapsed items?

I'm using a query with collapse to gather documents under a certain person, but I would like to sort the results by the number of documents in which the search found a match. This is my query:
GET documents/_search
{
"_source": {
"includes": [
"text"
]
},
"query": {
"query_string": {
"fields": [
"text"
],
"query": "some text"
}
},
"collapse": {
"field": "person_id",
"inner_hits": {
"name": "top_mathing_docs",
"_source": {
"includes": [
"doc_year",
"text"
]
}
}
}
}
Any suggestions?
Thanks
If I understand correctly, you want to sort the parent documents by the count of inner_hits, i.e. the number of matching documents per person_id.
That means the _score of the parent documents in the result doesn't matter.
The only way I've found to do this is the Top Hits Aggregation field collapse example; below is what your query would look like.
Aggregation query, field collapse example:
POST <your_index_name>/_search
{
"size":0,
"query": {
"query_string": {
"fields": [
"text"
],
"query": "some text"
}
},
"aggs": {
"top_person_ids": {
"terms": {
"field": "person_id"
},
"aggs": {
"top_tags_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
Note that I'm assuming person_id is of type keyword or a numeric type.
Also, if you look at the query closely, I've set "size": 0, which means I'm only returning the result of the aggregation. By default the terms aggregation orders its buckets by doc_count, descending, so the persons with the most matching documents come first.
Another note: the above aggregation has nothing to do with the Field Collapse in Search Request feature that you posted in the question. It's just that, using this aggregation, your result can be formatted in a similar way.
Let me know if this helps!
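To show how the aggregation response could be turned into a collapse-like, count-sorted list on the client, here is a small Python sketch. It assumes the response has the usual shape for the top_person_ids and top_tags_hits aggregations named in the query above; the function name is made up for illustration.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, int, List[Dict[str, Any]]]


def persons_by_match_count(response: Dict[str, Any]) -> List[Row]:
    """Return (person_id, matching_doc_count, top_docs) rows, highest count first."""
    buckets = response["aggregations"]["top_person_ids"]["buckets"]
    rows: List[Row] = []
    for bucket in buckets:
        hits = bucket["top_tags_hits"]["hits"]["hits"]
        docs = [hit["_source"] for hit in hits]
        rows.append((bucket["key"], bucket["doc_count"], docs))
    # Buckets normally arrive in this order already; sorting again
    # keeps the function correct if a different order was requested.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

Each row then plays the role of one collapsed hit, with doc_count being the number of documents of that person that matched the query.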

how to order on doc count for terms aggregation within a composite aggregation?

I was trying the composite aggregation in Elasticsearch and found it odd that something I can normally do in a terms aggregation isn't supported for a terms source within a composite aggregation.
See the query below :
GET _search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"compo": {
"composite": {
"sources": [
{
"terms_inside": {
"terms": {
"field": "result_type",
"order": {
"_count": "asc" // not supported here!
}
}
}
}
]
}
},
"just_terms" :{
"terms": {
"field": "result_type",
"order": {
"_count": "asc" // supported here
}
}
}
}
}
Is this just the way it is, or is there a way to get buckets sorted on doc count with a nested terms aggregation? I want to use paging and sorting on the terms aggregation.
It cannot be done: a composite aggregation paginates its results, so by design it does not compute the counts for all buckets, only for those in the page being returned.
https://discuss.elastic.co/t/composite-aggregation-order-by/139563/5
Before Elasticsearch 7.12 you cannot aggregate on multiple terms and order on doc_count. From Elasticsearch 7.12 on, you can use a multi terms aggregation.
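For the multi terms route, a request body might look like the one built by this Python sketch. The aggregation name by_terms and the second field used in the test are placeholders of mine; only result_type comes from the question. Note that this gives sorting on doc count but not the after-key paging of a composite aggregation.

```python
from typing import Any, Dict, List


def multi_terms_by_count(fields: List[str], order: str = "asc",
                         size: int = 10) -> Dict[str, Any]:
    """Build a multi_terms aggregation (Elasticsearch 7.12+) ordered by doc count."""
    if len(fields) < 2:
        # multi_terms is meant for combinations of two or more fields;
        # for a single field a plain terms aggregation does the same job.
        raise ValueError("multi_terms needs at least two fields")
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "size": 0,
        "aggs": {
            "by_terms": {
                "multi_terms": {
                    "terms": [{"field": field} for field in fields],
                    "order": {"_count": order},
                    "size": size,
                }
            }
        },
    }
```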

Elastic Search - Find Data Common To Multiple Queries

In Elasticsearch I have an index that contains users and the URLs that they've visited. I want to be able to search for multiple users and find the common URLs that they've visited.
I can grab the URLs for a single user:
{
"size": 0,
"query": {
"filtered": {
"query": {
"query_string": {
"query": "user:bob"
}
},
"filter": {
"bool": {
"must": [{
"range": {
"@timestamp": {
"gte": 1430456930549,
"lte": 1430666630549
}
}
}],
"must_not": []
}
}
}
},
"aggs": {
"1": {
"terms": {
"field": "url",
"size": 0,
"order": {
"_count": "desc"
}
}
}
}
}
But how do I combine the results from each user (doing some sort of intersection)? I can do this programmatically, but can Elasticsearch do this with some sort of aggregation?
You can use sub-aggregations: a terms aggregation on URLs inside a terms aggregation on users:
{
"query": {
"match_all": {}
},
"aggs": {
"users": {
"terms": {
"field": "user"
},
"aggs": {
"urls": {
"terms": {
"field": "url"
}
}
}
}
}
}
This will give you buckets of users, each containing buckets of urls.
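If the nested buckets are then intersected on the client, the step could look like this Python sketch. It assumes the response shape produced by the users and urls aggregations above; the function name is hypothetical. Keep in mind that a terms aggregation returns only the top terms by default, so the inner size would need to be large enough for the intersection to be exact.

```python
from typing import Any, Dict, Set


def common_urls(response: Dict[str, Any]) -> Set[str]:
    """Intersect the URL buckets of every user bucket in the response."""
    user_buckets = response["aggregations"]["users"]["buckets"]
    url_sets = [
        {url_bucket["key"] for url_bucket in user_bucket["urls"]["buckets"]}
        for user_bucket in user_buckets
    ]
    if not url_sets:
        return set()
    # A URL is kept only if it appears under every user.
    return set.intersection(*url_sets)
```

To restrict the result to specific users, the match_all query can be replaced by a filter on the user field, so that only those users produce buckets.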
Update: I misunderstood your question at first. I'm not aware of an aggregation that does what you're looking for. However, you may take advantage of the significant terms aggregation:
{
"query": {
"filtered": {
"filter": {
"terms": {
"user": ["alice", "jack"]
}
}
}
},
"aggs": {
"urls": {
"significant_terms": {
"field": "url",
"size": 5
}
}
}
}
This will give you buckets with the most popular URLs within the given set of users. Note that in any case it is not a strict intersection, but rather a list whose top elements are URLs that are more frequent in the so-called foreground group (the query scope) than in the background group (all documents of the index).
URLs that are common to the selected users are very likely to score high in this aggregation.
But if each of the two requested users visits her own favourite site a lot more than other sites, and doesn't visit the other user's favourite at all, both URLs will still appear and will score higher than the ones in common.
In general I recommend exploring this aggregation; it can give some interesting insights into the data. For instance, a more fitting use of this aggregation on your dataset would be finding sites that are common among visitors of some other site.
You can read more about it in the Elasticsearch documentation on the significant terms aggregation.
