How to scroll through aggregations - elasticsearch

I have the below query:
{
  "aggs": {
    "user-ids": {
      "terms": {
        "field": "userId",
        "size": 10000
      },
      "aggs": {
        "excluded_tags_agg": {
          "filter": {
            "bool": {
              "must": [
                {
                  "match_phrase": {
                    "tag": "Yes"
                  }
                },
                {
                  "match_phrase": {
                    "tag": "No"
                  }
                }
              ],
              "minimum_should_match": 1
            }
          }
        },
        "filter_userids_which_do_not_have_any_docs_with_excluded_tags": {
          "bucket_selector": {
            "buckets_path": {
              "doc_count": "excluded_tags_agg > _count"
            },
            "script": "params.doc_count == 0"
          }
        }
      }
    }
  },
  "size": 0
}
But I may have more than 10,000 results, so I need to page through the buckets. I have used the composite aggregation before, but I'm not sure how to combine it with the query above.
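One possible way to combine the two, sketched in Python: replace the terms aggregation with a composite aggregation that has a single terms source on userId, keep the filter and bucket_selector as sub-aggregations, and feed each response's after_key back in as "after". The `search` callable is hypothetical (anything that sends a request body to the index and returns the parsed response), and the `should` clause is my assumption about the intent of the tag filter ("Yes" or "No"). Because bucket_selector drops buckets after each page is built, a page may come back with fewer buckets than page_size, so the loop keys off after_key rather than the bucket count.

```python
from typing import Callable, Iterator, Optional


def build_page_request(after_key: Optional[dict] = None, page_size: int = 1000) -> dict:
    """Build the request body for one page of userId buckets."""
    composite = {
        "size": page_size,
        "sources": [{"userId": {"terms": {"field": "userId"}}}],
    }
    if after_key is not None:
        # Resume right after the last key of the previous page.
        composite["after"] = after_key
    return {
        "size": 0,
        "aggs": {
            "user-ids": {
                "composite": composite,
                "aggs": {
                    "excluded_tags_agg": {
                        "filter": {
                            "bool": {
                                # Assumption: the intent is "tag is Yes OR tag is No".
                                "should": [
                                    {"match_phrase": {"tag": "Yes"}},
                                    {"match_phrase": {"tag": "No"}},
                                ],
                                "minimum_should_match": 1,
                            }
                        }
                    },
                    "filter_userids_which_do_not_have_any_docs_with_excluded_tags": {
                        "bucket_selector": {
                            "buckets_path": {"doc_count": "excluded_tags_agg>_count"},
                            "script": "params.doc_count == 0",
                        }
                    },
                },
            }
        },
    }


def iter_user_ids(search: Callable[[dict], dict], page_size: int = 1000) -> Iterator:
    """Yield every userId that survives the bucket_selector, page by page.

    `search` is a hypothetical callable: request body in, parsed response out.
    """
    after_key = None
    while True:
        response = search(build_page_request(after_key, page_size))
        agg = response["aggregations"]["user-ids"]
        for bucket in agg["buckets"]:
            yield bucket["key"]["userId"]
        after_key = agg.get("after_key")
        if after_key is None:
            # No after_key in the response: there is nothing left to page through.
            break
```

With a real client, `search` could be a small wrapper that posts the body to the index's _search endpoint; the paging logic itself does not depend on which client is used.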

Related

Elasticsearch aggregations query for multiselection

I have the following problem: in my application, I have multiple multi-select comboboxes for filtering search results.
The comboboxes show facets of the search results.
Depending on the selections in one filter, the facet results decrease in the other filters. So far, so good. However, the results also decrease for the other possible selections in the same combobox.
Here, I would need the facets WITHOUT the already selected values, for this particular field only.
The query I use so far looks like this:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"exists": {
"field": "depictionID"
}
},
{
"terms": {
"cave.caveTypeID": [
4
]
}
},
{
"terms": {
"cave.siteID": [
1
]
}
},
{
"terms": {
"cave.districtID": [
1
]
}
},
{
"terms": {
"cave.regionID": [
1
]
}
}
]
}
},
"aggs": {
"CaveType": {
"terms": {
"field": "cave.caveTypeID"
}
},
"Region": {
"terms": {
"field": "cave.regionID"
}
},
"Site": {
"terms": {
"field": "cave.siteID"
}
},
"District": {
"terms": {
"field": "cave.districtID"
}
}
}
}
So far I have figured out that I need to move the selected fields out of the query and filter for them in the aggregation section. However, I do not understand how that could work when two or more comboboxes already have selections.
Does anybody have a good idea how to solve this problem?
Sincerely,
Erik
You need to use post_filter instead, like this:
{
"size": 0,
"post_filter": {
"bool": {
"must": [
{
"exists": {
"field": "depictionID"
}
},
{
"terms": {
"cave.caveTypeID": [
4
]
}
},
{
"terms": {
"cave.siteID": [
1
]
}
},
{
"terms": {
"cave.districtID": [
1
]
}
},
{
"terms": {
"cave.regionID": [
1
]
}
}
]
}
},
"aggs": {
"CaveType": {
"terms": {
"field": "cave.caveTypeID"
}
},
"Region": {
"terms": {
"field": "cave.regionID"
}
},
"Site": {
"terms": {
"field": "cave.siteID"
}
},
"District": {
"terms": {
"field": "cave.districtID"
}
}
}
}
Well, I solved the problem by moving the filters into the aggregation part. However, I had to build a separate aggregation for every single combobox, because each combobox needs an aggregation WITHOUT its own filter, so the aggregations grew dramatically:
{
"aggs": {
"caveType": {
"filter": {
"terms": {
"cave.districtID": [
4
]
}
},
"aggs": {
"site": {
"filter": {
"terms": {
"cave.siteID": [
1
]
}
},
"aggs": {
"caveType": {
"terms": {
"size": 10000,
"field": "cave.caveTypeID"
}
}
}
}
}
},
"site": {
"filter": {
"terms": {
"cave.districtID": [
4
]
}
},
"aggs": {
"caveType": {
"filter": {
"terms": {
"cave.caveTypeID": [
4
]
}
},
"aggs": {
"site": {
"terms": {
"size": 10000,
"field": "cave.siteID"
}
}
}
}
}
},
"district": {
"filter": {
"terms": {
"cave.siteID": [
1
]
}
},
"aggs": {
"caveType": {
"filter": {
"terms": {
"cave.caveTypeID": [
4
]
}
},
"aggs": {
"district": {
"terms": {
"size": 10000,
"field": "cave.districtID"
}
}
}
}
}
},
"region": {
"filter": {
"terms": {
"cave.districtID": [
4
]
}
},
"aggs": {
"site": {
"filter": {
"terms": {
"cave.siteID": [
1
]
}
},
"aggs": {
"caveType": {
"filter": {
"terms": {
"cave.caveTypeID": [
4
]
}
},
"aggs": {
"region": {
"terms": {
"size": 10000,
"field": "cave.regionID"
}
}
}
}
}
}
}
}
},
"size": 0
}
If anyone has a more "elegant" way to do that, be my guest.
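A somewhat flatter variant, sketched in Python under a few assumptions: instead of nesting one filter aggregation per selected combobox, each facet gets a single filter aggregation whose bool filter lists the selections of all the other comboboxes. The function name, the `facets` and `selections` shapes, and the sub-aggregation name "values" are all hypothetical. The post_filter only matters once size is greater than 0, since it narrows the hits without touching the aggregations.

```python
from typing import Dict, List


def all_of(clauses: List[dict]) -> dict:
    """Combine filter clauses; fall back to match_all when there are none."""
    return {"bool": {"filter": clauses}} if clauses else {"match_all": {}}


def build_facet_request(
    facets: Dict[str, str],
    selections: Dict[str, List[int]],
    base_filters: List[dict],
) -> dict:
    """Build a faceted request (hypothetical helper).

    facets maps a combobox name to its field; selections maps a combobox
    name to the values currently selected in it.
    """

    def selected_terms(skip: str = "") -> List[dict]:
        # One terms clause per combobox that has a selection, except `skip`.
        return [
            {"terms": {facets[name]: values}}
            for name, values in selections.items()
            if values and name != skip
        ]

    aggs = {}
    for name, field in facets.items():
        aggs[name] = {
            # Each facet is narrowed by the OTHER comboboxes only.
            "filter": all_of(selected_terms(skip=name)),
            "aggs": {"values": {"terms": {"field": field, "size": 10000}}},
        }

    return {
        "size": 0,
        # Filters that do not belong to any combobox stay in the query.
        "query": all_of(base_filters),
        # Narrows the hits by every selection; aggregations are unaffected.
        "post_filter": all_of(selected_terms()),
        "aggs": aggs,
    }
```

For the selections in the question, `build_facet_request({"CaveType": "cave.caveTypeID", "Site": "cave.siteID", "District": "cave.districtID", "Region": "cave.regionID"}, {"CaveType": [4], "Site": [1], "District": [1], "Region": [1]}, [{"exists": {"field": "depictionID"}}])` produces four sibling aggregations, each one level deep, so the request grows linearly with the number of comboboxes.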

Filter is not being applied to aggregation

I'm trying to get the billing for a product sold by a specific user, but it seems that the query is not being applied to the sum aggregation.
Could someone help me, please?
{
"query": {
"bool": {
"filter": [
{ "term": { "seller": 1 } },
{"term": { "product": 2 } }
]
}
},
"size": 0,
"aggs": {
"product": {
"terms": {
"field": "product"
},
"aggregations": {
"billing": {
"sum": {
"field": "price"
}
},
"aggregation": {
"bucket_sort": {
"sort": [
{
"billing": {
"order": "desc"
}
}
]
}
}
}
}
}
}
Try nesting your existing aggregations within another terms aggregation on "seller".
{
"query": {
"bool": {
"filter": [
{
"term": {
"seller": 1
}
},
{
"term": {
"product": 2
}
}
]
}
},
"size": 0,
"aggs": {
"seller": {
"terms": {
"field": "seller",
"size": 1
},
"aggs": {
"product": {
"terms": {
"field": "product",
"size": 1
},
"aggregations": {
"billing": {
"sum": {
"field": "price"
}
},
"aggregation": {
"bucket_sort": {
"sort": [
{
"billing": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
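Since aggregations are computed over the documents matched by the query, a request that already filters on one seller and one product can, in principle, get by with a single top-level sum. The Python sketch below builds such a request; the function names are hypothetical, and it assumes seller and product are indexed so that the term filters match exactly.

```python
def build_billing_request(seller: int, product: int) -> dict:
    """Request body for the total price of one product sold by one seller."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"seller": seller}},
                    {"term": {"product": product}},
                ]
            }
        },
        # The sum only sees documents that passed the two term filters.
        "aggs": {"billing": {"sum": {"field": "price"}}},
    }


def read_billing(response: dict) -> float:
    """Pull the summed price out of a parsed search response."""
    return response["aggregations"]["billing"]["value"]
```

If the sum from this request still looks unfiltered, comparing hits.total in the response against the expected number of sales is a quick way to check whether the filters match at all.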

Add multiple filters to nested aggregation filters Elasticsearch

I would like to add two more filters to the filter aggregation in the "inner" part of the aggregation section. The two filters I need to add already exist in the query section. The code below works correctly; it just needs the second and third nested filters from the query copied down into the aggregation, where I currently filter only by the "givingMatch.db_type" terms.
Here is the current code that needs the additional filters:
GET /testserver/_search
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "givingMatch",
"query": {
"bool": {
"filter": {
"terms": {
"givingMatch.db_type": [
"FECmatch",
"StateMatch"
]
}
}
}
}
}
},
{
"nested": {
"path": "givingMatch",
"query": {
"bool": {
"filter": {
"range": {
"givingMatch.Status": {
"from": 0,
"to": 8
}
}
}
}
}
}
},
{
"nested": {
"path": "givingMatch",
"query": {
"bool": {
"filter": {
"range": {
"givingMatch.QualityScore": {
"from": 17
}
}
}
}
}
}
}
]
}
},
"aggs": {
"categories": {
"nested": {
"path": "givingMatch"
},
"aggs": {
"inner": {
"filter": {
"terms": {
"givingMatch.db_type":["FECmatch","StateMatch"]
}
},
"aggs":{
"org_category": {
"terms": {
"field": "givingMatch.org_category",
"size": 1000
},
"aggs": {
"total": {
"sum":{
"field": "givingMatch.low_gift"
}
}
}
}
}
}
}
}
},
"size": 0
}
Giving these results:
...."aggregations": {
"categories": {
"doc_count": 93084,
"inner": {
"doc_count": 65492,
"org_category": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "DEM",
"doc_count": 28829,
"total": {
"value": 29859163
}
},
{
"key": "REP",
"doc_count": 21561,
"total": {
"value": 69962305
}
},...
Hopefully this will save someone else a few hours. To add multiple filters, the aggregate section would become:
GET /testserver/_search
{
"aggs": {
"categories": {
"nested": {
"path": "givingMatch"
},
"aggs": {
"inner": {
"filter": {
"bool": {
"must": [
{
"terms": {
"givingMatch.db_type": [
"FECmatch",
"StateMatch"
]
}
},
{
"range": {
"givingMatch.QualityScore": {
"from": 17
}
}
},
{
"range": {
"givingMatch.Status": {
"from": 0,
"to": 8
}
}
}
]
}
},
"aggs": {
"org_category": {
"terms": {
"field": "givingMatch.org_category",
"size": 1000
},
"aggs": {
"total": {
"sum": {
"field": "givingMatch.low_gift"
}
}
}
}
}
}
}
}
}
}
This allows for multiple filters within the nested aggs.
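The same pattern can be generated for any number of filters. The Python sketch below is a hypothetical helper that wraps a list of filter clauses in one bool inside the nested aggregation; the ranges are written with gte/lte, which I am assuming is equivalent to the inclusive from/to bounds used above.

```python
from typing import List


def nested_filtered_terms(
    path: str,
    filters: List[dict],
    group_field: str,
    sum_field: str,
    size: int = 1000,
) -> dict:
    """Nested aggregation whose inner buckets must satisfy every filter."""
    return {
        "nested": {"path": path},
        "aggs": {
            "inner": {
                # All clauses are evaluated against the same nested document.
                "filter": {"bool": {"must": filters}},
                "aggs": {
                    "org_category": {
                        "terms": {"field": group_field, "size": size},
                        "aggs": {"total": {"sum": {"field": sum_field}}},
                    }
                },
            }
        },
    }


# The three filters from the query section, reused inside the aggregation.
categories = nested_filtered_terms(
    path="givingMatch",
    filters=[
        {"terms": {"givingMatch.db_type": ["FECmatch", "StateMatch"]}},
        {"range": {"givingMatch.QualityScore": {"gte": 17}}},
        {"range": {"givingMatch.Status": {"gte": 0, "lte": 8}}},
    ],
    group_field="givingMatch.org_category",
    sum_field="givingMatch.low_gift",
)
request_body = {"size": 0, "aggs": {"categories": categories}}
```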

Elasticsearch summing buckets

I have the following request, which returns the count of all documents with a status of "Accepted", "Released" or "Closed".
{
"size": 0,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "*",
"analyze_wildcard": true
}
}
],
"must_not": []
}
},
"aggs": {
"slices": {
"terms": {
"field": "status.raw",
"include": {
"pattern": "Accepted|Released|Closed"
}
}
}
}
}
In my case the response is:
"buckets": [
  {
    "key": "Closed",
    "doc_count": 2216
  },
  {
    "key": "Accepted",
    "doc_count": 8
  },
  {
    "key": "Released",
    "doc_count": 6
  }
]
Now I'd like to add all of them up into a single field.
I tried using pipeline aggregations, including the following sum_bucket (which apparently only works on multi-bucket aggregations):
"total": {
  "sum_bucket": {
    "buckets_path": "slices"
  }
}
Anyone able to help me out with this?
With sum_bucket and your existing aggregation:
"aggs": {
  "slices": {
    "terms": {
      "field": "status.raw",
      "include": {
        "pattern": "Accepted|Released|Closed"
      }
    }
  },
  "sum_total": {
    "sum_bucket": {
      "buckets_path": "slices._count"
    }
  }
}
What I would do instead is use the filters aggregation and define all the buckets you need, like this:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "*",
"analyze_wildcard": true
}
}
],
"must_not": []
}
},
"aggs": {
"slices": {
"filters": {
"filters": {
"accepted": {
"term": {
"status.raw": "Accepted"
}
},
"released": {
"term": {
"status.raw": "Released"
}
},
"closed": {
"term": {
"status.raw": "Closed"
}
},
"total": {
"terms": {
"status.raw": [
"Accepted",
"Released",
"Closed"
]
}
}
}
}
}
}
}
You could add a count with a value_count sub-aggregation and then use the sum_bucket pipeline aggregation:
{
"aggs": {
"unique_status": {
"terms": {
"field": "status.raw",
"include": "Accepted|Released|Closed"
},
"aggs": {
"count": {
"value_count": {
"field": "status.raw"
}
}
}
},
"sum_status": {
"sum_bucket": {
"buckets_path": "unique_status>count"
}
}
},
"size": 0
}
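To make the sum_bucket approach concrete, here is a small Python sketch that builds the request and cross-checks the server-side total against the buckets. The function names are hypothetical; the path is written as "slices>_count", which points sum_bucket at each bucket's document count, and the statuses are passed to include as a list of exact values rather than a pattern.

```python
from typing import List


def build_status_total_request(statuses: List[str]) -> dict:
    """Count documents per status and add a sibling total over all buckets."""
    return {
        "size": 0,
        "aggs": {
            "slices": {
                "terms": {"field": "status.raw", "include": statuses}
            },
            "sum_total": {
                # Sibling pipeline aggregation: sums doc_count across buckets.
                "sum_bucket": {"buckets_path": "slices>_count"}
            },
        },
    }


def total_from_buckets(buckets: List[dict]) -> int:
    """Client-side equivalent, handy for checking the pipeline result."""
    return sum(bucket["doc_count"] for bucket in buckets)


# Buckets copied from the response shown in the question.
example_buckets = [
    {"key": "Closed", "doc_count": 2216},
    {"key": "Accepted", "doc_count": 8},
    {"key": "Released", "doc_count": 6},
]
print(total_from_buckets(example_buckets))  # 2230
```

On the server side, the same number should appear under aggregations.sum_total.value in the response.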

Filter OUT matching documents in elasticsearch with aggregation

I'm attempting to query statistics about documents in elasticsearch with the following query. The problem is that I'm trying to ignore documents with certain values for the field logger, but I can't figure out how. The query below selects all the right documents into the set, but it doesn't exclude documents with the undesirable values.
Any suggestions very welcome.
{
"query": {
"bool": {
"filter": {
"bool": {
"must_not": {
"terms": {
"logger": [
"experimentsplitsegmentlogger_errors",
"ExperimentLogger"
]
}
}
}
},
"must_not": {
"terms": {
"logger": [
"experimentsplitsegmentlogger_errors",
"ExperimentLogger"
]
}
},
"must": {
"exists": {
"field": "count"
}
}
}
},
"aggs": {
"keys": {
"filter": {
"bool": {
"must_not": {
"terms": {
"logger": [
"experimentsplitsegmentlogger_errors",
"ExperimentLogger"
]
}
}
}
},
"terms": {
"field": "logger"
},
"aggs": {
"hostnames": {
"terms": {
"field": "hostname"
},
"aggs": {
"pids": {
"terms": {
"field": "pid"
},
"aggs": {
"time_stats": {
"stats": {
"field": "timestamp"
}
},
"count_stats": {
"stats": {
"field": "count"
}
}
}
}
}
}
}
}
},
"size": 0
}
This should work for you: I removed the filter that was defined at the same level as terms inside the keys aggregation.
{
  "query": {
    "bool": {
      "filter": {
        "not": {
          "terms": {
            "logger": [
              "experimentsplitsegmentlogger_errors",
              "ExperimentLogger"
            ]
          }
        }
      },
      "must": {
        "exists": {
          "field": "count"
        }
      }
    }
  },
  "aggs": {
    "keys": {
      "terms": {
        "field": "logger"
      }
    }
  },
  "size": 0
}
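On newer Elasticsearch versions the standalone "not" query is no longer available (as far as I know it was removed in 5.0), so the exclusion has to go into bool.must_not. The Python sketch below is one way to write the request that way while keeping the hostname and pid sub-aggregations from the question; the function name is hypothetical, and it assumes logger is indexed so that the listed values match exactly.

```python
from typing import List


def build_logger_stats_request(excluded_loggers: List[str]) -> dict:
    """Stats per logger/hostname/pid, ignoring the excluded loggers."""
    return {
        "size": 0,
        "query": {
            "bool": {
                # Only documents that have a count field ...
                "filter": [{"exists": {"field": "count"}}],
                # ... and whose logger is not one of the excluded values.
                "must_not": [{"terms": {"logger": excluded_loggers}}],
            }
        },
        "aggs": {
            "keys": {
                # Exactly one aggregation type per level: terms only.
                "terms": {"field": "logger"},
                "aggs": {
                    "hostnames": {
                        "terms": {"field": "hostname"},
                        "aggs": {
                            "pids": {
                                "terms": {"field": "pid"},
                                "aggs": {
                                    "time_stats": {"stats": {"field": "timestamp"}},
                                    "count_stats": {"stats": {"field": "count"}},
                                },
                            }
                        },
                    }
                },
            }
        },
    }


request_body = build_logger_stats_request(
    ["experimentsplitsegmentlogger_errors", "ExperimentLogger"]
)
```

Because the exclusion lives in the query, the aggregation needs no filter of its own: excluded documents never reach it.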
