Elasticsearch Composite aggregation with pagination - elasticsearch

I am using this query to fetch aggregated results, but so many documents match the query criteria that the number of buckets exceeds 10000.
How should I write or modify this query so that I can paginate the result?
I have read that a bucket aggregation doesn't allow pagination, but that it can be converted into a composite aggregation, which does support paginating the response.
Any alternative way to paginate the response would also be helpful.
{
"query": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": "2022-04-26T00:00:00.000Z",
"lte": "2022-04-26T23:59:59.999Z"
}
}
},
{
"terms": {
"job.keyword": [
"JOB_1",
"JOB_2",
"JOB_3"
]
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "tags",
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"tags.name.keyword": "jobType"
}
},
{
"term": {
"tags.value.keyword": "discrete"
}
}
]
}
}
]
}
}
}
}
]
}
}
]
}
},
"size": 0,
"aggs": {
"job": {
"terms": {
"field": "job.keyword"
},
"aggs": {
"accountId": {
"terms": {
"field": "accountId.keyword",
"size": 10000
},
"aggs": {
"accountUsageStats": {
"stats": {
"field": "count"
}
},
"tags": {
"top_hits": {
"size": 1,
"_source": {
"include": [
"tags"
]
}
}
}
}
}
}
}
}
}
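One possible shape for the composite version, as a minimal sketch rather than a tested answer: the two nested terms aggregations become two composite sources, and each page passes the previous response's after_key back as after. The aggregation name job_account, the page size of 1000, and the search callable (any function that sends a request body to Elasticsearch and returns the parsed response) are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Body = Dict[str, Any]


def build_page_request(query: Body, after: Optional[Body] = None, page_size: int = 1000) -> Body:
    """Build one page of the composite request; `after` is the previous page's after_key."""
    composite: Body = {
        "size": page_size,
        "sources": [
            {"job": {"terms": {"field": "job.keyword"}}},
            {"accountId": {"terms": {"field": "accountId.keyword"}}},
        ],
    }
    if after is not None:
        composite["after"] = after
    return {
        "size": 0,
        "query": query,
        "aggs": {
            "job_account": {  # hypothetical aggregation name
                "composite": composite,
                "aggs": {
                    "accountUsageStats": {"stats": {"field": "count"}},
                    "tags": {"top_hits": {"size": 1, "_source": {"includes": ["tags"]}}},
                },
            }
        },
    }


def iter_buckets(search: Callable[[Body], Body], query: Body, page_size: int = 1000) -> Iterator[Body]:
    """Yield every (job, accountId) bucket, following after_key until the pages run out."""
    after: Optional[Body] = None
    while True:
        response = search(build_page_request(query, after, page_size))
        agg = response["aggregations"]["job_account"]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield from buckets
        after = agg.get("after_key")
        if after is None:
            return
```

Each bucket's key is then an object with job and accountId, so the per-job grouping of the original query has to be rebuilt on the client side if it is needed.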

Related

ELASTICSEARCH - Inner_hits aggregations

I am trying to do an aggregation of the {"wildcare": {"data.addresses.ces.cp": "maria*"},
{"macth": { "data.addresses.ces.direction": "rodriguez"}} fields, but it does not return the results of the query.
{ "_source": "created_at",
"size": 1,
"sort": [
{
"created_at.keyword": {
"order": "desc"
}
}
],
"query": {
"nested": {
"path": "data.addresses",
"inner_hits": {
},
"query": {
"nested": {
"path": "data.addresses.ces",
"query":
{"wildcare": {"data.addresses.ces.cp": "maria*"},
{"macth": { "data.addresses.ces.direction": "rodriguez"}}
}
}
}
}
}
How can I perform an aggregation that returns the values of the query, and not all the values of the JSON?
In case the aggregations don't support inner_hits, how could I get wildcare and macth in aggs?
Note that the query types are spelled wildcard and match, not wildcare and macth. Beyond that, you need to repeat the filter conditions in the aggregation part so that the aggregation only runs on the selected nested documents:
{
"_source": "created_at",
"size": 1,
"sort": [
{
"created_at.keyword": {
"order": "desc"
}
}
],
"query": {
"nested": {
"path": "data.addresses",
"inner_hits": {},
"query": {
"nested": {
"path": "data.addresses.ces",
"query": {
"bool": {
"filter": [
{
"wildcard": {
"data.addresses.ces.cp": "maria*"
}
},
{
"match": {
"data.addresses.ces.direction": "rodriguez"
}
}
]
}
}
}
}
}
},
"aggs": {
"addresses": {
"nested": {
"path": "data.addresses"
},
"aggs": {
"ces": {
"nested": {
"path": "data.addresses.ces"
},
"aggs": {
"query": {
"filter": {
"bool": {
"filter": [
{
"wildcard": {
"data.addresses.ces.cp": "maria*"
}
},
{
"match": {
"data.addresses.ces.direction": "rodriguez"
}
}
]
}
},
"aggs": {
"cp": {
"terms": {
"field": "data.addresses.ces.cp"
}
},
"direction": {
"terms": {
"field": "data.addresses.ces.direction"
}
}
}
}
}
}
}
}
}
}
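Because the same conditions have to appear in both the query and the aggregation, it can help to build both from a single list so they cannot drift apart. A minimal sketch, assuming the request body is assembled in Python; build_nested_request is a hypothetical helper, not part of any Elasticsearch client:

```python
from typing import Any, Dict, List

Body = Dict[str, Any]


def build_nested_request(conditions: List[Body]) -> Body:
    """Use one list of conditions for both the nested query and the filter aggregation."""
    selected: Body = {"bool": {"filter": conditions}}
    return {
        "_source": "created_at",
        "size": 1,
        "query": {
            "nested": {
                "path": "data.addresses",
                "inner_hits": {},
                "query": {"nested": {"path": "data.addresses.ces", "query": selected}},
            }
        },
        "aggs": {
            "addresses": {
                "nested": {"path": "data.addresses"},
                "aggs": {
                    "ces": {
                        "nested": {"path": "data.addresses.ces"},
                        "aggs": {
                            # Same conditions again, so only matching nested docs are aggregated.
                            "query": {
                                "filter": selected,
                                "aggs": {
                                    "cp": {"terms": {"field": "data.addresses.ces.cp"}},
                                    "direction": {"terms": {"field": "data.addresses.ces.direction"}},
                                },
                            }
                        },
                    }
                },
            }
        },
    }


body = build_nested_request([
    {"wildcard": {"data.addresses.ces.cp": "maria*"}},
    {"match": {"data.addresses.ces.direction": "rodriguez"}},
])
```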

How to nest bool queries?

I am building a search query which dynamically adds a set of constraints (bool) to the query. The general expected structure is as follows
OR (
AND (
condition
condition
...
)
AND (
condition
condition
...
)
)
In other words I have a set (one or more) of conditions which must all be met (AND above). There may be several of such sets, any of them should be enough for the final match (the OR above).
An example of such structure, as generated by my code (this is the full API query, the generated part is "bool"):
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"attack_ip": "10.89.7.117"
}
},
{
"term": {
"sentinel_port": "17"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"attack_ip": "10.89.7.118"
}
}
]
}
}
]
},
"range": {
"eventtime": {
"gte": "2018-03-05T12:47:22.397+01:00"
}
}
},
"size": 0,
"aggs": {
"src": {
"terms": {
"field": "attack_ip",
"size": 1000
},
"aggs": {
"dst": {
"terms": {
"field": "sentinel_hostname_lan",
"size": 2000
}
}
}
}
}
}
My understanding of this query was:
if "attack_ip === 10.89.7.117" and "sentinel_port === 17"
or
if "attack_ip === 10.89.7.118"
the entry will match
Unfortunately, when I send this query to Elasticsearch I get the following error:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
"line": 1,
"col": 177
}
],
"type": "parsing_exception",
"reason": "[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
"line": 1,
"col": 177
},
"status": 400
}
What does this error mean?
EDIT
Following Piotr's answer, I tried to move the range constraint into the boolean part. I get the same error, though.
My query is available online for easier reading and reproduced below:
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"attack_ip": "10.89.7.117"
}
},
{
"term": {
"sentinel_port": "17"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"attack_ip": "10.89.7.118"
}
}
]
}
}
]
}
},
{
"range": {
"eventtime": {
"gte": "2018-03-05T13:55:27.927+01:00"
}
}
}
]
},
"size": 0,
"aggs": {
"src": {
"terms": {
"field": "attack_ip",
"size": 1000
},
"aggs": {
"dst": {
"terms": {
"field": "sentinel_hostname_lan",
"size": 2000
}
}
}
}
}
}
}
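I think the problem is the range part: it sits next to bool directly under query, and a query object can hold only one top-level clause, which is why the parser expected END_OBJECT after bool. The edited query fails the same way because size and aggs ended up inside the query object; they belong at the top level of the request body. Try moving the range inside the bool: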
{
"query": {
"bool": {
"should": [{
"bool": {
"must": [{
"term": {
"attack_ip": "10.89.7.117"
}
},
{
"term": {
"sentinel_port": "17"
}
}
]
}
},
{
"term": {
"attack_ip": "10.89.7.118"
}
}
],
"must": {
"range": {
"eventtime": {
"gte": "2018-03-05T12:47:22.397+01:00"
}
}
}
}
},
"size": 0,
"aggs": {
"src": {
"terms": {
"field": "attack_ip",
"size": 1000
},
"aggs": {
"dst": {
"terms": {
"field": "sentinel_hostname_lan",
"size": 2000
}
}
}
}
}
}
Or move it to the filter section:
{
"query": {
"bool": {
"should": [{
"bool": {
"must": [{
"term": {
"attack_ip": "10.89.7.117"
}
},
{
"term": {
"sentinel_port": "17"
}
}
]
}
},
{
"term": {
"attack_ip": "10.89.7.118"
}
}
],
"filter": {
"bool": {
"must": [{
"range": {
"eventtime": {
"gte": "2018-03-05T12:47:22.397+01:00"
}
}
}]
}
}
}
},
"size": 0,
"aggs": {
"src": {
"terms": {
"field": "attack_ip",
"size": 1000
},
"aggs": {
"dst": {
"terms": {
"field": "sentinel_hostname_lan",
"size": 2000
}
}
}
}
}
}
I hope I formatted this correctly. Please let me know if you have any issues.
Note that once a bool query has a must or filter clause, its should clauses become optional by default, so you will probably need to set minimum_should_match to 1 on the bool query to get correct results.
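Since the constraint sets are generated dynamically, the OR-of-ANDs structure can be assembled in one place. A minimal sketch, assuming the query is built in Python; any_of_groups is a hypothetical helper name, and minimum_should_match is set explicitly so that at least one group must match even though a filter clause is present:

```python
from typing import Any, Dict, List

Clause = Dict[str, Any]


def any_of_groups(groups: List[List[Clause]], filters: List[Clause]) -> Clause:
    """OR over groups, AND inside each group, plus filters that must always hold."""
    return {
        "bool": {
            "should": [{"bool": {"must": group}} for group in groups],
            "filter": filters,
            # Without this, the should clauses only affect scoring when filter is present.
            "minimum_should_match": 1,
        }
    }


body = {
    "query": any_of_groups(
        groups=[
            [{"term": {"attack_ip": "10.89.7.117"}}, {"term": {"sentinel_port": "17"}}],
            [{"term": {"attack_ip": "10.89.7.118"}}],
        ],
        filters=[{"range": {"eventtime": {"gte": "2018-03-05T12:47:22.397+01:00"}}}],
    ),
    "size": 0,  # size and aggs stay at the top level, next to query
}
```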

How to filter subindex for aggregation in Elasticsearch?

I query an index with wildcard (interactive*) to get all documents for the two indices interactive-foo* & interactive-bar*.
For some of my aggregations all of the indices are relevant but for others only interactive-foo* OR interactive-bar*. So I just want to filter for these 'subindices' in the aggregation.
GET _search
{
"query":{
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": "2017-08-01 00:00:00",
"lte": "2017-08-31 23:59:59"
}
}
},
{
"match": {
"key": "SOME_KEY"
}
}
]
}
},
"size":0,
"aggs": {
// This one should be filtered and just count for interactive-bar*
"bar_count": {
"value_count": {
"field": "SOME_FIELD"
}
},
// This one should be filtered and just count for interactive-foo*
"foo_count": {
"value_count": {
"field": "SOME_FIELD"
}
}
}
}
You can use a filter aggregation like this:
{
"query": {
"bool": {
"must": [
{
"range": {
"timestamp": {
"gte": "2017-08-01 00:00:00",
"lte": "2017-08-31 23:59:59"
}
}
},
{
"match": {
"key": "SOME_KEY"
}
}
]
}
},
"size": 0,
"aggs": {
"bar_count": {
"filter": {
"indices": {
"indices": ["interactive-bar-*"]
}
},
"aggs": {
"bar_count": {
"value_count": {
"field": "SOME_FIELD"
}
}
}
},
"foo_count": {
"filter": {
"indices": {
"indices": ["interactive-foo-*"]
}
},
"aggs": {
"foo_count": {
"value_count": {
"field": "SOME_FIELD"
}
}
}
}
}
}
Note, though, that the indices query was deprecated in ES 5.0. What you should do instead is use a terms query on the _index field and list all the indices you want to include in each aggregation, like this:
{
"size": 0,
"aggs": {
"bar_count": {
"filter": {
"terms": {
"_index": ["interactive-bar-2017.08.14", "interactive-bar-2017.08.15"]
}
},
"aggs": {
"bar_count": {
"value_count": {
"field": "SOME_FIELD"
}
}
}
},
"foo_count": {
"filter": {
"terms": {
"_index": ["interactive-foo-2017.08.14", "interactive-foo-2017.08.15"]
}
},
"aggs": {
"foo_count": {
"value_count": {
"field": "SOME_FIELD"
}
}
}
}
}
}
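When the index lists change from day to day, the filter aggregations can be generated from a mapping of aggregation name to index names. A minimal sketch, assuming the request is built in Python; count_by_index_group is a hypothetical helper and the index names are the example names used above:

```python
from typing import Any, Dict, List


def count_by_index_group(groups: Dict[str, List[str]], field: str) -> Dict[str, Any]:
    """Build one filter aggregation per group, each counting `field` in its own indices."""
    return {
        name: {
            "filter": {"terms": {"_index": indices}},
            "aggs": {name: {"value_count": {"field": field}}},
        }
        for name, indices in groups.items()
    }


aggs = count_by_index_group(
    {
        "bar_count": ["interactive-bar-2017.08.14", "interactive-bar-2017.08.15"],
        "foo_count": ["interactive-foo-2017.08.14", "interactive-foo-2017.08.15"],
    },
    field="SOME_FIELD",
)
```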

How to use Aggregation by range time and terms

Idea: search for the top events in a specific range and order them by start_time, like this:
{
"from": 0,
"size": 7,
"query": {
"filtered": {
"query": { "match_all": {} },
"filter": {
"and": [
{ "bool": { "must_not": { "term": { "status": "OK" } } } },
{ "bool": { "must": { "term": { "is_blocked": false } } } }, {
"range": {
"start_time": {
"gte": "2016-01-01",
"lte": "2016-03-01"
}
}
}, {
"bool": {
"must": {
"geo_distance": {
"distance": "150km",
"coordinates": "xx.xxx, zz.zz "
}
}
}
}
]
}
}
},
"sort": [{ "start_time": "asc" },
{ "attending": "desc" }
]
}
I am quite new to the concept of aggregations, so I still have trouble with the basics.
I want the 7 top events for the next 2 months, so there are two attributes to look at: "top" is defined by the maximum number of people attending (attending), but I also want the results ordered by time (start_time: asc).
Here is what I started to write, but it is wrong:
{
"aggs": {
"aggs": {
"event_interval": {
"date_histogram": {
"field": "start_time",
"interval": "2M",
"format": "dateOptionalTime"
}
},
"max_attending": { "max": { "field": "attending" } },
"_source": {
"include": [
"name"
]
}
}
}
}
I'm not sure you need an aggregation to get what you are looking for; I think a simple query can yield the results you want. Try this:
{
"size": 7,
"sort": {
"attending": {
"order": "desc"
}
},
"query": {
"bool": {
"filter": [
{
"range": {
"start_time": {
"gte": "now",
"lte": "now+2M"
}
}
}
]
}
}
}
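The query above returns the 7 hits sorted by attending, so the start_time ordering asked for in the question still has to be applied afterwards. A minimal sketch of doing that on the client, assuming the parsed response is a Python dict and that start_time values are ISO-8601 strings in one uniform format (so string order equals time order); order_by_start_time is a hypothetical helper:

```python
from typing import Any, Dict, List


def order_by_start_time(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Re-sort the returned hits by _source.start_time, earliest first."""
    hits = response["hits"]["hits"]
    return sorted(hits, key=lambda hit: hit["_source"]["start_time"])


# Made-up response fragment, already sorted by attending (descending).
sample = {
    "hits": {
        "hits": [
            {"_id": "a", "_source": {"attending": 900, "start_time": "2016-02-10"}},
            {"_id": "b", "_source": {"attending": 700, "start_time": "2016-01-05"}},
            {"_id": "c", "_source": {"attending": 500, "start_time": "2016-01-20"}},
        ]
    }
}
ordered_ids = [hit["_id"] for hit in order_by_start_time(sample)]
```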

Filter OUT matching documents in elasticsearch with aggregation

I'm attempting to query statistics about documents in elasticsearch with the following query. The problem is that I'm trying to ignore documents with certain values for the field logger, but I can't figure out how. The query below selects all the right documents into the set, but it doesn't exclude documents with the undesirable values.
Any suggestions very welcome.
{
"query": {
"bool": {
"filter": {
"bool": {
"must_not": {
"terms": {
"logger": [
"experimentsplitsegmentlogger_errors",
"ExperimentLogger"
]
}
}
}
},
"must_not": {
"terms": {
"logger": [
"experimentsplitsegmentlogger_errors",
"ExperimentLogger"
]
}
},
"must": {
"exists": {
"field": "count"
}
}
}
},
"aggs": {
"keys": {
"filter": {
"bool": {
"must_not": {
"terms": {
"logger": [
"experimentsplitsegmentlogger_errors",
"ExperimentLogger"
]
}
}
}
},
"terms": {
"field": "logger"
},
"aggs": {
"hostnames": {
"terms": {
"field": "hostname"
},
"aggs": {
"pids": {
"terms": {
"field": "pid"
},
"aggs": {
"time_stats": {
"stats": {
"field": "timestamp"
}
},
"count_stats": {
"stats": {
"field": "count"
}
}
}
}
}
}
}
}
},
"size": 0
}
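This should work for you; I removed filter and terms from the same level of the aggregation, since a single aggregation cannot declare two aggregation types. Note that the not query used below was removed in Elasticsearch 5.0; on newer versions use a must_not clause in the bool query instead.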
{
"query": {
"bool": {
"filter": {
"not": {
"terms": {
"logger": [
"experimentsplitsegmentlogger_errors",
"ExperimentLogger"
]
}
}
},
"must": {
"exists": {
"field": "count"
}
}
}
},
"aggs": {
"keys": {
"terms": {
"field": "logger"
}
}
},
"size": 0
}
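For versions where the not query is unavailable, the same exclusion can be written with must_not, and the hostname and pid sub-aggregations from the question can hang off the single terms aggregation. A minimal sketch, assuming the request is built in Python; build_logger_stats is a hypothetical helper, and whether the terms values match depends on how the logger field is mapped:

```python
from typing import Any, Dict, List


def build_logger_stats(excluded: List[str]) -> Dict[str, Any]:
    """Exclude the given logger values in the query, then aggregate what remains."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": {"exists": {"field": "count"}},
                "must_not": {"terms": {"logger": excluded}},
            }
        },
        "aggs": {
            "keys": {
                # Only one aggregation type per level; the exclusion lives in the query.
                "terms": {"field": "logger"},
                "aggs": {
                    "hostnames": {
                        "terms": {"field": "hostname"},
                        "aggs": {
                            "pids": {
                                "terms": {"field": "pid"},
                                "aggs": {
                                    "time_stats": {"stats": {"field": "timestamp"}},
                                    "count_stats": {"stats": {"field": "count"}},
                                },
                            }
                        },
                    }
                },
            }
        },
    }


body = build_logger_stats(["experimentsplitsegmentlogger_errors", "ExperimentLogger"])
```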
