Filter an Elasticsearch result after an aggregation

I have this Elasticsearch query that gets every x-location for which the number of documents (with timestamp gte 1 month ago) is greater than 5000. I'm also able to get the most recent timestamp for each of these x-locations.
Is it possible to add an additional filter at the end of the query in order to ignore all x-locations whose most recent timestamp is older than 2 days?
The query:
GET /mypattern-*/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"method": "GET"}},
        {
          "range": {
            "timestamp": {
              "gte": "now-1M"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "location_terms": {
      "terms": {
        "field": "x-location.keyword",
        "min_doc_count": 500,
        "size": 1000,
        "order": {
          "recent_timestamp": "desc"
        }
      },
      "aggs": {
        "recent_timestamp": {
          "max": {
            "field": "timestamp"
          }
        }
      }
    }
  }
}
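One way to drop buckets after the terms aggregation has run is a `bucket_selector` pipeline aggregation placed next to `recent_timestamp`. The sketch below is a request body for the same `GET /mypattern-*/_search` endpoint. It assumes the cutoff (now minus 2 days, in epoch milliseconds) is computed by the client before the request is sent, since as far as I know the selector script cannot evaluate date math such as `now-2d` itself. The name `recent_only` and the cutoff value `1700000000000` are hypothetical placeholders. A `max` aggregation on a date field yields epoch milliseconds, so the two values can be compared directly.

```json
{
  "query": {
    "bool": {
      "must": [
        {"match": {"method": "GET"}},
        {"range": {"timestamp": {"gte": "now-1M"}}}
      ]
    }
  },
  "aggs": {
    "location_terms": {
      "terms": {
        "field": "x-location.keyword",
        "min_doc_count": 500,
        "size": 1000,
        "order": {"recent_timestamp": "desc"}
      },
      "aggs": {
        "recent_timestamp": {
          "max": {"field": "timestamp"}
        },
        "recent_only": {
          "bucket_selector": {
            "buckets_path": {"recent": "recent_timestamp"},
            "script": {
              "source": "params.recent >= params.cutoff",
              "params": {"cutoff": 1700000000000}
            }
          }
        }
      }
    }
  }
}
```

The client would recompute `cutoff` for every request, for example as the current epoch time in milliseconds minus 172,800,000 (2 days). Buckets whose `recent_timestamp` falls below the cutoff are removed from the response.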

Related

Query returns a smaller result than I intended in Elasticsearch

I am using the REST API to query results from Elasticsearch.
Below is the query.
GET /..../_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "time_zone": "+09:00",
              "gte": "2023-01-24T00:00:00.000Z",
              "lt": "2023-01-24T03:03:00.000Z"
            }
          }
        },
        {
          "term": {
            "serviceid.keyword": {
              "value": "430011397"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "by_day": {
      "auto_date_histogram": {
        "field": "@timestamp",
        "minimum_interval": "minute"
      },
      "aggs": {
        "agg-type": {
          "terms": {
            "field": "nxlogtype.keyword",
            "size": 100000
          },
          "aggs": {
            "my-sub-agg-name": {
              "avg": {
                "field": "size"
              }
            }
          }
        }
      }
    }
  }
}
As you can see, I specified a time range of about three hours with the gte and lt fields.
However, the result contains only 6 buckets, each covering a 30-minute interval.
I expected many buckets with a one-minute interval over the time range I specified, but the result is always the same, even when I extend the time range.
Since I am quite new to Elasticsearch, I am not familiar with query usage.
How can I resolve this?
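A likely cause: `auto_date_histogram` chooses its interval so that the number of buckets stays at or below its `buckets` target, which defaults to 10, and for minutes it only uses multiples of 1, 5, 10 and 30. Splitting about three hours into at most 10 buckets needs at least 18 minutes per bucket, so 30 minutes is chosen, which matches the 6 buckets you see. `minimum_interval` only sets a floor; it does not force one-minute buckets. If a fixed one-minute interval is what you want, a plain `date_histogram` with `fixed_interval` is the more direct tool. Below is a sketch of the request body for the same endpoint, keeping the field names from the question (assuming the timestamp field is `@timestamp`); the aggregation name `by_minute` is my own.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "time_zone": "+09:00",
              "gte": "2023-01-24T00:00:00.000Z",
              "lt": "2023-01-24T03:03:00.000Z"
            }
          }
        },
        {"term": {"serviceid.keyword": {"value": "430011397"}}}
      ]
    }
  },
  "aggs": {
    "by_minute": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1m"
      },
      "aggs": {
        "agg-type": {
          "terms": {"field": "nxlogtype.keyword", "size": 100000},
          "aggs": {
            "my-sub-agg-name": {"avg": {"field": "size"}}
          }
        }
      }
    }
  }
}
```

Alternatively, raising `buckets` on the `auto_date_histogram` (for example to 200) should let it settle on one minute for a three-hour range, but the interval will grow again as the range gets longer.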

Elasticsearch aggregation: return null values as 0?

I can count the hits per day that match my query string with this code, but if a whole week has no hits, the query returns nothing instead of returning 0 for each day. Is there a way to default to 0?
GET index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "message": "Cannot login"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2021-07-01",
              "lte": "2021-07-07"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "hit_count_per_day": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day"
      }
    }
  }
}
For this purpose, you need to add extended_bounds to your aggregation, like below:
GET index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "message": "Cannot login"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2021-07-01",
              "lte": "2021-07-07"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "hit_count_per_day": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day",
        "extended_bounds": {
          "min": "2021-07-01",
          "max": "2021-07-07"
        }
      }
    }
  }
}
Please let me know if it did not solve your problem.
When an Elasticsearch aggregation response contains 0s, it means those values exist but are not shown because of filters.
For example, I had a term filter on "region":"mexico" and it returned:
"colombia": 0
"argentina": 0
"mexico": 7
The first two appear because they are in the dataset (index) but were filtered out.
Hope it helps.
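Zero-count term buckets like these are what a terms aggregation returns when `min_doc_count` is set to 0: terms that exist in the index but match no hits still get a bucket (with the default of 1 they are omitted). A minimal sketch of such a request body, assuming a hypothetical index `my-index` queried with `GET my-index/_search`, a keyword field `region`, and a made-up aggregation name `by_region`:

```json
{
  "size": 0,
  "query": {
    "term": {"region": "mexico"}
  },
  "aggs": {
    "by_region": {
      "terms": {
        "field": "region",
        "min_doc_count": 0
      }
    }
  }
}
```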

Elasticsearch date histogram returning only maximum of 13 buckets

I'm trying to run the following query and expect 24 buckets, but Elasticsearch returns only 13.
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "timestamp": {
              "gte": start_time,
              "lte": start_time + 86400
            }
          }
        },
        {
          "term": {
            "some_field": "some_value"
          }
        }
      ]
    }
  },
  "aggs": {
    "hourly_data": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "60m",
        "min_doc_count": 0
      },
      "aggs": {
        "unique_some_agg_name": {
          "cardinality": {
            "field": "some_other_field"
          }
        }
      }
    }
  },
  "size": 0
}
I'm not able to figure out how to set the number of buckets for date_histogram.
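A likely explanation, though it cannot be confirmed from the question alone: `min_doc_count: 0` only fills empty buckets between the first and the last bucket that contain documents, so hours at the start or end of the day with no matching documents are left out. `extended_bounds` forces the histogram to cover the whole range. The sketch below replaces the `start_time` placeholder with hypothetical ISO 8601 dates and assumes `timestamp` is a date field that accepts them. `max` is set to 23:59:59, which rounds down to the 23:00 bucket, so exactly 24 buckets should come back; the range uses `lt` instead of `lte` so a document at exactly the next midnight does not open a 25th bucket.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "timestamp": {
              "gte": "2024-01-01T00:00:00Z",
              "lt": "2024-01-02T00:00:00Z"
            }
          }
        },
        {"term": {"some_field": "some_value"}}
      ]
    }
  },
  "aggs": {
    "hourly_data": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "60m",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2024-01-01T00:00:00Z",
          "max": "2024-01-01T23:59:59Z"
        }
      },
      "aggs": {
        "unique_some_agg_name": {
          "cardinality": {"field": "some_other_field"}
        }
      }
    }
  }
}
```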

Need aggregation of only the query results

I need to run an aggregation over only the limited results I get from the query, but it is not working: it includes results outside the query's size limit. Here is the query I am running:
{
  "size": 500,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "tags.keyword": "possiblePurchase"
          }
        },
        {
          "term": {
            "clientName": "Ci"
          }
        },
        {
          "range": {
            "firstSeenDate": {
              "gte": "now-30d"
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "tags.keyword": "skipPurchase"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "firstSeenDate": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "byClient": {
      "terms": {
        "field": "clientName",
        "size": 25
      },
      "aggs": {
        "byTarget": {
          "terms": {
            "field": "targetName",
            "size": 6
          },
          "aggs": {
            "byId": {
              "terms": {
                "field": "id",
                "size": 5
              }
            }
          }
        }
      }
    }
  }
}
I need the aggregations to consider only the first 500 results of the query, sorted by the field I specify in the query. I am completely lost. Thanks for the help.
The scope of an aggregation is all documents matching your query; the size parameter only specifies how many hits to fetch and return.
If you want to restrict the aggregation to the first n hits of a query, I would suggest the sampler aggregation in combination with your query.
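A sketch of the sampler suggestion, reusing the query from the question and wrapping the existing aggregations; the wrapper name `sample` is my own. Two caveats: `sampler` collects the top-scoring documents on each shard (`shard_size`, default 100), so with several shards the sample can exceed 500 documents in total, and it ranks documents by relevance score, not by the `firstSeenDate` sort. If the sample must follow the sort order exactly, fetching the 500 hits and aggregating them client-side may be more reliable.

```json
{
  "size": 500,
  "query": {
    "bool": {
      "must": [
        {"term": {"tags.keyword": "possiblePurchase"}},
        {"term": {"clientName": "Ci"}},
        {"range": {"firstSeenDate": {"gte": "now-30d"}}}
      ],
      "must_not": [
        {"term": {"tags.keyword": "skipPurchase"}}
      ]
    }
  },
  "sort": [
    {"firstSeenDate": {"order": "desc"}}
  ],
  "aggs": {
    "sample": {
      "sampler": {"shard_size": 500},
      "aggs": {
        "byClient": {
          "terms": {"field": "clientName", "size": 25},
          "aggs": {
            "byTarget": {
              "terms": {"field": "targetName", "size": 6},
              "aggs": {
                "byId": {
                  "terms": {"field": "id", "size": 5}
                }
              }
            }
          }
        }
      }
    }
  }
}
```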

Elasticsearch term aggregation and range with timestamp

I'm trying to count the number of logs grouped by user agent.
This is what I have:
GET /myindex/_search
{
  "size": 30,
  "stored_fields": ["req.headers.user-agent.keyword"],
  "aggs": {
    "group_by_userAgent": {
      "terms": {
        "field": "req.headers.user-agent.keyword"
      }
    }
  }
}
I wanted to add a "query last 15 minutes" feature. I tried adding a 'range' query and ended up with the following query, which does not work.
GET /myindex/_search
{
  "size": 30,
  "stored_fields": ["req.headers.user-agent.keyword"],
  "aggs": {
    "group_by_userAgent": {
      "terms": {
        "field": "req.headers.user-agent.keyword"
      },
      "range": {
        "timestamp": {
          "gt": "now-15m"
        }
      }
    }
  }
}
How do I combine a terms aggregation with a range using the "now-15m" syntax?
The range should go inside the query section, not inside aggs. The time range itself is fine as it is.
I think what you're looking for is the following: the number of docs in each of the top 30 user-agent buckets, i.e. the 30 user agents producing the most logs.
GET /myindex/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gt": "now-15m"
      }
    }
  },
  "aggs": {
    "group_by_userAgent": {
      "terms": {
        "field": "req.headers.user-agent.keyword",
        "size": 30
      }
    }
  }
}
You can get aggregation results for user-agent in two ways: with a filter aggregation wrapping the terms aggregation, or with a range query in the query section.
POST phrase_index/_search
{
  "aggs": {
    "date_range_filtered_agg": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "now-15m/m"
          }
        }
      },
      "aggs": {
        "group_by_userAgent": {
          "terms": {
            "field": "req.headers.user-agent.keyword",
            "size": 10
          }
        }
      }
    }
  },
  "size": 30,
  "stored_fields": ["req.headers.user-agent.keyword"]
}

POST phrase_index/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-15m/m"
      }
    }
  },
  "aggs": {
    "group_by_userAgent": {
      "terms": {
        "field": "req.headers.user-agent.keyword",
        "size": 10
      }
    }
  },
  "size": 30,
  "stored_fields": ["req.headers.user-agent.keyword"]
}
You need a filter aggregation first to apply the range query, then add a terms sub-aggregation.
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html
